Type Parameters:
S - the state type.
A - the action type.

public class ValueIteration<S,A extends Action>
extends java.lang.Object
function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s' | s, a),
            rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U', vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration
  repeat
    U <- U'; δ <- 0
    for each state s in S do
      U'[s] <- R(s) + γ max_{a ∈ A(s)} Σ_{s'} P(s' | s, a) U[s']
      if |U'[s] - U[s]| > δ then δ <- |U'[s] - U[s]|
  until δ < ε(1 - γ)/γ
  return U
Figure 17.4 The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (17.8): if ||U_{i+1} - U_i|| < ε(1 - γ)/γ then ||U_{i+1} - U|| < ε.
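
To make the update rule concrete, here is a minimal, self-contained Java sketch of the same loop. The `SimpleMdp` interface and its method names are hypothetical stand-ins for illustration, not the `MarkovDecisionProcess` interface this class actually consumes:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical MDP view used only by this sketch, not the library's interface. */
interface SimpleMdp<S, A> {
    List<S> states();
    List<A> actions(S s);                             // A(s)
    double reward(S s);                               // R(s)
    double transitionProbability(S sPrime, S s, A a); // P(s' | s, a)
}

class ValueIterationSketch {
    /** Repeats Bellman updates until δ < ε(1 - γ)/γ, then returns U as in Figure 17.4. */
    static <S, A> Map<S, Double> valueIteration(SimpleMdp<S, A> mdp, double gamma, double epsilon) {
        Map<S, Double> u = new HashMap<>();      // U
        Map<S, Double> uPrime = new HashMap<>(); // U', initially zero
        for (S s : mdp.states()) uPrime.put(s, 0.0);
        double delta;
        do {
            u.putAll(uPrime); // U <- U'
            delta = 0;
            for (S s : mdp.states()) {
                // max over a in A(s) of Σ_{s'} P(s' | s, a) U[s'];
                // a terminal state with no actions contributes no future utility
                double best = mdp.actions(s).isEmpty() ? 0 : Double.NEGATIVE_INFINITY;
                for (A a : mdp.actions(s)) {
                    double expected = 0;
                    for (S sPrime : mdp.states())
                        expected += mdp.transitionProbability(sPrime, s, a) * u.get(sPrime);
                    best = Math.max(best, expected);
                }
                double updated = mdp.reward(s) + gamma * best; // Bellman update
                uPrime.put(s, updated);
                delta = Math.max(delta, Math.abs(updated - u.get(s))); // track max change
            }
        } while (delta >= epsilon * (1 - gamma) / gamma);
        return u;
    }
}
```

Note that, as in the pseudocode, the method returns U (the utilities from the start of the final iteration) rather than U', since the termination test bounds the error of U.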
| Constructor and Description |
|---|
| ValueIteration(double gamma) Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| java.util.Map<S,java.lang.Double> | valueIteration(MarkovDecisionProcess<S,A> mdp, double epsilon) The value iteration algorithm for calculating the utility of states. |
public ValueIteration(double gamma)

Constructor.

Parameters:
gamma - the discount factor γ to be used.

public java.util.Map<S,java.lang.Double> valueIteration(MarkovDecisionProcess<S,A> mdp, double epsilon)

The value iteration algorithm for calculating the utility of states.

Parameters:
mdp - an MDP with states S, actions A(s), transition model P(s' | s, a), and rewards R(s).
epsilon - the maximum error allowed in the utility of any state.

Returns:
a map of states in S to their utilities.
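
As a quick usage sketch (the import paths are assumptions based on the aima-java source layout, and the gamma/epsilon values are illustrative; constructing the MarkovDecisionProcess itself is application-specific):

```java
import java.util.Map;
// Import paths below are assumptions based on the aima-java source layout.
import aima.core.agent.Action;
import aima.core.probability.mdp.MarkovDecisionProcess;
import aima.core.probability.mdp.search.ValueIteration;

public class ValueIterationUsage {
    /** Solves an already-constructed MDP; building the MDP is omitted here. */
    public static <S, A extends Action> Map<S, Double> solve(MarkovDecisionProcess<S, A> mdp) {
        ValueIteration<S, A> vi = new ValueIteration<>(0.9); // gamma: discount factor
        return vi.valueIteration(mdp, 1e-4);                 // epsilon: max allowed utility error
    }
}
```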