Type Parameters:
S - the state type.
A - the action type.

public class ValueIteration<S,A extends Action>
extends java.lang.Object
function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s' | s, a),
            rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U', vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration
  repeat
    U <- U'; δ <- 0
    for each state s in S do
      U'[s] <- R(s) + γ max_{a ∈ A(s)} Σ_{s'} P(s' | s, a) U[s']
      if |U'[s] - U[s]| > δ then δ <- |U'[s] - U[s]|
  until δ < ε(1 - γ)/γ
  return U
Figure 17.4 The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (17.8): if ||U_{i+1} - U_i|| < ε(1 - γ)/γ then ||U_{i+1} - U|| < ε.
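
To make the update rule concrete, here is a minimal, self-contained Java sketch of the same loop. The `SimpleMdp` interface and its method names are hypothetical stand-ins for illustration, not the `MarkovDecisionProcess` interface this class actually consumes:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical MDP view used only by this sketch, not the library's interface. */
interface SimpleMdp<S, A> {
    List<S> states();
    List<A> actions(S s);                             // A(s)
    double reward(S s);                               // R(s)
    double transitionProbability(S sPrime, S s, A a); // P(s' | s, a)
}

class ValueIterationSketch {
    /** Repeats Bellman updates until δ < ε(1 - γ)/γ, then returns U as in Figure 17.4. */
    static <S, A> Map<S, Double> valueIteration(SimpleMdp<S, A> mdp, double gamma, double epsilon) {
        Map<S, Double> u = new HashMap<>();      // U
        Map<S, Double> uPrime = new HashMap<>(); // U', initially zero
        for (S s : mdp.states()) uPrime.put(s, 0.0);
        double delta;
        do {
            u.putAll(uPrime); // U <- U'
            delta = 0;
            for (S s : mdp.states()) {
                // max over a in A(s) of Σ_{s'} P(s' | s, a) U[s'];
                // a terminal state with no actions contributes no future utility
                double best = mdp.actions(s).isEmpty() ? 0 : Double.NEGATIVE_INFINITY;
                for (A a : mdp.actions(s)) {
                    double expected = 0;
                    for (S sPrime : mdp.states())
                        expected += mdp.transitionProbability(sPrime, s, a) * u.get(sPrime);
                    best = Math.max(best, expected);
                }
                double updated = mdp.reward(s) + gamma * best; // Bellman update
                uPrime.put(s, updated);
                delta = Math.max(delta, Math.abs(updated - u.get(s))); // track max change
            }
        } while (delta >= epsilon * (1 - gamma) / gamma);
        return u;
    }
}
```

Note that, as in the pseudocode, the method returns U (the utilities from the start of the final iteration) rather than U', since the termination test bounds the error of U.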
| Constructor and Description |
|---|
| ValueIteration(double gamma) Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| java.util.Map<S,java.lang.Double> | valueIteration(MarkovDecisionProcess<S,A> mdp, double epsilon) The value iteration algorithm for calculating the utility of states. |
public ValueIteration(double gamma)

Constructor.

Parameters:
gamma - the discount factor γ to be used.

public java.util.Map<S,java.lang.Double> valueIteration(MarkovDecisionProcess<S,A> mdp, double epsilon)

The value iteration algorithm for calculating the utility of states.

Parameters:
mdp - an MDP with states S, actions A(s), transition model P(s' | s, a), and rewards R(s).
epsilon - the maximum error allowed in the utility of any state.

Returns:
a map of states in S to their utilities.
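
As a quick usage sketch (the import paths are assumptions based on the aima-java source layout, and the gamma/epsilon values are illustrative; constructing the MarkovDecisionProcess itself is application-specific):

```java
import java.util.Map;
// Import paths below are assumptions based on the aima-java source layout.
import aima.core.agent.Action;
import aima.core.probability.mdp.MarkovDecisionProcess;
import aima.core.probability.mdp.search.ValueIteration;

public class ValueIterationUsage {
    /** Solves an already-constructed MDP; building the MDP is omitted here. */
    public static <S, A extends Action> Map<S, Double> solve(MarkovDecisionProcess<S, A> mdp) {
        ValueIteration<S, A> vi = new ValueIteration<>(0.9); // gamma: discount factor
        return vi.valueIteration(mdp, 1e-4);                 // epsilon: max allowed utility error
    }
}
```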