public class ValueIteration<S,A extends Action>
extends java.lang.Object

Type Parameters:
S - the state type.
A - the action type.
function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s' | s, a),
            rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U', vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration
  repeat
    U ← U'; δ ← 0
    for each state s in S do
      U'[s] ← R(s) + γ max_{a ∈ A(s)} Σ_{s'} P(s' | s, a) U[s']
      if |U'[s] - U[s]| > δ then δ ← |U'[s] - U[s]|
  until δ < ε(1 - γ)/γ
  return U

Figure 17.4 The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (17.8):

if ||U_{i+1} - U_i|| < ε(1 - γ)/γ then ||U_{i+1} - U|| < ε
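The pseudocode above can be sketched as a small, self-contained Java program. This is an illustrative sketch, not the library's implementation: the two-state MDP, the array encoding `P[s][a][s']` of the transition model, and all names here are made up for the example.

```java
import java.util.Arrays;

public class ValueIterationSketch {

    // P[s][a][s'] = P(s' | s, a); R[s] = reward of state s (illustrative encoding).
    static double[] valueIteration(double[][][] P, double[] R,
                                   double gamma, double epsilon) {
        int n = R.length;
        double[] U = new double[n];      // U, initially zero
        double[] Uprime = new double[n]; // U'
        double delta;
        do {
            U = Uprime.clone();          // U <- U'
            delta = 0;                   // δ <- 0
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < P[s].length; a++) {
                    double q = 0;
                    for (int sp = 0; sp < n; sp++) {
                        q += P[s][a][sp] * U[sp]; // Σ_{s'} P(s' | s, a) U[s']
                    }
                    best = Math.max(best, q);     // max over a ∈ A(s)
                }
                Uprime[s] = R[s] + gamma * best;  // Bellman update
                delta = Math.max(delta, Math.abs(Uprime[s] - U[s]));
            }
        } while (delta >= epsilon * (1 - gamma) / gamma); // termination, Eq. (17.8)
        return Uprime;
    }

    public static void main(String[] args) {
        // Hypothetical MDP: state 0 stays in 0, state 1 moves to 0; one action each.
        double[][][] P = {{{1.0, 0.0}}, {{1.0, 0.0}}};
        double[] R = {1.0, 0.0};
        double[] U = valueIteration(P, R, 0.9, 1e-6);
        // U[0] should approach R(0)/(1-γ) = 10, U[1] should approach γ·U[0] = 9.
        System.out.println(Arrays.toString(U));
    }
}
```

The termination test `δ < ε(1 - γ)/γ` is what guarantees the returned utilities are within ε of the true utilities, per Equation (17.8).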
| Constructor and Description |
|---|
| `ValueIteration(double gamma)` Constructor. |

| Modifier and Type | Method and Description |
|---|---|
| `java.util.Map<S,java.lang.Double>` | `valueIteration(MarkovDecisionProcess<S,A> mdp, double epsilon)` The value iteration algorithm for calculating the utility of states. |
public ValueIteration(double gamma)

Parameters:
gamma - discount γ to be used.

public java.util.Map<S,java.lang.Double> valueIteration(MarkovDecisionProcess<S,A> mdp,
                                                        double epsilon)

Parameters:
mdp - an MDP with states S, actions A(s), transition model P(s' | s, a), rewards R(s)
epsilon - the maximum error allowed in the utility of any state