Type Parameters:
S - the state type.
A - the action type.

public class ModifiedPolicyEvaluation<S,A extends Action>
extends java.lang.Object
implements PolicyEvaluation<S,A>
The utilities are updated with the simplified Bellman update

U_{i+1}(s) ← R(s) + γ Σ_{s'} P(s'|s, π_i(s)) U_i(s')

and this is repeated k times to produce the next utility estimate. The resulting algorithm is called modified policy iteration. It is often much more efficient than standard policy iteration or value iteration.
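For concreteness, here is a minimal self-contained sketch of this k-step update loop. The Reward and Transition interfaces are hypothetical stand-ins for the MDP's reward function R(s) and transition model P(s'|s,a); they are not part of this class's API, and the library's actual implementation may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of modified policy evaluation: apply the simplified Bellman
 * update k times under a fixed policy pi. Reward and Transition are
 * hypothetical stand-ins for the MDP's R(s) and P(s'|s,a).
 */
public class PolicyEvaluationSketch {

    interface Reward<S> { double r(S s); }

    interface Transition<S, A> { double p(S sDelta, S s, A a); }

    static <S, A> Map<S, Double> evaluate(Map<S, A> pi, Map<S, Double> U,
                                          Set<S> states, Reward<S> R,
                                          Transition<S, A> P,
                                          int k, double gamma) {
        Map<S, Double> Ui = new HashMap<>(U);
        for (int i = 0; i < k; i++) {
            Map<S, Double> Unext = new HashMap<>();
            for (S s : states) {
                double expected = 0.0;
                for (S sDelta : states) {
                    // Sum over s' of P(s'|s, pi_i(s)) * U_i(s')
                    expected += P.p(sDelta, s, pi.get(s)) * Ui.get(sDelta);
                }
                // U_{i+1}(s) <- R(s) + gamma * expected
                Unext.put(s, R.r(s) + gamma * expected);
            }
            Ui = Unext;  // the new estimate becomes U_i for the next sweep
        }
        return Ui;
    }
}
```

Because k is a small fixed constant, each call costs O(k·|S|²) arithmetic rather than exactly solving the |S| simultaneous linear equations of full policy evaluation, which is the source of the efficiency gain mentioned above.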
| Constructor and Description |
|---|
| ModifiedPolicyEvaluation(int k, double gamma) Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| java.util.Map<S,java.lang.Double> | evaluate(java.util.Map<S,A> pi_i, java.util.Map<S,java.lang.Double> U, MarkovDecisionProcess<S,A> mdp) Policy evaluation: given a policy π_i, calculate U_i = U^{π_i}, the utility of each state if π_i were to be executed. |
public ModifiedPolicyEvaluation(int k, double gamma)

Parameters:
k - the number of iterations to use to produce the next utility estimate
gamma - the discount factor γ to be used

public java.util.Map<S,java.lang.Double> evaluate(java.util.Map<S,A> pi_i, java.util.Map<S,java.lang.Double> U, MarkovDecisionProcess<S,A> mdp)

Specified by:
evaluate in interface PolicyEvaluation<S,A extends Action>

Parameters:
pi_i - a policy vector indexed by state
U - a vector of utilities for the states in S
mdp - an MDP with states S, actions A(s), and transition model P(s'|s,a)
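As a usage illustration, the following hedged sketch builds an all-zero initial utility vector and evaluates a fixed policy. The import paths and the mdp.states() accessor are assumptions about the surrounding aima-core library rather than something this page documents.

```java
import java.util.HashMap;
import java.util.Map;
// Library imports; the package paths below are assumptions about the
// surrounding aima-core layout, not part of this page.
import aima.core.agent.Action;
import aima.core.probability.mdp.MarkovDecisionProcess;
import aima.core.probability.mdp.PolicyEvaluation;
import aima.core.probability.mdp.impl.ModifiedPolicyEvaluation;

public class EvaluateExample {

    // Evaluate a fixed policy pi against the given MDP.
    // mdp.states() is an assumed accessor for the MDP's state set.
    static <S, A extends Action> Map<S, Double> utilitiesUnder(
            MarkovDecisionProcess<S, A> mdp, Map<S, A> pi) {
        Map<S, Double> U = new HashMap<>();
        for (S s : mdp.states()) {
            U.put(s, 0.0);  // start from an all-zero utility estimate
        }
        // k = 10 Bellman update sweeps per call, discount gamma = 1.0
        PolicyEvaluation<S, A> pe = new ModifiedPolicyEvaluation<>(10, 1.0);
        return pe.evaluate(pi, U, mdp);
    }
}
```

The returned map can be fed back in as U on the next call, so a policy-iteration driver can alternate evaluation sweeps with policy-improvement steps.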