Type Parameters:
S - the state type
A - the action type

public class ModifiedPolicyEvaluation<S, A extends Action>
extends java.lang.Object
implements PolicyEvaluation<S, A>
Ui+1(s) ← R(s) + γ Σs' P(s'|s, πi(s)) Ui(s')

This update is repeated k times to produce the next utility estimate. The resulting algorithm is called modified policy iteration. It is often much more efficient than standard policy iteration or value iteration.
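The simplified Bellman update above can be sketched as standalone Java. This is a minimal illustration on a hypothetical two-state MDP, independent of the aima-java classes; the array-based transition model `P[s][sPrime]` (with the action already fixed by the policy) and all names here are assumptions for brevity:

```java
public class SimplifiedPolicyEvaluation {
    // Repeats the simplified Bellman update k times:
    //   U_{i+1}(s) = R(s) + gamma * sum_{s'} P(s'|s, pi(s)) * U_i(s')
    // R[s] is the reward in state s; P[s][sPrime] is the transition
    // probability under the (fixed) policy action; U is the current estimate.
    public static double[] evaluate(double[] R, double[][] P, double[] U,
                                    double gamma, int k) {
        int n = R.length;
        double[] current = U.clone();
        for (int iter = 0; iter < k; iter++) {
            double[] next = new double[n];          // synchronous update
            for (int s = 0; s < n; s++) {
                double expected = 0.0;
                for (int sPrime = 0; sPrime < n; sPrime++) {
                    expected += P[s][sPrime] * current[sPrime];
                }
                next[s] = R[s] + gamma * expected;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        double[] R = {0.0, 1.0};                    // toy rewards
        double[][] P = {{0.5, 0.5}, {0.0, 1.0}};    // toy P(s'|s, pi(s))
        double[] U = {0.0, 0.0};                    // initial utility estimate
        double[] result = evaluate(R, P, U, 0.9, 5);
        System.out.printf("%.4f %.4f%n", result[0], result[1]);
    }
}
```

Because only k sweeps are performed rather than solving the linear system exactly, the result is an approximation of U^πi that improves with larger k.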
| Constructor and Description |
|---|
| `ModifiedPolicyEvaluation(int k, double gamma)` Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| `java.util.Map<S,java.lang.Double>` | `evaluate(java.util.Map<S,A> pi_i, java.util.Map<S,java.lang.Double> U, MarkovDecisionProcess<S,A> mdp)` Policy evaluation: given a policy πi, calculate Ui = U^πi, the utility of each state if πi were to be executed. |
public ModifiedPolicyEvaluation(int k, double gamma)

Parameters:
k - the number of iterations to use to produce the next utility estimate
gamma - the discount factor γ to be used

public java.util.Map<S,java.lang.Double> evaluate(java.util.Map<S,A> pi_i, java.util.Map<S,java.lang.Double> U, MarkovDecisionProcess<S,A> mdp)
Specified by:
evaluate in interface PolicyEvaluation<S,A extends Action>

Parameters:
pi_i - a policy vector indexed by state
U - a vector of utilities for states in S
mdp - an MDP with states S, actions A(s), and transition model P(s'|s,a)
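To show where this evaluate step sits in the larger algorithm, here is a hedged standalone sketch of the modified-policy-iteration outer loop: alternate k-step evaluation with greedy improvement until the policy stabilizes. It uses plain arrays instead of the library's `MarkovDecisionProcess`; the three-state toy model and every name below are assumptions for illustration only:

```java
import java.util.Arrays;

public class PolicyIterationSketch {
    static final double GAMMA = 0.9; // discount factor
    static final int K = 10;         // evaluation sweeps per iteration

    // Toy model (an assumption): 3 states, 2 actions.
    // P[a][s][t] = P(t|s,a); R[s] = reward in state s (only state 2 rewards).
    static final double[] R = {0.0, 0.0, 1.0};
    static final double[][][] P = {
        {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}},  // action 0: stay
        {{0, 1, 0}, {0, 0, 1}, {0, 0, 1}}   // action 1: step right (2 absorbs)
    };

    // k-step policy evaluation: the simplified Bellman update, K times.
    static double[] evaluate(int[] pi, double[] U) {
        double[] current = U.clone();
        for (int i = 0; i < K; i++) {
            double[] next = new double[R.length];
            for (int s = 0; s < R.length; s++) {
                double expected = 0.0;
                for (int t = 0; t < R.length; t++)
                    expected += P[pi[s]][s][t] * current[t];
                next[s] = R[s] + GAMMA * expected;
            }
            current = next;
        }
        return current;
    }

    // Greedy policy improvement with respect to U (R is action-independent,
    // so maximizing the expected next-state utility suffices).
    static int[] improve(double[] U) {
        int[] pi = new int[R.length];
        for (int s = 0; s < R.length; s++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < P.length; a++) {
                double expected = 0.0;
                for (int t = 0; t < R.length; t++)
                    expected += P[a][s][t] * U[t];
                if (expected > best) { best = expected; pi[s] = a; }
            }
        }
        return pi;
    }

    // Alternate evaluation and improvement until the policy is stable.
    static int[] solve() {
        int[] pi = new int[R.length];      // start with "stay" everywhere
        double[] U = new double[R.length]; // warm-started between iterations
        for (int iter = 0; iter < 20; iter++) {
            U = evaluate(pi, U);
            int[] next = improve(U);
            if (Arrays.equals(next, pi)) break;
            pi = next;
        }
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve()));
    }
}
```

Note the warm start: each call to `evaluate` begins from the utilities of the previous iteration, which is what makes the truncated k-step evaluation cheap yet effective compared to solving each policy's utilities exactly.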