Type Parameters:
S - the state type
A - the action type

public class ModifiedPolicyEvaluation<S, A extends Action>
extends java.lang.Object
implements PolicyEvaluation<S, A>
Ui+1(s) ← R(s) + γ Σs' P(s'|s, πi(s)) Ui(s')

This update is repeated k times to produce the next utility estimate. The resulting algorithm is called modified policy iteration. It is often much more efficient than standard policy iteration or value iteration.
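The simplified Bellman update above can be sketched as standalone Java. This is a minimal illustration on a hypothetical two-state MDP, independent of the aima-java classes; the array-based transition model `P[s][sPrime]` (with the action already fixed by the policy) and all names here are assumptions for brevity:

```java
public class SimplifiedPolicyEvaluation {
    // Repeats the simplified Bellman update k times:
    //   U_{i+1}(s) = R(s) + gamma * sum_{s'} P(s'|s, pi(s)) * U_i(s')
    // R[s] is the reward in state s; P[s][sPrime] is the transition
    // probability under the (fixed) policy action; U is the current estimate.
    public static double[] evaluate(double[] R, double[][] P, double[] U,
                                    double gamma, int k) {
        int n = R.length;
        double[] current = U.clone();
        for (int iter = 0; iter < k; iter++) {
            double[] next = new double[n];          // synchronous update
            for (int s = 0; s < n; s++) {
                double expected = 0.0;
                for (int sPrime = 0; sPrime < n; sPrime++) {
                    expected += P[s][sPrime] * current[sPrime];
                }
                next[s] = R[s] + gamma * expected;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        double[] R = {0.0, 1.0};                    // toy rewards
        double[][] P = {{0.5, 0.5}, {0.0, 1.0}};    // toy P(s'|s, pi(s))
        double[] U = {0.0, 0.0};                    // initial utility estimate
        double[] result = evaluate(R, P, U, 0.9, 5);
        System.out.printf("%.4f %.4f%n", result[0], result[1]);
    }
}
```

Because only k sweeps are performed rather than solving the linear system exactly, the result is an approximation of U^πi that improves with larger k.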
| Constructor and Description |
|---|
| `ModifiedPolicyEvaluation(int k, double gamma)` Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| `java.util.Map<S,java.lang.Double>` | `evaluate(java.util.Map<S,A> pi_i, java.util.Map<S,java.lang.Double> U, MarkovDecisionProcess<S,A> mdp)` Policy evaluation: given a policy πi, calculate Ui = U^πi, the utility of each state if πi were to be executed. |
public ModifiedPolicyEvaluation(int k, double gamma)

Parameters:
k - the number of iterations to use to produce the next utility estimate
gamma - the discount factor γ to be used

public java.util.Map<S,java.lang.Double> evaluate(java.util.Map<S,A> pi_i, java.util.Map<S,java.lang.Double> U, MarkovDecisionProcess<S,A> mdp)
Specified by:
evaluate in interface PolicyEvaluation<S,A extends Action>

Parameters:
pi_i - a policy vector indexed by state
U - a vector of utilities for states in S
mdp - an MDP with states S, actions A(s), and transition model P(s'|s,a)
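To show where this evaluate step sits in the larger algorithm, here is a hedged standalone sketch of the modified-policy-iteration outer loop: alternate k-step evaluation with greedy improvement until the policy stabilizes. It uses plain arrays instead of the library's `MarkovDecisionProcess`; the three-state toy model and every name below are assumptions for illustration only:

```java
import java.util.Arrays;

public class PolicyIterationSketch {
    static final double GAMMA = 0.9; // discount factor
    static final int K = 10;         // evaluation sweeps per iteration

    // Toy model (an assumption): 3 states, 2 actions.
    // P[a][s][t] = P(t|s,a); R[s] = reward in state s (only state 2 rewards).
    static final double[] R = {0.0, 0.0, 1.0};
    static final double[][][] P = {
        {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}},  // action 0: stay
        {{0, 1, 0}, {0, 0, 1}, {0, 0, 1}}   // action 1: step right (2 absorbs)
    };

    // k-step policy evaluation: the simplified Bellman update, K times.
    static double[] evaluate(int[] pi, double[] U) {
        double[] current = U.clone();
        for (int i = 0; i < K; i++) {
            double[] next = new double[R.length];
            for (int s = 0; s < R.length; s++) {
                double expected = 0.0;
                for (int t = 0; t < R.length; t++)
                    expected += P[pi[s]][s][t] * current[t];
                next[s] = R[s] + GAMMA * expected;
            }
            current = next;
        }
        return current;
    }

    // Greedy policy improvement with respect to U (R is action-independent,
    // so maximizing the expected next-state utility suffices).
    static int[] improve(double[] U) {
        int[] pi = new int[R.length];
        for (int s = 0; s < R.length; s++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < P.length; a++) {
                double expected = 0.0;
                for (int t = 0; t < R.length; t++)
                    expected += P[a][s][t] * U[t];
                if (expected > best) { best = expected; pi[s] = a; }
            }
        }
        return pi;
    }

    // Alternate evaluation and improvement until the policy is stable.
    static int[] solve() {
        int[] pi = new int[R.length];      // start with "stay" everywhere
        double[] U = new double[R.length]; // warm-started between iterations
        for (int iter = 0; iter < 20; iter++) {
            U = evaluate(pi, U);
            int[] next = improve(U);
            if (Arrays.equals(next, pi)) break;
            pi = next;
        }
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve()));
    }
}
```

Note the warm start: each call to `evaluate` begins from the utilities of the previous iteration, which is what makes the truncated k-step evaluation cheap yet effective compared to solving each policy's utilities exactly.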