Consider the $4\times 3$ world shown in Figure sequential-decision-world-figure.
1. Implement an environment simulator for this environment, such that
the specific geography of the environment is easily altered. Some
code for doing this is already in the online code repository.
2. Create an agent that uses policy iteration, and measure its
performance in the environment simulator from various
starting states. Perform several experiments from each starting
state, and compare the average total reward received per run with
the utility of the state, as determined by your algorithm.
3. Experiment with increasing the size of the environment. How does the
run time for policy iteration vary with the size of the environment?