Consider the $101 \times 3$ world shown in
Figure grid-mdp-figure(b). In the start state the agent
has a choice of two deterministic actions, *Up* or
*Down*, but in the other states the agent has one
deterministic action, *Right*. Assuming a discounted reward
function, for what values of the discount $\gamma$ should the agent
choose *Up* and for which *Down*? Compute the
utility of each action as a function of $\gamma$. (Note that this simple
example actually reflects many real-world situations in which one must
weigh the value of an immediate action versus the potential continual
long-term consequences, such as choosing to dump pollutants into a
lake.)
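
One way to explore the question numerically is sketched below. It assumes the usual additive discounted utility $U = \sum_t \gamma^t R(s_t)$, with $t = 0$ at the start state. The figure is not reproduced here, so the reward sequences `UP_REWARDS` and `DOWN_REWARDS` are illustrative assumptions (a large immediate reward followed by 100 small rewards of the opposite sign); substitute the values shown in the figure. The helper names are hypothetical.

```python
def discounted_utility(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t], with rewards[0] at the start state."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# ASSUMED rewards, not taken from the text: replace with the figure's values.
# Index 0 is the start state; indices 1..101 are the 101 states of the chosen row.
UP_REWARDS = [0, 50] + [-1] * 100
DOWN_REWARDS = [0, -50] + [1] * 100


def compare_actions(gamma):
    """Return (better action, U(Up), U(Down)) for the given discount."""
    u_up = discounted_utility(UP_REWARDS, gamma)
    u_down = discounted_utility(DOWN_REWARDS, gamma)
    return ("Up" if u_up > u_down else "Down"), u_up, u_down


def indifference_gamma(lo=0.5, hi=1.0, tol=1e-9):
    """Bisect on U(Up) - U(Down), assuming it is positive at lo and negative at hi."""
    def diff(g):
        return (discounted_utility(UP_REWARDS, g)
                - discounted_utility(DOWN_REWARDS, g))

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid  # Up is still better, so the crossover lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for g in (0.5, 0.9, 0.99, 0.999):
        print(g, compare_actions(g))
    print("indifference point:", indifference_gamma())
```

Under these assumed rewards, the crossover falls near $\gamma \approx 0.984$: below it the immediate payoff dominates and *Up* is preferred, while above it the long-run consequences dominate and *Down* is preferred. The closed-form utilities asked for in the exercise follow from summing the same geometric series by hand.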