Select a specific member of the set of policies that are optimal for
$R(s)>0$ as shown in
Figure sequential-decision-policies-figure(b), and
calculate the fraction of time the agent spends in each state, in the
limit, if the policy is executed forever. (Hint:
Construct the state-to-state transition probability matrix corresponding
to the policy and see
Exercise markov-convergence-exercise.)
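
The computational step in the hint can be sketched as follows. This is a minimal illustration, not a solution to the exercise: the 3-state matrix `P` is a hypothetical stand-in rather than the transition matrix of the world in the figure, and the function name `stationary_distribution` is likewise an assumption. Given a row-stochastic matrix $P$ induced by a fixed policy, the long-run fraction of time spent in each state is a distribution $\pi$ satisfying $\pi = \pi P$, which the sketch approximates by repeated multiplication.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Approximate pi with pi = pi P by repeated multiplication.

    P[i][j] is the probability of moving from state i to state j
    under the fixed policy, so every row must sum to 1.
    """
    n = len(P)
    for row in P:
        assert len(row) == n and abs(sum(row) - 1.0) < 1e-9

    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi  # best estimate if the tolerance was not reached


# Hypothetical 3-state chain (an assumption for illustration only,
# NOT the transition matrix of the world in the figure).
P = [
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
    [0.0, 0.8, 0.2],
]

pi = stationary_distribution(P)
print(pi)
```

For this hypothetical chain the balance equations give $\pi = (64/73,\ 8/73,\ 1/73)$, which the iteration should approach. Plain iteration of this kind relies on the chain being irreducible and aperiodic; if the policy you choose induces a periodic chain, solving $\pi = \pi P$ together with $\sum_s \pi(s) = 1$ as a linear system is the safer route.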