Suppose that we define the utility of a state
sequence to be the maximum reward obtained in any state
in the sequence. Show that this utility function does not result in
stationary preferences between state sequences. Is it still possible to
define a utility function on states such that MEU decision making gives
optimal behavior?
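
As a hint for the first part, recall that preferences are stationary if, whenever two sequences begin with the same state, they are ranked the same way as their tails with that shared first state removed. The sketch below is one way to test this under the maximum-reward utility. The reward values and the helper names `max_utility` and `preference` are made up for illustration and are not part of the exercise.

```python
def max_utility(rewards):
    """Utility of a state sequence: the largest reward obtained in it."""
    return max(rewards)


def preference(a, b):
    """Return 1 if a is preferred, -1 if b is preferred, 0 if indifferent."""
    ua, ub = max_utility(a), max_utility(b)
    return (ua > ub) - (ua < ub)


# Hypothetical reward sequences that share the same first state (reward 5).
seq_a = [5, 1]
seq_b = [5, 2]

# Ranking of the full sequences.
full = preference(seq_a, seq_b)

# Ranking of the tails, after dropping the shared first state.
tails = preference(seq_a[1:], seq_b[1:])

# Stationarity would require these two rankings to agree.
print("full:", full, "tails:", tails)
```

If the two rankings differ for some pair of sequences, the utility function does not give stationary preferences. The second part of the exercise is not addressed by this sketch.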