Exercise 21.4

The direct utility estimation method in the section on passive reinforcement learning uses distinguished terminal states to indicate the end of a trial. How could it be modified for environments with discounted rewards and no terminal states?

Community Solution
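
One possible approach, offered as a sketch rather than an official answer: with a discount factor gamma < 1 and rewards bounded in magnitude by some R_max, the rewards that arrive more than H steps after a visit contribute at most gamma^H * R_max / (1 - gamma) to the reward-to-go. So instead of waiting for a terminal state, each visit can be treated as the start of a "trial" that is cut off after H steps, with H chosen so that the neglected tail is below a tolerance epsilon. The function names, the (state, reward) trajectory format, and the parameters r_max and epsilon below are assumptions made for illustration; they do not come from the exercise.

```python
import math
from collections import defaultdict


def truncation_horizon(gamma, r_max, epsilon):
    """Smallest H >= 1 with gamma**H * r_max / (1 - gamma) <= epsilon.

    Hypothetical helper: r_max is an assumed bound on |reward| and
    epsilon is the tolerated truncation error per sample.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must be strictly between 0 and 1")
    if r_max <= 0 or epsilon <= 0:
        raise ValueError("r_max and epsilon must be positive")
    bound = epsilon * (1 - gamma) / r_max
    if bound >= 1:
        return 1
    # log(gamma) is negative, so dividing flips the inequality.
    return max(1, math.ceil(math.log(bound) / math.log(gamma)))


def direct_utility_estimates(trajectory, gamma, horizon):
    """Average truncated discounted reward-to-go for each state.

    trajectory: list of (state, reward) pairs from one long run under
    a fixed policy (assumed format). Only time steps followed by a full
    window of `horizon` rewards produce a sample.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    rewards = [r for _, r in trajectory]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in range(len(trajectory) - horizon + 1):
        g = 0.0
        # Accumulate r_t + gamma * r_{t+1} + ... from the far end backwards.
        for k in reversed(range(horizon)):
            g = rewards[t + k] + gamma * g
        state = trajectory[t][0]
        totals[state] += g
        counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```

For example, one could compute `H = truncation_horizon(0.9, 1.0, 0.01)` and then call `direct_utility_estimates(trajectory, 0.9, H)` on a single long trajectory. Each sample is biased by at most epsilon, and the last H - 1 steps of the run are not used as starting points because their windows are incomplete.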

Artificial Intelligence AIMA Exercises