Implement an exploring reinforcement learning
agent that uses direct utility estimation. Make two versions: one with a
tabular representation and one using the function approximator in
Equation (4x3-linear-approx-equation). Compare their
performance in three environments (minimal sketches of both versions follow the list):
1. The $4\times 3$ world described in the chapter.
2. A $10\times 10$ world with no obstacles and a +1 reward at (10,10).
3. A $10\times 10$ world with no obstacles and a +1 reward at (5,5).
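A minimal sketch of the tabular version, assuming a hypothetical environment interface (`reset()` returns a state, `step(action)` returns `(next_state, reward, done)`) and a fixed exploring policy such as a uniformly random one. After each trial, the observed reward-to-go of every visited state is folded into a running average, which is exactly the direct utility estimate:

```python
from collections import defaultdict

def run_trial(env, policy):
    """Run one trial to termination and return the list of
    (state, reward) pairs observed along the way.
    The env interface here is an assumption, not a fixed API."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, reward))
        state = next_state
    return trajectory

def tabular_direct_utility(env, policy, n_trials=1000, gamma=1.0):
    """Estimate U(s) as the running average of the observed
    reward-to-go over every trial that visits s."""
    totals = defaultdict(float)   # sum of reward-to-go samples per state
    counts = defaultdict(int)     # number of samples per state
    for _ in range(n_trials):
        g = 0.0
        # Walk the trajectory backward, accumulating discounted reward-to-go.
        for state, reward in reversed(run_trial(env, policy)):
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in counts}
```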
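And a sketch of the function-approximator version, assuming the referenced equation is the linear-in-coordinates form $\hat{U}_\theta(x,y) = \theta_0 + \theta_1 x + \theta_2 y$ and that states are $(x,y)$ pairs. Each reward-to-go sample drives a Widrow-Hoff (delta-rule) update of the three parameters; `run_trial` is the helper from the tabular sketch:

```python
def linear_direct_utility(env, policy, n_trials=1000, gamma=1.0, alpha=0.01):
    """Direct utility estimation with a linear approximator
    U_theta(x, y) = theta0 + theta1*x + theta2*y."""
    theta = [0.0, 0.0, 0.0]

    def u_hat(state):
        x, y = state                 # assumes states are (x, y) pairs
        return theta[0] + theta[1] * x + theta[2] * y

    for _ in range(n_trials):
        g = 0.0
        for state, reward in reversed(run_trial(env, policy)):
            g = reward + gamma * g
            error = g - u_hat(state)         # delta-rule prediction error
            x, y = state
            theta[0] += alpha * error        # d u_hat / d theta0 = 1
            theta[1] += alpha * error * x    # d u_hat / d theta1 = x
            theta[2] += alpha * error * y    # d u_hat / d theta2 = y
    return theta
```

For the comparison, a random exploring policy such as `policy = lambda s: random.choice(env.actions(s))` (assuming the environment exposes an `actions(s)` method) can drive both learners, with performance plotted as RMS error of the estimated utilities against the number of trials. Note that with coordinates up to 10, the delta-rule update may need a smaller $\alpha$ or normalized features to converge.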