Define in your own words: (a) intelligence, (b) artificial intelligence, (c) agent, (d) rationality, (e) logical reasoning.

Read Turing’s original paper on AI Turing:1950. In the paper, he discusses several objections to his proposed enterprise and his test for intelligence. Which objections still carry weight? Are his refutations valid? Can you think of new objections arising from developments since he wrote the paper? In the paper, he predicts that, by the year 2000, a computer will have a 30% chance of passing a five-minute Turing Test with an unskilled interrogator. What chance do you think a computer would have today? In another 50 years?

Every year the Loebner Prize is awarded to the program that comes closest to passing a version of the Turing Test. Research and report on the latest winner of the Loebner Prize. What techniques does it use? How does it advance the state of the art in AI?

Are reflex actions (such as flinching from a hot stove) rational? Are they intelligent?

There are well-known classes of problems that are intractably difficult for computers, and other classes that are provably undecidable. Does this mean that AI is impossible?

Suppose we extend Evans’s *ANALOGY* program so that it can score 200 on a standard
IQ test. Would we then have a program more intelligent than a human?
Explain.

The neural structure of the sea slug *Aplysia* has been
widely studied (first by Nobel Laureate Eric Kandel) because it has only
about 20,000 neurons, most of them large and easily manipulated.
Assuming that the cycle time for an *Aplysia* neuron is
roughly the same as for a human neuron, how does the computational
power, in terms of memory updates per second, compare with the high-end
computer described in Figure computer-brain-table?

How could introspection—reporting on one’s inner thoughts—be inaccurate? Could I be wrong about what I’m thinking? Discuss.

To what extent are the following computer systems instances of
artificial intelligence:

- Supermarket bar code scanners.

- Web search engines.

- Voice-activated telephone menus.

- Internet routing algorithms that respond dynamically to the state of
the network.

To what extent are the following computer systems instances of
artificial intelligence:

- Supermarket bar code scanners.

- Voice-activated telephone menus.

- Spelling and grammar correction features in Microsoft Word.

- Internet routing algorithms that respond dynamically to the state of the network.

Many of the computational models of cognitive activities that have been proposed involve quite complex mathematical operations, such as convolving an image with a Gaussian or finding a minimum of the entropy function. Most humans (and certainly all animals) never learn this kind of mathematics at all, almost no one learns it before college, and almost no one can compute the convolution of a function with a Gaussian in their head. What sense does it make to say that the “vision system” is doing this kind of mathematics, whereas the actual person has no idea how to do it?

Some authors have claimed that perception and motor skills are the most important part of intelligence, and that “higher level” capacities are necessarily parasitic—simple add-ons to these underlying facilities. Certainly, most of evolution and a large part of the brain have been devoted to perception and motor skills, whereas AI has found tasks such as game playing and logical inference to be easier, in many ways, than perceiving and acting in the real world. Do you think that AI’s traditional focus on higher-level cognitive abilities is misplaced?

Why would evolution tend to result in systems that act rationally? What goals are such systems designed to achieve?

Is AI a science, or is it engineering? Or neither or both? Explain.

“Surely computers cannot be intelligent—they can do only what their programmers tell them.” Is the latter statement true, and does it imply the former?

“Surely animals cannot be intelligent—they can do only what their genes tell them.” Is the latter statement true, and does it imply the former?

“Surely animals, humans, and computers cannot be intelligent—they can do only what their constituent atoms are told to do by the laws of physics.” Is the latter statement true, and does it imply the former?

Examine the AI literature to discover whether the following tasks can currently be solved by computers:

- Playing a decent game of table tennis (Ping-Pong).

- Driving in the center of Cairo, Egypt.

- Driving in Victorville, California.

- Buying a week’s worth of groceries at the market.

- Buying a week’s worth of groceries on the Web.

- Playing a decent game of bridge at a competitive level.

- Discovering and proving new mathematical theorems.

- Writing an intentionally funny story.

- Giving competent legal advice in a specialized area of law.

- Translating spoken English into spoken Swedish in real time.

- Performing a complex surgical operation.

For the currently infeasible tasks, try to find out what the difficulties are and predict when, if ever, they will be overcome.

Various subfields of AI have held contests by defining a standard task and inviting researchers to do their best. Examples include the DARPA Grand Challenge for robotic cars, the International Planning Competition, the Robocup robotic soccer league, the TREC information retrieval event, and contests in machine translation and speech recognition. Investigate five of these contests and describe the progress made over the years. To what degree have the contests advanced the state of the art in AI? To what degree do they hurt the field by drawing energy away from new ideas?

Suppose that the performance measure is concerned with just the first $T$ time steps of the environment and ignores everything thereafter. Show that a rational agent’s action may depend not just on the state of the environment but also on the time step it has reached.

Let us examine the rationality of various
vacuum-cleaner agent functions.

1. Show that the simple vacuum-cleaner agent function described in
Figure vacuum-agent-function-table is indeed
rational under the assumptions listed on page vacuum-rationality-page.

2. Describe a rational agent function for the case in which each
movement costs one point. Does the corresponding agent program
require internal state?

3. Discuss possible agent designs for the cases in which clean squares
can become dirty and the geography of the environment is unknown.
Does it make sense for the agent to learn from its experience in
these cases? If so, what should it learn? If not, why not?

Write an essay on the relationship between evolution and one or more of autonomy, intelligence, and learning.

For each of the following assertions, say whether it is true or false
and support your answer with examples or counterexamples where
appropriate.

1. An agent that senses only partial information about the state cannot
be perfectly rational.

2. There exist task environments in which no pure reflex agent can
behave rationally.

3. There exists a task environment in which every agent is rational.

4. The input to an agent program is the same as the input to the
agent function.

5. Every agent function is implementable by some
program/machine combination.

6. Suppose an agent selects its action uniformly at random from the set
of possible actions. There exists a deterministic task environment
in which this agent is rational.

7. It is possible for a given agent to be perfectly rational in two
distinct task environments.

8. Every agent is rational in an unobservable environment.

9. A perfectly rational poker-playing agent never loses.

For each of the following activities, give a PEAS
description of the task environment and characterize it in terms of the
properties listed in Section env-properties-subsection.

- Playing soccer.

- Exploring the subsurface oceans of Titan.

- Shopping for used AI books on the Internet.

- Playing a tennis match.

- Practicing tennis against a wall.

- Performing a high jump.

- Knitting a sweater.

- Bidding on an item at an auction.

For each of the following activities, give a PEAS
description of the task environment and characterize it in terms of the
properties listed in Section env-properties-subsection.

- Performing a gymnastics floor routine.

- Exploring the subsurface oceans of Titan.

- Playing soccer.

- Shopping for used AI books on the Internet.

- Practicing tennis against a wall.

- Performing a high jump.

- Bidding on an item at an auction.

Define in your own words the following terms: agent, agent function, agent program, rationality, autonomy, reflex agent, model-based agent, goal-based agent, utility-based agent, learning agent.

This exercise explores the differences between
agent functions and agent programs.

1. Can there be more than one agent program that implements a given
agent function? Give an example, or show why one is not possible.

2. Are there agent functions that cannot be implemented by any agent
program?

3. Given a fixed machine architecture, does each agent program
implement exactly one agent function?

4. Given an architecture with $n$ bits of storage, how many different
possible agent programs are there?

5. Suppose we keep the agent program fixed but speed up the machine by
a factor of two. Does that change the agent function?

Write pseudocode agent programs for the goal-based and utility-based agents.

Consider a simple thermostat that turns on a furnace when the temperature is at least 3 degrees below the setting, and turns off a furnace when the temperature is at least 3 degrees above the setting. Is a thermostat an instance of a simple reflex agent, a model-based reflex agent, or a goal-based agent?

Implement a performance-measuring environment simulator for the vacuum-cleaner world depicted in Figure vacuum-world-figure and specified on page vacuum-rationality-page. Your implementation should be modular so that the sensors, actuators, and environment characteristics (size, shape, dirt placement, etc.) can be changed easily. (Note: for some choices of programming language and operating system there are already implementations in the online code repository.)

Implement a simple reflex agent for the vacuum environment in Exercise vacuum-start-exercise. Run the environment with this agent for all possible initial dirt configurations and agent locations. Record the performance score for each configuration and the overall average score.
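A minimal simulation sketch for this and the preceding exercise, assuming the standard two-square world. The scoring rule used here (one point per clean square per time step, over a ten-step lifetime) and all names are my own choices, not reference code:

```python
from itertools import product

def reflex_agent(location, status):
    """Simple reflex vacuum agent: suck if the current square is dirty,
    otherwise move to the other square."""
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def run(dirt, location, steps=10):
    """Simulate the two-square world, awarding one point per clean
    square per time step (an assumed performance measure)."""
    dirt = dict(dirt)
    score = 0
    for _ in range(steps):
        action = reflex_agent(location, "Dirty" if dirt[location] else "Clean")
        if action == "Suck":
            dirt[location] = False
        else:
            location = "B" if action == "Right" else "A"
        score += sum(not d for d in dirt.values())
    return score

# All 8 initial configurations: 4 dirt patterns x 2 agent locations.
configs = [({"A": a, "B": b}, loc)
           for a, b in product([True, False], repeat=2) for loc in "AB"]
scores = [run(dirt, loc) for dirt, loc in configs]
print(sum(scores) / len(scores))  # -> 19.25 under these assumptions
```

Keeping the agent, the scoring line, and the geography in separate places makes it easy to swap in the modified environments of the following exercises.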

Consider a modified version of the
vacuum environment in Exercise vacuum-start-exercise,
in which the agent is penalized one point for each movement.

1. Can a simple reflex agent be perfectly rational for this
environment? Explain.

2. What about a reflex agent with state? Design such an agent.

3. How do your answers to 1 and 2
change if the agent’s percepts give it the clean/dirty status of
every square in the environment?

Consider a modified version of the
vacuum environment in Exercise vacuum-start-exercise,
in which the geography of the environment—its extent, boundaries, and
obstacles—is unknown, as is the initial dirt configuration. (The agent
can go Up and Down as well as Left and Right.)

1. Can a simple reflex agent be perfectly rational for this
environment? Explain.

2. Can a simple reflex agent with a randomized agent
function outperform a simple reflex agent? Design such an agent and
measure its performance on several environments.

3. Can you design an environment in which your randomized agent will
perform poorly? Show your results.

4. Can a reflex agent with state outperform a simple reflex agent?
Design such an agent and measure its performance on several
environments. Can you design a rational agent of this type?

Repeat Exercise vacuum-unknown-geog-exercise for the case in which the location sensor is replaced with a “bump” sensor that detects the agent’s attempts to move into an obstacle or to cross the boundaries of the environment. Suppose the bump sensor stops working; how should the agent behave?

Explain why problem formulation must follow goal formulation.

Give a complete problem formulation for each of the following problems.
Choose a formulation that is precise enough to be implemented.

1. There are six glass boxes in a row, each with a lock. Each of the
first five boxes holds a key unlocking the next box in line; the
last box holds a banana. You have the key to the first box, and you
want the banana.

2. You start with the sequence ABABAECCEC, or in general any sequence
made from A, B, C, and E. You can transform this sequence using the
following equalities: AC = E, AB = BC, BB = E, and E$x$ = $x$ for
any $x$. For example, ABBC can be transformed into AEC, and then AC,
and then E. Your goal is to produce the sequence E.

3. There is an $n \times n$ grid of squares, each square initially
being either unpainted floor or a bottomless pit. You start standing
on an unpainted floor square, and can either paint the square under
you or move onto an adjacent unpainted floor square. You want the
whole floor painted.

4. A container ship is in port, loaded high with containers. There are 13
rows of containers, each 13 containers wide and 5 containers tall.
You control a crane that can move to any location above the ship,
pick up the container under it, and move it onto the dock. You want
the ship unloaded.
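For problem 2 above, a brute-force breadth-first sketch. The encoding is my own assumption: the first three equalities are applied in both directions, while E$x$ = $x$ is applied only in the shrinking (E-deleting) direction, with a length cap to keep the space finite.

```python
from collections import deque

RULES = [("AC", "E"), ("AB", "BC"), ("BB", "E")]

def successors(s):
    """Apply each equality at each position, in both directions."""
    out = set()
    for lhs, rhs in RULES:
        for a, b in ((lhs, rhs), (rhs, lhs)):
            for i in range(len(s) - len(a) + 1):
                if s[i:i + len(a)] == a:
                    out.add(s[:i] + b + s[i + len(a):])
    # Ex = x: delete an E that precedes another symbol (shrink only).
    for i in range(len(s) - 1):
        if s[i] == "E":
            out.add(s[:i] + s[i + 1:])
    return out

def solve(start, goal="E", max_len=20):
    """Breadth-first search from `start` to `goal`; returns a state path."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for t in successors(s):
            if t not in seen and len(t) <= max_len:
                seen.add(t)
                frontier.append((t, path + [t]))
    return None

print(solve("ABBC"))  # e.g. ['ABBC', 'AEC', 'AC', 'E']
```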

Your goal is to navigate a robot out of a maze. The robot starts in the
center of the maze facing north. You can turn the robot to face north,
east, south, or west. You can direct the robot to move forward a certain
distance, although it will stop before hitting a wall.

1. Formulate this problem. How large is the state space?

2. In navigating a maze, the only place we need to turn is at the
intersection of two or more corridors. Reformulate this problem
using this observation. How large is the state space now?

3. From each point in the maze, we can move in any of the four
directions until we reach a turning point, and this is the only
action we need to do. Reformulate the problem using these actions.
Do we need to keep track of the robot’s orientation now?

4. In our initial description of the problem we already abstracted from
the real world, restricting actions and removing details. List three
such simplifications we made.

You have a $9 \times 9$ grid of squares, each of which can be colored
red or blue. The grid is initially colored all blue, but you can change
the color of any square any number of times. Imagining the grid divided
into nine $3 \times 3$ sub-squares, you want each sub-square to be all
one color but neighboring sub-squares to be different colors.

1. Formulate this problem in the straightforward way. Compute the size
of the state space.

2. You need to color a square only once. Reformulate, and compute the size
of the state space. Would breadth-first graph search perform faster
on this problem than on the one in (a)? How about iterative
deepening tree search?

3. Given the goal, we need consider only colorings where each
sub-square is uniformly colored. Reformulate the problem and compute
the size of the state space.

4. How many solutions does this problem have?

5. Parts (b) and (c) successively abstracted the original problem (a).
Can you give a translation from solutions in problem (c) into
solutions in problem (b), and from solutions in problem (b) into
solutions for problem (a)?

Suppose two friends live in different cities on
a map, such as the Romania map shown in . On every turn, we can
simultaneously move each friend to a neighboring city on the map. The
amount of time needed to move from city $i$ to neighbor $j$ is equal to
the road distance $d(i,j)$ between the cities, but on each turn the
friend that arrives first must wait until the other one arrives (and
calls the first on his/her cell phone) before the next turn can begin.
We want the two friends to meet as quickly as possible.

1. Write a detailed formulation for this search problem. (You will find
it helpful to define some formal notation here.)

2. Let $D(i,j)$ be the straight-line distance between cities $i$ and
$j$. Which of the following heuristic functions are admissible? (i)
$D(i,j)$; (ii) $2\cdot D(i,j)$; (iii) $D(i,j)/2$.

3. Are there completely connected maps for which no solution exists?

4. Are there maps in which all solutions require one friend to visit
the same city twice?

Show that the 8-puzzle states are divided
into two disjoint sets, such that any state is reachable from any other
state in the same set, while no state is reachable from any state in the
other set. (*Hint:* See Berlekamp+al:1982.) Devise a procedure to decide
which set a given state is in, and explain why this is useful for
generating random states.
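One common procedure is to count inversions; a sketch (for the 3-tile-wide board, inversion parity is invariant under legal moves, so it identifies the set):

```python
def parity(state):
    """Inversion parity of an 8-puzzle state, given as a tuple in
    left-to-right, top-to-bottom order with 0 for the blank."""
    tiles = [t for t in state if t != 0]  # the blank does not count
    inversions = sum(1 for i in range(len(tiles))
                     for j in range(i + 1, len(tiles))
                     if tiles[i] > tiles[j])
    return inversions % 2

goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
print(parity(goal))  # -> 0: states with parity 0 are in the goal's set
```

To generate a random *solvable* state, shuffle the tiles and, if the parity differs from the goal’s, swap any two non-blank tiles.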

Consider the $n$-queens problem using the
“efficient” incremental formulation given on page nqueens-page. Explain why the state
space has at least $\sqrt[3]{n!}$ states and estimate the largest $n$
for which exhaustive exploration is feasible. (*Hint*:
Derive a lower bound on the branching factor by considering the maximum
number of squares that a queen can attack in any column.)

Give a complete problem formulation for each of the following. Choose a
formulation that is precise enough to be implemented.

1. Using only four colors, you have to color a planar map in such a way
that no two adjacent regions have the same color.

2. A 3-foot-tall monkey is in a room where some bananas are suspended
from the 8-foot ceiling. He would like to get the bananas. The room
contains two stackable, movable, climbable 3-foot-high crates.

3. You have a program that outputs the message “illegal input record”
when fed a certain file of input records. You know that processing
of each record is independent of the other records. You want to
discover what record is illegal.

4. You have three jugs, measuring 12 gallons, 8 gallons, and 3 gallons,
and a water faucet. You can fill the jugs up or empty them out from
one to another or onto the ground. You need to measure out exactly
one gallon.

Consider the problem of finding the shortest
path between two points on a plane that has convex polygonal obstacles
as shown in . This is an idealization of the problem that a robot has to
solve to navigate in a crowded environment.

1. Suppose the state space consists of all positions $(x,y)$ in
the plane. How many states are there? How many paths are there to
the goal?

2. Explain briefly why the shortest path from one polygon vertex to any
other in the scene must consist of straight-line segments joining
some of the vertices of the polygons. Define a good state space now.
How large is this state space?

3. Define the necessary functions to implement the search problem,
including a function that takes a vertex as input and returns a set
of vectors, each of which maps the current vertex to one of the
vertices that can be reached in a straight line. (Do not forget the
neighbors on the same polygon.) Use the straight-line distance for
the heuristic function.

4. Apply one or more of the algorithms in this chapter to solve a range
of problems in the domain, and comment on their performance.

On page non-negative-g, we said that we would not consider problems
with negative path costs. In this exercise, we explore this decision in
more depth.

1. Suppose that actions can have arbitrarily large negative costs;
explain why this possibility would force any optimal algorithm to
explore the entire state space.

2. Does it help if we insist that step costs must be greater than or
equal to some negative constant $c$? Consider both trees and graphs.

3. Suppose that a set of actions forms a loop in the state space such
that executing the set in some order results in no net change to
the state. If all of these actions have negative cost, what does
this imply about the optimal behavior for an agent in such an
environment?

4. One can easily imagine actions with high negative cost, even in
domains such as route finding. For example, some stretches of road
might have such beautiful scenery as to far outweigh the normal
costs in terms of time and fuel. Explain, in precise terms, within
the context of state-space search, why humans do not drive around
scenic loops indefinitely, and explain how to define the state space
and actions for route finding so that artificial agents can also
avoid looping.

5. Can you think of a real domain in which step costs are such as to
cause looping?

The **missionaries and cannibals** problem is usually stated as follows. Three
missionaries and three cannibals are on one side of a river, along with
a boat that can hold one or two people. Find a way to get everyone to
the other side without ever leaving a group of missionaries in one place
outnumbered by the cannibals in that place. This problem is famous in AI
because it was the subject of the first paper that approached problem
formulation from an analytical viewpoint Amarel:1968.

1. Formulate the problem precisely, making only those distinctions
necessary to ensure a valid solution. Draw a diagram of the complete
state space.

2. Implement and solve the problem optimally using an appropriate
search algorithm. Is it a good idea to check for repeated states?

3. Why do you think people have a hard time solving this puzzle, given
that the state space is so simple?

Define in your own words the following terms: state, state space, search tree, search node, goal, action, transition model, and branching factor.

What’s the difference between a world state, a state description, and a search node? Why is this distinction useful?

An action such as really consists of a long sequence of finer-grained actions: turn on the car, release the brake, accelerate forward, etc. Having composite actions of this kind reduces the number of steps in a solution sequence, thereby reducing the search time. Suppose we take this to the logical extreme, by making super-composite actions out of every possible sequence of actions. Then every problem instance is solved by a single super-composite action, such as . Explain how search would work in this formulation. Is this a practical approach for speeding up problem solving?

Does a finite state space always lead to a finite search tree? How about a finite state space that is a tree? Can you be more precise about what types of state spaces always lead to finite search trees? (Adapted from , 1996.)

Prove that satisfies the graph
separation property illustrated in . (*Hint*: Begin by
showing that the property holds at the start, then show that if it holds
before an iteration of the algorithm, it holds afterwards.) Describe a
search algorithm that violates the property.

Which of the following are true and which are false? Explain your
answers.

1. Depth-first search always expands at least as many nodes as A* search
with an admissible heuristic.

2. $h(n)=0$ is an admissible heuristic for the 8-puzzle.

3. A* is of no use in robotics because percepts, states, and actions
are continuous.

4. Breadth-first search is complete even if zero step costs
are allowed.

5. Assume that a rook can move on a chessboard any number of squares in
a straight line, vertically or horizontally, but cannot jump over
other pieces. Manhattan distance is an admissible heuristic for the
problem of moving the rook from square A to square B in the smallest
number of moves.

Consider a state space where the start state is number 1 and each state
$k$ has two successors: numbers $2k$ and $2k+1$.

1. Draw the portion of the state space for states 1 to 15.

2. Suppose the goal state is 11. List the order in which nodes will be
visited for breadth-first search, depth-limited search with limit 3,
and iterative deepening search.

3. How well would bidirectional search work on this problem? What is
the branching factor in each direction of the bidirectional search?

4. Does the answer to (c) suggest a reformulation of the problem that
would allow you to solve the problem of getting from state 1 to a
given goal state with almost no search?

5. Call the action going from $k$ to $2k$ Left, and the action going to
$2k+1$ Right. Can you find an algorithm that outputs the solution to
this problem without any search at all?
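As a hint toward part 5: the Left/Right sequence leading to a state can be read directly off that state’s binary numeral. A sketch:

```python
def path_to(goal):
    """Action sequence from state 1 to `goal`: write the goal in binary,
    drop the leading 1 (the start state), then read each remaining bit
    as an action: 0 is Left (k -> 2k), 1 is Right (k -> 2k+1)."""
    return ["Right" if b == "1" else "Left" for b in bin(goal)[3:]]

# Check by replaying the actions from state 1:
state = 1
for action in path_to(11):
    state = 2 * state + (action == "Right")
print(path_to(11), state)  # ['Left', 'Right', 'Right'] 11
```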

A basic wooden railway set contains the pieces shown in
. The task is to connect these pieces into a railway that has no
overlapping tracks and no loose ends where a train could run off onto
the floor.

1. Suppose that the pieces fit together *exactly* with no
slack. Give a precise formulation of the task as a search problem.

2. Identify a suitable uninformed search algorithm for this task and
explain your choice.

3. Explain why removing any one of the “fork” pieces makes the
problem unsolvable.

4. Give an upper bound on the total size of the state space defined by
your formulation. (*Hint*: think about the maximum
branching factor for the construction process and the maximum depth,
ignoring the problem of overlapping pieces and loose ends. Begin by
pretending that every piece is unique.)

Implement two versions of the function for the 8-puzzle: one that copies and edits the data structure for the parent node $s$ and one that modifies the parent state directly (undoing the modifications as needed). Write versions of iterative deepening depth-first search that use these functions and compare their performance.

On page iterative-lengthening-page,
we mentioned **iterative lengthening search**,
an iterative analog of uniform-cost search. The idea is to use increasing limits on
path cost. If a node is generated whose path cost exceeds the current
limit, it is immediately discarded. For each new iteration, the limit is
set to the lowest path cost of any node discarded in the previous
iteration.

1. Show that this algorithm is optimal for general path costs.

2. Consider a uniform tree with branching factor $b$, solution depth
$d$, and unit step costs. How many iterations will iterative
lengthening require?

3. Now consider step costs drawn from the continuous range
$[\epsilon,1]$, where $0 < \epsilon < 1$. How many iterations are
required in the worst case?

4. Implement the algorithm and apply it to instances of the 8-puzzle
and traveling salesperson problems. Compare the algorithm’s
performance to that of uniform-cost search, and comment on
your results.
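The scheme described above can be sketched as follows, assuming a finite, acyclic space (the recursive depth-first pass does no cycle checking); the `successors(s)` interface yielding `(cost, state)` pairs is my own choice:

```python
import math

def iterative_lengthening(start, goal_test, successors):
    """Iterative lengthening search (sketch). Returns the optimal
    path cost, or None if no goal is reachable."""
    limit = 0.0
    while True:
        next_limit = math.inf

        def dfs(state, g):
            nonlocal next_limit
            if g > limit:
                next_limit = min(next_limit, g)  # cheapest discarded cost
                return None
            if goal_test(state):
                return g
            best = None
            for cost, nxt in successors(state):
                r = dfs(nxt, g + cost)
                if r is not None and (best is None or r < best):
                    best = r
            return best

        result = dfs(start, 0.0)
        if result is not None:
            return result
        if next_limit == math.inf:
            return None
        limit = next_limit  # next iteration's cutoff

graph = {"S": [(1, "A"), (4, "B")], "A": [(2, "B"), (5, "G")], "B": [(1, "G")]}
print(iterative_lengthening("S", lambda s: s == "G",
                            lambda s: graph.get(s, [])))  # -> 4.0
```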

Describe a state space in which iterative deepening search performs much worse than depth-first search (for example, $O(n^{2})$ vs. $O(n)$).

Write a program that will take as input two Web page URLs and find a path of links from one to the other. What is an appropriate search strategy? Is bidirectional search a good idea? Could a search engine be used to implement a predecessor function?

Consider the vacuum-world problem defined in .

1. Which of the algorithms defined in this chapter would be appropriate
for this problem? Should the algorithm use tree search or graph
search?

2. Apply your chosen algorithm to compute an optimal sequence of
actions for a $3\times 3$ world whose initial state has dirt in the
three top squares and the agent in the center.

3. Construct a search agent for the vacuum world, and evaluate its
performance in a set of $3\times 3$ worlds with probability 0.2 of
dirt in each square. Include the search cost as well as path cost in
the performance measure, using a reasonable exchange rate.

4. Compare your best search agent with a simple randomized reflex agent
that sucks if there is dirt and otherwise moves randomly.

5. Consider what would happen if the world were enlarged to
$n \times n$. How does the performance of the search agent and of
the reflex agent vary with $n$?

Prove each of the following statements,
or give a counterexample:

1. Breadth-first search is a special case of uniform-cost search.

2. Depth-first search is a special case of best-first tree search.

3. Uniform-cost search is a special case of A* search.

Compare the performance of A* and RBFS on a set of randomly generated problems in the 8-puzzle (with Manhattan distance) and TSP (with MST—see ) domains. Discuss your results. What happens to the performance of RBFS when a small random number is added to the heuristic values in the 8-puzzle domain?

Trace the operation of A* search applied to the problem of getting to Bucharest from Lugoj using the straight-line distance heuristic. That is, show the sequence of nodes that the algorithm will consider and the $f$, $g$, and $h$ score for each node.
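The trace is mechanical once A* itself is in hand. Here is a generic sketch run on a tiny hypothetical graph (the node names, costs, and heuristic values below are invented); feeding it the Romania road map and the straight-line-distance table, and printing $f$, $g$, and $h$ at each pop, produces the trace the exercise asks for.

```python
import heapq

def astar(start, goal, neighbors, h):
    """A* graph search; `neighbors(s)` yields (step_cost, successor) pairs."""
    frontier = [(h(start), 0, start, [start])]  # entries are (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        for cost, nxt in neighbors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Hypothetical 3-node example with an admissible heuristic.
graph = {"S": [(1, "A"), (3, "G")], "A": [(1, "G")]}
h = {"S": 2, "A": 1, "G": 0}
print(astar("S", "G", lambda s: graph.get(s, []), lambda s: h[s]))
# -> (2, ['S', 'A', 'G'])
```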

Sometimes there is no good evaluation function for a problem but there is a good comparison method: a way to tell whether one node is better than another without assigning numerical values to either. Show that this is enough to do a best-first search. Is there an analog of A* for this setting?

Devise a state space in which A* using graph search returns a suboptimal solution with an $h(n)$ function that is admissible but inconsistent.

Accurate heuristics don’t necessarily reduce search time in the worst case. Given any depth $d$, define a search problem with a goal node at depth $d$, and write a heuristic function such that $|h(n) - h^*(n)| \le O(\log h^*(n))$ but $A^*$ expands all nodes of depth less than $d$.

The **heuristic path algorithm** Pohl:1977 is a best-first search in which the evaluation function
is $f(n) =
(2-w)g(n) + wh(n)$. For what values of $w$ is this complete? For what
values is it optimal, assuming that $h$ is admissible? What kind of
search does this perform for $w=0$, $w=1$, and $w=2$?

Consider the unbounded version of the regular 2D grid shown in . The
start state is at the origin, (0,0), and the goal state is at $(x,y)$.

1. What is the branching factor $b$ in this state space?

2. How many distinct states are there at depth $k$ (for $k>0$)?

3. What is the maximum number of nodes expanded by breadth-first tree
search?

4. What is the maximum number of nodes expanded by breadth-first graph
search?

5. Is $h = |u-x| + |v-y|$ an admissible heuristic for a state at
$(u,v)$? Explain.

6. How many nodes are expanded by A* graph search using $h$?

7. Does $h$ remain admissible if some links are removed?

8. Does $h$ remain admissible if some links are added between
nonadjacent states?

$n$ vehicles occupy squares $(1,1)$ through $(n,1)$ (i.e., the bottom
row) of an $n\times n$ grid. The vehicles must be moved to the top row
but in reverse order; so the vehicle $i$ that starts in $(i,1)$ must end
up in $(n-i+1,n)$. On each time step, every one of the $n$ vehicles can
move one square up, down, left, or right, or stay put; but if a vehicle
stays put, one other adjacent vehicle (but not more than one) can hop
over it. Two vehicles cannot occupy the same square.

1. Calculate the size of the state space as a function of $n$.

2. Calculate the branching factor as a function of $n$.

3. Suppose that vehicle $i$ is at $(x_i,y_i)$; write a nontrivial
admissible heuristic $h_i$ for the number of moves it will require
to get to its goal location $(n-i+1,n)$, assuming no other vehicles
are on the grid.

4. Which of the following heuristics are admissible for the problem of
moving all $n$ vehicles to their destinations? Explain.

1. $\sum_{i= 1}^{n} h_i$.

2. $\max\{h_1,\ldots,h_n\}$.

3. $\min\{h_1,\ldots,h_n\}$.

Consider the problem of moving $k$ knights from $k$ starting squares
$s_1,\ldots,s_k$ to $k$ goal squares $g_1,\ldots,g_k$, on an unbounded
chessboard, subject to the rule that no two knights can land on the same
square at the same time. Each action consists of moving *up
to* $k$ knights simultaneously. We would like to complete the
maneuver in the smallest number of actions.

1. What is the maximum branching factor in this state space, expressed
as a function of $k$?

2. Suppose $h_i$ is an admissible heuristic for the problem of moving
knight $i$ to goal $g_i$ by itself. Which of the following
heuristics are admissible for the $k$-knight problem? Of those,
which is the best?

1. $\min\{h_1,\ldots,h_k\}$.

2. $\max\{h_1,\ldots,h_k\}$.

3. $\sum_{i= 1}^{k} h_i$.

3. Repeat (b) for the case where you are allowed to move only one
knight at a time.

We saw on page I-to-F that the straight-line distance heuristic leads greedy best-first search astray on the problem of going from Iasi to Fagaras. However, the heuristic is perfect on the opposite problem: going from Fagaras to Iasi. Are there problems for which the heuristic is misleading in both directions?

Invent a heuristic function for the 8-puzzle that sometimes overestimates, and show how it can lead to a suboptimal solution on a particular problem. (You can use a computer to help if you want.) Prove that if $h$ never overestimates by more than $c$, A* using $h$ returns a solution whose cost exceeds that of the optimal solution by no more than $c$.

Prove that if a heuristic is consistent, it must be admissible. Construct an admissible heuristic that is not consistent.

The traveling salesperson problem (TSP) can be
solved with the minimum-spanning-tree (MST) heuristic, which estimates
the cost of completing a tour, given that a partial tour has already
been constructed. The MST cost of a set of cities is the smallest sum of
the link costs of any tree that connects all the cities.

1. Show how this heuristic can be derived from a relaxed version of
the TSP.

2. Show that the MST heuristic dominates straight-line distance.

3. Write a problem generator for instances of the TSP where cities are
represented by random points in the unit square.

4. Find an efficient algorithm in the literature for constructing the
MST, and use it with A* graph search to solve instances of the TSP.
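
For parts 3 and 4, here is a hedged starter sketch (function names are ours, not from the text): a random-instance generator for cities in the unit square, and the MST cost of a set of cities computed with Prim's algorithm, which can serve as the heuristic value.

```python
# Starter code for the TSP/MST exercise: instance generator plus Prim's
# algorithm for the minimum-spanning-tree cost of a set of cities.
import math
import random

def make_tsp(n, seed=0):
    """n random cities as (x, y) points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mst_cost(cities):
    """Total edge cost of a minimum spanning tree (Prim's algorithm)."""
    if len(cities) <= 1:
        return 0.0
    # best[j] = cheapest known edge from the growing tree to city j
    best = {j: dist(cities[0], cities[j]) for j in range(1, len(cities))}
    total = 0.0
    while best:
        j = min(best, key=best.get)     # cheapest city to attach next
        total += best.pop(j)
        for k in best:                  # relax edges from the new tree node
            best[k] = min(best[k], dist(cities[j], cities[k]))
    return total
```

Plugging `mst_cost` of the unvisited cities (plus connection edges) into A* graph search is the intended use in part 4.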

On page Gaschnig-h-page , we defined the relaxation of the 8-puzzle in
which a tile can move from square A to square B if B is blank. The exact
solution of this problem defines **Gaschnig's heuristic** Gaschnig:1979. Explain why Gaschnig’s
heuristic is at least as accurate as $h_1$ (misplaced tiles), and show
cases where it is more accurate than both $h_1$ and $h_2$ (Manhattan
distance). Explain how to calculate Gaschnig’s heuristic efficiently.

We gave two simple heuristics for the 8-puzzle: Manhattan distance and misplaced tiles. Several heuristics in the literature purport to improve on this—see, for example, Nilsson:1971, Mostow+Prieditis:1989, and Hansson+al:1992. Test these claims by implementing the heuristics and comparing the performance of the resulting algorithms.

Give the name of the algorithm that results from each of the following
special cases:

1. Local beam search with $k = 1$.

2. Local beam search with one initial state and no limit on the number
of states retained.

3. Simulated annealing with $T = 0$ at all times (and omitting the
termination test).

4. Simulated annealing with $T=\infty$ at all times.

5. Genetic algorithm with population size $N = 1$.

Exercise brio-exercise considers the problem of building railway tracks under the assumption that pieces fit exactly with no slack. Now consider the real problem, in which pieces don’t fit exactly but allow for up to 10 degrees of rotation to either side of the “proper” alignment. Explain how to formulate the problem so it could be solved by simulated annealing.

In this exercise, we explore the use of local search methods to solve
TSPs of the type defined in Exercise tsp-mst-exercise.

1. Implement and test a hill-climbing method to solve TSPs. Compare the
results with optimal solutions obtained from the A* algorithm with
the MST heuristic (Exercise tsp-mst-exercise).

2. Repeat part (a) using a genetic algorithm instead of hill climbing.
You may want to consult @Larranaga+al:1999 for some suggestions for representations.

Generate a large number of 8-puzzle and 8-queens instances and solve them (where possible) by hill climbing (steepest-ascent and first-choice variants), hill climbing with random restart, and simulated annealing. Measure the search cost and percentage of solved problems and graph these against the optimal solution cost. Comment on your results.
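
As one possible starting point for these experiments, here is a simplified sketch of the first-choice variant on $n$-queens (all names are ours): it evaluates random single-queen moves and takes the first improving one. It can return a nonzero conflict count when stuck in a local minimum, which is exactly the behavior the exercise asks you to measure.

```python
# First-choice hill climbing on n-queens.
# State representation: one queen per column; state[i] = row of column i.
import random

def conflicts(state):
    """Number of pairs of queens attacking each other (row or diagonal)."""
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def first_choice(n=8, max_tries=10000, seed=0):
    rng = random.Random(seed)
    state = [rng.randrange(n) for _ in range(n)]
    cost = conflicts(state)
    for _ in range(max_tries):
        if cost == 0:
            break                       # solved
        col, row = rng.randrange(n), rng.randrange(n)
        neighbor = state[:]
        neighbor[col] = row             # move one queen within its column
        c = conflicts(neighbor)
        if c < cost:                    # first-choice: accept first improvement
            state, cost = neighbor, c
    return state, cost
```

Wrapping `first_choice` in a random-restart loop, and swapping the acceptance test for steepest ascent or a simulated-annealing schedule, gives the other variants to compare.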

The **And-Or-Graph-Search** algorithm in
Figure and-or-graph-search-algorithm checks for
repeated states only on the path from the root to the current state.
Suppose that, in addition, the algorithm were to store
*every* visited state and check against that list. (See
Figure breadth-first-search-algorithm for an example.)
Determine the information that should be stored and how the algorithm
should use that information when a repeated state is found.
(*Hint*: You will need to distinguish at least between
states for which a successful subplan was constructed previously and
states for which no subplan could be found.) Explain how to use labels,
as defined in Section cyclic-plan-section, to avoid
having multiple copies of subplans.

Explain precisely how to modify the **And-Or-Graph-Search** algorithm to
generate a cyclic plan if no acyclic plan exists. You will need to deal
with three issues: labeling the plan steps so that a cyclic plan can
point back to an earlier part of the plan, modifying **Or-Search** so that it
continues to look for acyclic plans after finding a cyclic plan, and
augmenting the plan representation to indicate whether a plan is cyclic.
Show how your algorithm works on (a) the slippery vacuum world, and (b)
the slippery, erratic vacuum world. You might wish to use a computer
implementation to check your results.

In Section conformant-section we introduced belief states to solve sensorless search problems. A sequence of actions solves a sensorless problem if it maps every physical state in the initial belief state $b$ to a goal state. Suppose the agent knows $h^\*(s)$, the true optimal cost of solving the physical state $s$ in the fully observable problem, for every state $s$ in $b$. Find an admissible heuristic $h(b)$ for the sensorless problem in terms of these costs, and prove its admissibility. Comment on the accuracy of this heuristic on the sensorless vacuum problem of Figure vacuum2-sets-figure. How well does A* perform?

This exercise explores
subset–superset relations between belief states in sensorless or
partially observable environments.

1. Prove that if an action sequence is a solution for a belief state
$b$, it is also a solution for any subset of $b$. Can anything be
said about supersets of $b$?

2. Explain in detail how to modify graph search for sensorless problems
to take advantage of your answers in (a).

3. Explain in detail how to modify and–or search for
partially observable problems, beyond the modifications you describe
in (b).

On page multivalued-sensorless-page it was assumed
that a given action would have the same cost when executed in any
physical state within a given belief state. (This leads to a
belief-state search problem with well-defined step costs.) Now consider
what happens when the assumption does not hold. Does the notion of
optimality still make sense in this context, or does it require
modification? Consider also various possible definitions of the “cost”
of executing an action in a belief state; for example, we could use the
*minimum* of the physical costs; or the
*maximum*; or a cost *interval* with the lower
bound being the minimum cost and the upper bound being the maximum; or
just keep the set of all possible costs for that action. For each of
these, explore whether A* (with modifications if necessary) can return
optimal solutions.

Consider the sensorless version of the erratic vacuum world. Draw the belief-state space reachable from the initial belief state $\{1,2,3,4,5,6,7,8\}$, and explain why the problem is unsolvable.

Consider the sensorless version of the erratic vacuum world. Draw the belief-state space reachable from the initial belief state $\{ 1,3,5,7 \}$, and explain why the problem is unsolvable.

We can turn the navigation problem in
Exercise path-planning-exercise into an environment as
follows:

- The percept will be a list of the positions, *relative to the
agent*, of the visible vertices. The percept does
*not* include the position of the robot! The robot must
learn its own position from the map; for now, you can assume that
each location has a different “view.”

- Each action will be a vector describing a straight-line path
to follow. If the path is unobstructed, the action succeeds;
otherwise, the robot stops at the point where its path first
intersects an obstacle. If the agent returns a zero motion vector
and is at the goal (which is fixed and known), then the environment
teleports the agent to a *random location* (not inside
an obstacle).

- The performance measure charges the agent 1 point for each unit of
distance traversed and awards 1000 points each time the goal
is reached.

1. Implement this environment and a problem-solving agent for it. After
each teleportation, the agent will need to formulate a new problem,
which will involve discovering its current location.

2. Document your agent’s performance (by having the agent generate
suitable commentary as it moves around) and report its performance
over 100 episodes.

3. Modify the environment so that 30% of the time the agent ends up at
an unintended destination (chosen randomly from the other visible
vertices if any; otherwise, no move at all). This is a crude model
of the motion errors of a real robot. Modify the agent so that when
such an error is detected, it finds out where it is and then
constructs a plan to get back to where it was and resume the
old plan. Remember that sometimes getting back to where it was might
also fail! Show an example of the agent successfully overcoming two
successive motion errors and still reaching the goal.

4. Now try two different recovery schemes after an error: (1) head for
the closest vertex on the original route; and (2) replan a route to
the goal from the new location. Compare the performance of the three
recovery schemes. Would the inclusion of search costs affect the
comparison?

5. Now suppose that there are locations from which the view
is identical. (For example, suppose the world is a grid with
square obstacles.) What kind of problem does the agent now face?
What do solutions look like?

Suppose that an agent is in a $3 \times 3$
maze environment like the one shown in
Figure maze-3x3-figure. The agent knows that its
initial location is (1,1), that the goal is at (3,3), and that the
actions *Up*, *Down*, *Left*, *Right* have their usual
effects unless blocked by a wall. The agent does *not* know
where the internal walls are. In any given state, the agent perceives
the set of legal actions; it can also tell whether the state is one it
has visited before.

1. Explain how this online search problem can be viewed as an offline
search in belief-state space, where the initial belief state
includes all possible environment configurations. How large is the
initial belief state? How large is the space of belief states?

2. How many distinct percepts are possible in the initial state?

3. Describe the first few branches of a contingency plan for this
problem. How large (roughly) is the complete plan?

Notice that this contingency plan is a solution for *every
possible environment* fitting the given description. Therefore,
interleaving of search and execution is not strictly necessary even in
unknown environments.

Suppose that an agent is in a $3 \times 3$
maze environment like the one shown in
Figure maze-3x3-figure. The agent knows that its
initial location is (3,3), that the goal is at (1,1), and that the four
actions *Up*, *Down*, *Left*, *Right* have their usual
effects unless blocked by a wall. The agent does *not* know
where the internal walls are. In any given state, the agent perceives
the set of legal actions; it can also tell whether the state is one it
has visited before or is a new state.

1. Explain how this online search problem can be viewed as an offline
search in belief-state space, where the initial belief state
includes all possible environment configurations. How large is the
initial belief state? How large is the space of belief states?

2. How many distinct percepts are possible in the initial state?

3. Describe the first few branches of a contingency plan for this
problem. How large (roughly) is the complete plan?

Notice that this contingency plan is a solution for *every
possible environment* fitting the given description. Therefore,
interleaving of search and execution is not strictly necessary even in
unknown environments.

In this exercise, we examine hill climbing
in the context of robot navigation, using the environment in
Figure geometric-scene-figure as an example.

1. Repeat Exercise path-planning-agent-exercise using
hill climbing. Does your agent ever get stuck in a local minimum? Is
it *possible* for it to get stuck with convex
obstacles?

2. Construct a nonconvex polygonal environment in which the agent
gets stuck.

3. Modify the hill-climbing algorithm so that, instead of doing a
depth-1 search to decide where to go next, it does a
depth-$k$ search. It should find the best $k$-step path and do one
step along it, and then repeat the process.

4. Is there some $k$ for which the new algorithm is guaranteed to
escape from local minima?

5. Explain how LRTA* enables the agent to escape from local minima in
this case.

Like DFS, online DFS is incomplete for reversible state spaces with infinite paths. For example, suppose that states are points on the infinite two-dimensional grid and actions are unit vectors $(1,0)$, $(0,1)$, $(-1,0)$, $(0,-1)$, tried in that order. Show that online DFS starting at $(0,0)$ will not reach $(1,-1)$. Suppose the agent can observe, in addition to its current state, all successor states and the actions that would lead to them. Write an algorithm that is complete even for bidirected state spaces with infinite paths. What states does it visit in reaching $(1,-1)$?

Relate the time complexity of LRTA* to its space complexity.

Suppose you have an oracle, $OM(s)$, that correctly predicts the opponent’s move in any state. Using this, formulate the definition of a game as a (single-agent) search problem. Describe an algorithm for finding the optimal move.

Consider the problem of solving two 8-puzzles.

1. Give a complete problem formulation in the style of
Chapter search-chapter.

2. How large is the reachable state space? Give an exact
numerical expression.

3. Suppose we make the problem adversarial as follows: the two players
take turns moving; a coin is flipped to determine the puzzle on
which to make a move in that turn; and the winner is the first to
solve one puzzle. Which algorithm can be used to choose a move in
this setting?

4. Does the game eventually end, given optimal play? Explain.

(a) A map where the cost of every edge is 1. Initially the pursuer $P$ is at
node **b** and the evader $E$ is at node **d**

(b) A partial game tree for this map.
Each node is labeled with the $P,E$ positions. $P$ moves first. Branches marked "?" have yet to be explored.

Imagine that, in Exercise two-friends-exercise, one of
the friends wants to avoid the other. The problem then becomes a
two-player game. We assume now that the players take turns moving. The
game ends only when the players are on the same node; the terminal
payoff to the pursuer is minus the total time taken. (The evader “wins”
by never losing.) An example is shown in
Figure pursuit-evasion-game-figure.

1. Copy the game tree and mark the values of the terminal nodes.

2. Next to each internal node, write the strongest fact you can infer
about its value (a number, one or more inequalities such as
“$\geq 14$”, or a “?”).

3. Beneath each question mark, write the name of the node reached by
that branch.

4. Explain how a bound on the value of the nodes in (c) can be derived
from consideration of shortest-path lengths on the map, and derive
such bounds for these nodes. Remember the cost to get to each leaf
as well as the cost to solve it.

5. Now suppose that the tree as given, with the leaf bounds from (d),
is evaluated from left to right. Circle those “?” nodes that would
*not* need to be expanded further, given the bounds
from part (d), and cross out those that need not be considered
at all.

6. Can you prove anything in general about who wins the game on a map
that is a tree?

Describe and implement state descriptions, move generators, terminal tests, utility functions, and evaluation functions for one or more of the following stochastic games: Monopoly, Scrabble, bridge play with a given contract, or Texas hold’em poker.

Describe and implement a *real-time*,
*multiplayer* game-playing environment, where time is part
of the environment state and players are given fixed time allocations.

Discuss how well the standard approach to game playing would apply to games such as tennis, pool, and croquet, which take place in a continuous physical state space.

Prove the following assertion: For every
game tree, the utility obtained by max using minimax
decisions against a suboptimal min will never be lower than
the utility obtained playing against an optimal min. Can
you come up with a game tree in which max can do still
better using a *suboptimal* strategy against a suboptimal
min?

Player $A$ moves first. The two players take turns moving, and each
player must move his token to an open adjacent space in either
direction. If the opponent occupies an adjacent space, then a player
may jump over the opponent to the next open space if any. (For
example, if $A$ is on 3 and $B$ is on 2, then $A$ may move back to 1.)
The game ends when one player reaches the opposite end of the board.
If player $A$ reaches space 4 first, then the value of the game to $A$
is $+1$; if player $B$ reaches space 1 first, then the value of the
game to $A$ is $-1$.

Consider the two-player game described in
Figure line-game4-figure

1. Draw the complete game tree, using the following conventions:

- Write each state as $(s_A,s_B)$, where $s_A$ and $s_B$ denote
the token locations.

- Put each terminal state in a square box and write its game value
in a circle.

- Put *loop states* (states that already appear on
the path to the root) in double square boxes. Since their value
is unclear, annotate each with a “?” in a circle.

2. Now mark each node with its backed-up minimax value (also in
a circle). Explain how you handled the “?” values and why.

3. Explain why the standard minimax algorithm would fail on this game
tree and briefly sketch how you might fix it, drawing on your answer
to (b). Does your modified algorithm give optimal decisions for all
games with loops?

4. This 4-square game can be generalized to $n$ squares for any
$n > 2$. Prove that $A$ wins if $n$ is even and loses if $n$ is odd.

This problem exercises the basic concepts of game playing, using
tic-tac-toe (noughts and crosses) as an example. We define
$X_n$ as the number of rows, columns, or diagonals with exactly $n$
$X$’s and no $O$’s. Similarly, $O_n$ is the number of rows, columns, or
diagonals with just $n$ $O$’s. The utility function assigns $+1$ to any
position with $X_3=1$ and $-1$ to any position with $O_3 = 1$. All other
terminal positions have utility 0. For nonterminal positions, we use a
linear evaluation function defined as ${Eval}(s) = 3X_2(s) + X_1(s) -
(3O_2(s) + O_1(s))$.

1. Approximately how many possible games of tic-tac-toe are there?

2. Show the whole game tree starting from an empty board down to depth
2 (i.e., one $X$ and one $O$ on the board), taking symmetry
into account.

3. Mark on your tree the evaluations of all the positions at depth 2.

4. Using the minimax algorithm, mark on your tree the backed-up values
for the positions at depths 1 and 0, and use those values to choose
the best starting move.

5. Circle the nodes at depth 2 that would *not* be
evaluated if alpha–beta pruning were applied, assuming the nodes are
generated in the optimal order for alpha–beta pruning.

Consider the family of generalized tic-tac-toe games, defined as
follows. Each particular game is specified by a set $\mathcal S$ of
*squares* and a collection $\mathcal W$ of *winning
positions.* Each winning position is a subset of $\mathcal S$.
For example, in standard tic-tac-toe, $\mathcal S$ is a set of 9 squares
and $\mathcal W$ is a collection of 8 subsets of $\mathcal S$: the three
rows, the three columns, and the two diagonals. In other respects, the
game is identical to standard tic-tac-toe. Starting from an empty board,
players alternate placing their marks on an empty square. A player who
marks every square in a winning position wins the game. It is a tie if
all squares are marked and neither player has won.

1. Let $N= |{\mathcal S}|$, the number of squares. Give an upper bound
on the number of nodes in the complete game tree for generalized
tic-tac-toe as a function of $N$.

2. Give a lower bound on the size of the game tree for the worst case,
where ${\mathcal W} = {\{\,\}}$.

3. Propose a plausible evaluation function that can be used for any
instance of generalized tic-tac-toe. The function may depend on
$\mathcal S$ and $\mathcal W$.

4. Assume that it is possible to generate a new board and check whether
it is a winning position in 100$N$ machine instructions and assume a
2 gigahertz processor. Ignore memory limitations. Using your
estimate in (a), roughly how large a game tree can be completely
solved by alpha–beta in a second of CPU time? a minute? an hour?
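
For part 4, the raw arithmetic can be sketched as follows (the 2 GHz clock and $100N$ instruction cost per board are the exercise's stated assumptions; comparing the resulting counts to your tree-size bound from part 1 is left to you):

```python
# Back-of-envelope throughput for part 4 of the generalized
# tic-tac-toe exercise: 2 GHz processor, 100*N instructions per board.

def positions_per_second(n_squares, hz=2e9, instr_per_pos=100):
    """Boards generated and checked per second under the stated assumptions."""
    return hz / (instr_per_pos * n_squares)

def positions_in(seconds, n_squares):
    """Boards processed in the given wall-clock budget."""
    return positions_per_second(n_squares) * seconds
```

Evaluating `positions_in(1, N)`, `positions_in(60, N)`, and `positions_in(3600, N)` gives the budgets to set against your upper bound from part (a).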

Develop a general game-playing program, capable of playing a variety of
games.

1. Implement move generators and evaluation functions for one or more
of the following games: Kalah, Othello, checkers, and chess.

2. Construct a general alpha–beta game-playing agent.

3. Compare the effect of increasing search depth, improving move
ordering, and improving the evaluation function. How close does your
effective branching factor come to the ideal case of perfect move
ordering?

4. Implement a selective search algorithm, such as B\* Berliner:1979,
conspiracy number search @McAllester:1988, or MGSS\*
Russell+Wefald:1989 and compare its performance to A\*.

Describe how the minimax and alpha–beta algorithms change for two-player, non-zero-sum games in which each player has a distinct utility function and both utility functions are known to both players. If there are no constraints on the two terminal utilities, is it possible for any node to be pruned by alpha–beta? What if the player’s utility functions on any state differ by at most a constant $k$, making the game almost cooperative?

Describe how the minimax and alpha–beta algorithms change for two-player, non-zero-sum games in which each player has a distinct utility function and both utility functions are known to both players. If there are no constraints on the two terminal utilities, is it possible for any node to be pruned by alpha–beta? What if the player’s utility functions on any state sum to a number between constants $-k$ and $k$, making the game almost zero-sum?

Develop a formal proof of correctness for alpha–beta pruning. To do
this, consider the situation shown in
Figure alpha-beta-proof-figure. The question is whether
to prune node $n_j$, which is a max-node and a descendant of node $n_1$.
The basic idea is to prune it if and only if the minimax value of $n_1$
can be shown to be independent of the value of $n_j$.

1. Node $n_1$ takes on the minimum value among its children:
$n_1 = \min(n_2,n_{21},\ldots,n_{2b_2})$. Find a similar
expression for $n_2$ and hence an expression for $n_1$ in terms of
$n_j$.

2. Let $l_i$ be the minimum (or maximum) value of the nodes to the
*left* of node $n_i$ at depth $i$, whose minimax value
is already known. Similarly, let $r_i$ be the minimum (or maximum)
value of the unexplored nodes to the right of $n_i$ at depth $i$.
Rewrite your expression for $n_1$ in terms of the $l_i$ and
$r_i$ values.

3. Now reformulate the expression to show that in order to affect
$n_1$, $n_j$ must not exceed a certain bound derived from the
$l_i$ values.

4. Repeat the process for the case where $n_j$ is a min-node.

Prove that the alpha–beta algorithm takes time $O(b^{m/2})$ with optimal move ordering, where $m$ is the maximum depth of the game tree.

Suppose you have a chess program that can evaluate 5 million nodes per second. Decide on a compact representation of a game state for storage in a transposition table. About how many entries can you fit in a 1-gigabyte in-memory table? Will that be enough for the three minutes of search allocated for one move? How many table lookups can you do in the time it would take to do one evaluation? Now suppose the transposition table is stored on disk. About how many evaluations could you do in the time it takes to do one disk seek with standard disk hardware?

Suppose you have a chess program that can evaluate 10 million nodes per
second. Decide on a compact representation of a game state for storage
in a transposition table. About how many entries can you fit in a
2-gigabyte in-memory table? Will that be enough for the three minutes of
search allocated for one move? How many table lookups can you do in the
time it would take to do one evaluation? Now suppose the transposition
table is stored on disk. About how many evaluations could you do in the
time it takes to do one disk seek with standard disk hardware?
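
A back-of-envelope sketch of the arithmetic involved; the 16-byte entry size and 5 ms seek time below are our assumptions, not figures from the text, and real chess programs vary.

```python
# Rough sizing for the 10-million-nodes-per-second variant.
ENTRY_BYTES = 16                  # assumed: 64-bit position hash + value/depth/flags
NODES_PER_SECOND = 10_000_000     # given in the exercise
SEEK_MS = 5                       # assumed typical HDD seek time

table_entries = 2 * 10**9 // ENTRY_BYTES              # entries fitting in 2 GB
nodes_in_search = NODES_PER_SECOND * 180              # nodes visited in 3 minutes
evals_per_seek = NODES_PER_SECOND * SEEK_MS // 1000   # evaluations forgone per seek
```

Comparing `table_entries` with `nodes_in_search`, and noting how many in-memory evaluations fit in a single `evals_per_seek` window, frames the answers the exercise is after.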

This question considers pruning in games with chance nodes.
Figure trivial-chance-game-figure shows the complete
game tree for a trivial game. Assume that the leaf nodes are to be
evaluated in left-to-right order, and that before a leaf node is
evaluated, we know nothing about its value—the range of possible values
is $-\infty$ to $\infty$.

1. Copy the figure, mark the value of all the internal nodes, and
indicate the best move at the root with an arrow.

2. Given the values of the first six leaves, do we need to evaluate the
seventh and eighth leaves? Given the values of the first seven
leaves, do we need to evaluate the eighth leaf? Explain
your answers.

3. Suppose the leaf node values are known to lie between $-2$ and $2$
inclusive. After the first two leaves are evaluated, what is the
value range for the left-hand chance node?

4. Circle all the leaves that need not be evaluated under the
assumption in (c).

Implement the expectiminimax algorithm and the \*-alpha–beta algorithm, which is described by Ballard:1983, for pruning game trees with chance nodes. Try them on a game such as backgammon and measure the pruning effectiveness of \*-alpha–beta.

Prove that with a positive linear transformation of leaf values (i.e., transforming a value $x$ to $ax + b$ where $a > 0$), the choice of move remains unchanged in a game tree, even when there are chance nodes.
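
One way into such a proof is the fact that both expectation and maximization commute with positive affine maps; as a sketch:

```latex
\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b \quad (a > 0),
\qquad
\max_i\,(a x_i + b) = a \max_i x_i + b .
```

An induction on tree depth then shows that every backed-up expectiminimax value is transformed by the same $x \mapsto ax + b$, which preserves the argmax at the root.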

Consider the following procedure
for choosing moves in games with chance nodes:

- Generate some dice-roll sequences (say, 50) down to a suitable depth
(say, 8).

- With known dice rolls, the game tree becomes deterministic. For each
dice-roll sequence, solve the resulting deterministic game tree
using alpha–beta.

- Use the results to estimate the value of each move and to choose
the best.

Will this procedure work well? Why (or why not)?

In the following, a “max” tree consists only of max nodes, whereas an
“expectimax” tree consists of a max node at the root with alternating
layers of chance and max nodes. At chance nodes, all outcome
probabilities are nonzero. The goal is to *find the value of the
root* with a bounded-depth search. For each of (a)–(f), either
give an example or explain why this is impossible.

1. Assuming that leaf values are finite but unbounded, is pruning (as
in alpha–beta) ever possible in a max tree?

2. Is pruning ever possible in an expectimax tree under the same
conditions?

3. If leaf values are all nonnegative, is pruning ever possible in a
max tree? Give an example, or explain why not.

4. If leaf values are all nonnegative, is pruning ever possible in an
expectimax tree? Give an example, or explain why not.

5. If leaf values are all in the range $[0,1]$, is pruning ever
possible in a max tree? Give an example, or explain why not.

6. If leaf values are all in the range $[0,1]$, is pruning ever
possible in an expectimax tree? Give an example, or explain why not.

7. Consider the outcomes of a chance node in an expectimax tree. Which
of the following evaluation orders is most likely to yield pruning
opportunities?

i. Lowest probability first

ii. Highest probability first

iii. Doesn’t make any difference

In the following, a “max” tree consists only of max nodes, whereas an
“expectimax” tree consists of a max node at the root with alternating
layers of chance and max nodes. At chance nodes, all outcome
probabilities are nonzero. The goal is to *find the value of the
root* with a bounded-depth search.

1. Assuming that leaf values are finite but unbounded, is pruning (as
in alpha–beta) ever possible in a max tree? Give an example, or
explain why not.

2. Is pruning ever possible in an expectimax tree under the same
conditions? Give an example, or explain why not.

3. If leaf values are constrained to be in the range $[0,1]$, is
pruning ever possible in a max tree? Give an example, or explain
why not.

4. If leaf values are constrained to be in the range $[0,1]$, is
pruning ever possible in an expectimax tree? Give an example
(qualitatively different from your example in (c), if any), or
explain why not.

5. If leaf values are constrained to be nonnegative, is pruning ever
possible in a max tree? Give an example, or explain why not.

6. If leaf values are constrained to be nonnegative, is pruning ever
possible in an expectimax tree? Give an example, or explain why not.

7. Consider the outcomes of a chance node in an expectimax tree. Which
of the following evaluation orders is most likely to yield pruning
opportunities: (i) Lowest probability first; (ii) Highest
probability first; (iii) Doesn’t make any difference?

Consider carefully the interplay of chance events and partial
information in each of the games in
Exercise game-playing-chance-exercise.

1. For which is the standard expectiminimax model appropriate?
Implement the algorithm and run it in your game-playing agent, with
appropriate modifications to the game-playing environment.

2. For which would the scheme described in
Exercise game-playing-monte-carlo-exercise be
appropriate?

3. Discuss how you might deal with the fact that in some of the games,
the players do not have the same knowledge of the current state.

How many solutions are there for the map-coloring problem in Figure australia-figure? How many solutions if four colors are allowed? Two colors?
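
A brute-force counter can check your answers; the adjacency list below is our reading of the standard Australia map from the figure (WA, NT, SA, Q, NSW, V adjacent as usual, with Tasmania isolated).

```python
# Count proper colorings of the Australia map by exhaustive enumeration.
from itertools import product

REGIONS = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
ADJACENT = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
            ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"),
            ("NSW", "V")]          # Tasmania ("T") has no neighbors

def count_colorings(n_colors):
    """Number of assignments in which no two adjacent regions share a color."""
    count = 0
    for assignment in product(range(n_colors), repeat=len(REGIONS)):
        color = dict(zip(REGIONS, assignment))
        if all(color[a] != color[b] for a, b in ADJACENT):
            count += 1
    return count
```

Running `count_colorings` with 3, 4, and 2 colors answers the three questions for this adjacency.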

Consider the problem of placing $k$ knights on an $n\times n$
chessboard such that no two knights are attacking each other, where $k$
is given and $k\leq n^2$.

1. Choose a CSP formulation. In your formulation, what are the
variables?

2. What are the possible values of each variable?

3. What sets of variables are constrained, and how?

4. Now consider the problem of putting *as many knights as
possible* on the board without any attacks. Explain how to
solve this with local search by defining appropriate ACTIONS and RESULT functions
and a sensible objective function.

Consider the problem of constructing (not solving)
crossword puzzles: fitting words into a rectangular grid. The grid,
which is given as part of the problem, specifies which squares are blank
and which are shaded. Assume that a list of words (i.e., a dictionary)
is provided and that the task is to fill in the blank squares by using
any subset of the list. Formulate this problem precisely in two ways:

1. As a general search problem. Choose an appropriate search algorithm
and specify a heuristic function. Is it better to fill in blanks one
letter at a time or one word at a time?

2. As a constraint satisfaction problem. Should the variables be words
or letters?

Which formulation do you think will be better? Why?

Give precise formulations for each of the
following as constraint satisfaction problems:

1. Rectilinear floor-planning: find non-overlapping places in a large
rectangle for a number of smaller rectangles.

2. Class scheduling: There is a fixed number of professors and
classrooms, a list of classes to be offered, and a list of possible
time slots for classes. Each professor has a set of classes that he
or she can teach.

3. Hamiltonian tour: given a network of cities connected by roads,
choose an order to visit all cities in a country without
repeating any.

Solve the cryptarithmetic problem in Figure cryptarithmetic-figure by hand, using the strategy of backtracking with forward checking and the MRV and least-constraining-value heuristics.

Show how a single ternary constraint such as “$A + B = C$” can be turned into three binary constraints by using an auxiliary variable. You may assume finite domains. (*Hint:* Consider a new variable that takes on values that are pairs of other values, and consider constraints such as “$X$ is the first element of the pair $Y$.”) Next, show how constraints with more than three variables can be treated similarly. Finally, show how unary constraints can be eliminated by altering the domains of variables. This completes the demonstration that any CSP can be transformed into a CSP with only binary constraints.
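
The hint's hidden-variable construction can be sketched concretely. Assuming finite numeric domains and illustrative names (`binarize_sum` and its three returned constraint predicates are not part of the exercise), the ternary constraint $A+B=C$ becomes three binary constraints through a pair-valued auxiliary variable $AB$:

```python
from itertools import product

def binarize_sum(dom_a, dom_b):
    """Encode the ternary constraint A + B = C using only binary constraints,
    via an auxiliary variable AB whose values are (a, b) pairs."""
    dom_ab = list(product(dom_a, dom_b))           # domain of the new variable
    first  = lambda ab, a: ab[0] == a              # binary constraint AB--A
    second = lambda ab, b: ab[1] == b              # binary constraint AB--B
    total  = lambda ab, c: ab[0] + ab[1] == c      # binary constraint AB--C
    return dom_ab, first, second, total
```

Each solution of the original ternary constraint corresponds to exactly one consistent value of $AB$, which is the key observation for the general $n$-ary case.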

Consider the following logic puzzle: In five houses,
each with a different color, live five persons of different
nationalities, each of whom prefers a different brand of candy, a
different drink, and a different pet. Given the following facts, the
questions to answer are “Where does the zebra live, and in which house
do they drink water?”

The Englishman lives in the red house.

The Spaniard owns the dog.

The Norwegian lives in the first house on the left.

The green house is immediately to the right of the ivory house.

The man who eats Hershey bars lives in the house next to the man with
the fox.

Kit Kats are eaten in the yellow house.

The Norwegian lives next to the blue house.

The Smarties eater owns snails.

The Snickers eater drinks orange juice.

The Ukrainian drinks tea.

The Japanese eats Milky Ways.

Kit Kats are eaten in a house next to the house where the horse is kept.

Coffee is drunk in the green house.

Milk is drunk in the middle house.

Discuss different representations of this problem as a CSP. Why would
one prefer one representation over another?
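
One of the representations worth discussing treats each attribute value (a color, a nationality, a candy, a drink, a pet) as a variable whose domain is the house positions 1–5. Under that representation the puzzle yields to brute force with pruning, as in this sketch (the function and key names are illustrative; it returns house positions rather than a full table):

```python
from itertools import permutations

def next_to(h1, h2):
    return abs(h1 - h2) == 1

def solve_zebra():
    """Assign one attribute group at a time, rejecting a partial
    assignment as soon as any stated fact fails."""
    houses = range(1, 6)                                   # 1 = leftmost
    for red, green, ivory, yellow, blue in permutations(houses):
        if green != ivory + 1: continue                    # green right of ivory
        for english, spaniard, norwegian, ukrainian, japanese in permutations(houses):
            if english != red: continue
            if norwegian != 1: continue
            if not next_to(norwegian, blue): continue
            for oj, tea, coffee, milk, water in permutations(houses):
                if tea != ukrainian: continue
                if coffee != green: continue
                if milk != 3: continue                     # middle house
                for hershey, kitkat, smarties, snickers, milkyway in permutations(houses):
                    if snickers != oj: continue
                    if kitkat != yellow: continue
                    if milkyway != japanese: continue
                    for dog, fox, snails, horse, zebra in permutations(houses):
                        if dog != spaniard: continue
                        if snails != smarties: continue
                        if not next_to(hershey, fox): continue
                        if not next_to(kitkat, horse): continue
                        return {"zebra": zebra, "water": water,
                                "norwegian": norwegian, "japanese": japanese}
    return None
```

The pruning order matters enormously here, which is itself a useful data point for the discussion: a representation that lets constraints fire early is far cheaper to search.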

Consider the graph with 8 nodes $A_1$, $A_2$, $A_3$, $A_4$, $H$, $T$, $F_1$, $F_2$. $A_i$ is connected to $A_{i+1}$ for all $i$, each $A_i$ is connected to $H$, $H$ is connected to $T$, and $T$ is connected to each $F_i$. Find a 3-coloring of this graph by hand using the following strategy: backtracking with conflict-directed backjumping, the variable order $A_1$, $H$, $A_4$, $F_1$, $A_2$, $F_2$, $A_3$, $T$, and the value order $R$, $G$, $B$.

Explain why it is a good heuristic to choose the variable that is *most* constrained but the value that is *least* constraining in a CSP search.

Generate random instances of map-coloring problems as follows: scatter $n$ points on the unit square; select a point $X$ at random, connect $X$ by a straight line to the nearest point $Y$ such that $X$ is not already connected to $Y$ and the line crosses no other line; repeat the previous step until no more connections are possible. The points represent regions on the map and the lines connect neighbors. Now try to find $k$-colorings of each map, for both $k=3$ and $k=4$, using min-conflicts, backtracking, backtracking with forward checking, and backtracking with MAC. Construct a table of average run times for each algorithm for values of $n$ up to the largest you can manage. Comment on your results.

Use the AC-3 algorithm to show that arc consistency can detect the inconsistency of the partial assignment $\{{WA}={green},{V}={red}\}$ for the problem shown in Figure australia-figure.

Use the AC-3 algorithm to show that arc consistency can detect the inconsistency of the partial assignment $\{{WA}={red},{V}={blue}\}$ for the problem shown in Figure australia-figure.

What is the worst-case complexity of running AC-3 on a tree-structured CSP?

AC-3 puts back on the queue *every* arc
($X_{k}, X_{i}$) whenever *any* value is deleted from the
domain of $X_{i}$, even if each value of $X_{k}$ is consistent with
several remaining values of $X_{i}$. Suppose that, for every arc
($X_{k}, X_{i}$), we keep track of the number of remaining values of
$X_{i}$ that are consistent with each value of $X_{k}$. Explain how to
update these numbers efficiently and hence show that arc consistency can
be enforced in total time $O(n^2d^2)$.
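
For experimenting with this and the two preceding exercises, a plain AC-3 (without the support counters suggested here, so it runs in $O(n^2d^3)$ rather than the improved $O(n^2d^2)$) can be written as below; the function and parameter names are illustrative choices:

```python
from collections import deque

def ac3(domains, neighbors, ok):
    """Basic AC-3: domains maps each variable to a set of values;
    ok(vi, vj) says whether two neighboring values are compatible."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        removed = {vi for vi in domains[xi]
                   if not any(ok(vi, vj) for vj in domains[xj])}
        if removed:
            domains[xi] -= removed
            if not domains[xi]:
                return False                   # a domain wiped out: inconsistent
            queue.extend((xk, xi) for xk in neighbors[xi] if xk != xj)
    return True
```

Running it on the Australia map with ${WA}={green}$, ${V}={red}$ empties a domain, confirming the claim in the earlier exercise.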

The Tree-CSP-Solver (Figure tree-csp-figure) makes arcs consistent starting at the leaves and working backwards towards the root. Why does it do that? What would happen if it went in the opposite direction?

We introduced Sudoku as a CSP to be solved by search over partial assignments because that is the way people generally undertake solving Sudoku problems. It is also possible, of course, to attack these problems with local search over complete assignments. How well would a local solver using the min-conflicts heuristic do on Sudoku problems?
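
A generic min-conflicts loop for such experiments can be sketched as follows; the signature (a `nconflicts(var, val, assignment)` counting function, a step limit, a fixed seed) is one reasonable design, not the only one:

```python
import random

def min_conflicts(variables, domains, nconflicts, max_steps=100_000, seed=0):
    """Local search over complete assignments: while some variable is in
    conflict, pick one at random and give it a least-conflicted value."""
    rng = random.Random(seed)
    current = {v: rng.choice(domains[v]) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables if nconflicts(v, current[v], current) > 0]
        if not conflicted:
            return current
        var = rng.choice(conflicted)
        current[var] = min(domains[var],
                           key=lambda val: nconflicts(var, val, current))
    return None                                # gave up: possible plateau
```

On easy instances it converges quickly; on hard Sudoku instances the plateaus you will observe are the interesting part of the answer.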

Define in your own words the terms constraint, backtracking search, arc consistency, backjumping, min-conflicts, and cycle cutset.

Define in your own words the terms constraint, commutativity, arc consistency, backjumping, min-conflicts, and cycle cutset.

Suppose that a graph is known to have a cycle cutset of no more than $k$ nodes. Describe a simple algorithm for finding a minimal cycle cutset whose run time is not much more than $O(n^k)$ for a CSP with $n$ variables. Search the literature for methods for finding approximately minimal cycle cutsets in time that is polynomial in the size of the cutset. Does the existence of such algorithms make the cycle cutset method practical?

Consider the problem of tiling a surface (completely and exactly
covering it) with $n$ dominoes ($2\times
1$ rectangles). The surface is an arbitrary edge-connected (i.e.,
adjacent along an edge, not just a corner) collection of $2n$
$1\times 1$ squares (e.g., a checkerboard, a checkerboard with some
squares missing, a $10\times 1$ row of squares, etc.).

1. Formulate this problem precisely as a CSP where the dominoes are
the variables.

2. Formulate this problem precisely as a CSP where the squares are the
variables, keeping the state space as small as possible.
(*Hint:* does it matter which particular domino goes on
a given pair of squares?)

3. Construct a surface consisting of 6 squares such that your CSP
formulation from part (b) has a *tree-structured*
constraint graph.

4. Describe exactly the set of solvable instances that have a
tree-structured constraint graph.

Suppose the agent has progressed to the point shown in
Figure wumpus-seq35-figure(a), page wumpus-seq35-figure,
having perceived nothing in [1,1], a breeze in [2,1], and a stench
in [1,2], and is now concerned with the contents of [1,3], [2,2],
and [3,1]. Each of these can contain a pit, and at most one can
contain a wumpus. Following the example of
Figure wumpus-entailment-figure, construct the set of
possible worlds. (You should find 32 of them.) Mark the worlds in which
the KB is true and those in which each of the following sentences is
true:

$\alpha_2$ = “There is no pit in [2,2].”

$\alpha_3$ = “There is a wumpus in [1,3].”

Hence show that ${KB} {\models}\alpha_2$ and
${KB} {\models}\alpha_3$.

(Adapted from Barwise+Etchemendy:1993.) Given the following, can you prove that the unicorn is
mythical? How about magical? Horned?

Note: If the unicorn is mythical, then it is immortal, but if it is not
mythical, then it is a mortal mammal. If the unicorn is either
immortal or a mammal, then it is horned. The unicorn is magical if it
is horned.

Consider the problem of deciding whether a
propositional logic sentence is true in a given model.

1. Write a recursive algorithm PL-True?$(s, m)$ that returns ${true}$ if and
only if the sentence $s$ is true in the model $m$ (where $m$ assigns
a truth value for every symbol in $s$). The algorithm should run in
time linear in the size of the sentence. (Alternatively, use a
version of this function from the online code repository.)

2. Give three examples of sentences that can be determined to be true
or false in a *partial* model that does not specify a
truth value for some of the symbols.

3. Show that the truth value (if any) of a sentence in a partial model
cannot be determined efficiently in general.

4. Modify your algorithm so that it can sometimes judge truth from
partial models, while retaining its recursive structure and linear
run time. Give three examples of sentences whose truth in a partial
model is *not* detected by your algorithm.

5. Investigate whether the modified algorithm makes {TT-Entails?} more efficient.
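
A three-valued evaluator covering parts (a) and (d) together can be sketched as below, with sentences as nested tuples such as `("and", "P", ("not", "Q"))`; it returns `True`, `False`, or `None` (unknown) and stays linear in the sentence size. The representation is an illustrative choice:

```python
def pl_true(s, m):
    """Recursive truth evaluation of sentence s in a (possibly partial)
    model m, a dict from symbol names to booleans."""
    if isinstance(s, str):
        return m.get(s)                       # None if the symbol is unassigned
    op, *args = s
    if op == "not":
        v = pl_true(args[0], m)
        return None if v is None else not v
    if op == "and":
        vs = [pl_true(a, m) for a in args]
        if any(v is False for v in vs): return False
        return True if all(v is True for v in vs) else None
    if op == "or":
        vs = [pl_true(a, m) for a in args]
        if any(v is True for v in vs): return True
        return False if all(v is False for v in vs) else None
    if op == "=>":
        return pl_true(("or", ("not", args[0]), args[1]), m)
    if op == "<=>":
        p, q = pl_true(args[0], m), pl_true(args[1], m)
        return None if None in (p, q) else p == q
    raise ValueError(op)
```

Note for part (d) that a valid sentence such as $P\lor\lnot P$ still evaluates to unknown in the empty model: the algorithm never enumerates cases, which is exactly why full partial-model evaluation cannot be both complete and efficient.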

Which of the following are correct?

1. ${False} \models {True}$.

2. ${True} \models {False}$.

3. $(A\land B) \models (A{\;\;{\Leftrightarrow}\;\;}B)$.

4. $A{\;\;{\Leftrightarrow}\;\;}B \models A \lor B$.

5. $A{\;\;{\Leftrightarrow}\;\;}B \models \lnot A \lor B$.

6. $(A\land B){\:\;{\Rightarrow}\:\;}C \models (A{\:\;{\Rightarrow}\:\;}C)\lor(B{\:\;{\Rightarrow}\:\;}C)$.

7. $(C\lor (\lnot A \land \lnot B)) \equiv ((A{\:\;{\Rightarrow}\:\;}C) \land (B {\:\;{\Rightarrow}\:\;}C))$.

8. $(A\lor B) \land (\lnot C\lor\lnot D\lor E) \models (A\lor B)$.

9. $(A\lor B) \land (\lnot C\lor\lnot D\lor E) \models (A\lor B) \land (\lnot D\lor E)$.

10. $(A\lor B) \land \lnot(A {\:\;{\Rightarrow}\:\;}B)$ is satisfiable.

11. $(A{\;\;{\Leftrightarrow}\;\;}B) \land (\lnot A \lor B)$
is satisfiable.

12. $(A{\;\;{\Leftrightarrow}\;\;}B) {\;\;{\Leftrightarrow}\;\;}C$ has
the same number of models as $(A{\;\;{\Leftrightarrow}\;\;}B)$ for
any fixed set of proposition symbols that includes $A$, $B$, $C$.

Which of the following are correct?

1. ${False} \models {True}$.

2. ${True} \models {False}$.

3. $(A\land B) \models (A{\;\;{\Leftrightarrow}\;\;}B)$.

4. $A{\;\;{\Leftrightarrow}\;\;}B \models A \lor B$.

5. $A{\;\;{\Leftrightarrow}\;\;}B \models \lnot A \lor B$.

6. $(A\lor B) \land (\lnot C\lor\lnot D\lor E) \models (A\lor B\lor C) \land (B\land C\land D{\:\;{\Rightarrow}\:\;}E)$.

7. $(A\lor B) \land (\lnot C\lor\lnot D\lor E) \models (A\lor B) \land (\lnot D\lor E)$.

8. $(A\lor B) \land \lnot(A {\:\;{\Rightarrow}\:\;}B)$ is satisfiable.

9. $(A\land B){\:\;{\Rightarrow}\:\;}C \models (A{\:\;{\Rightarrow}\:\;}C)\lor(B{\:\;{\Rightarrow}\:\;}C)$.

10. $(C\lor (\lnot A \land \lnot B)) \equiv ((A{\:\;{\Rightarrow}\:\;}C) \land (B {\:\;{\Rightarrow}\:\;}C))$.

11. $(A{\;\;{\Leftrightarrow}\;\;}B) \land (\lnot A \lor B)$
is satisfiable.

12. $(A{\;\;{\Leftrightarrow}\;\;}B) {\;\;{\Leftrightarrow}\;\;}C$ has
the same number of models as $(A{\;\;{\Leftrightarrow}\;\;}B)$ for
any fixed set of proposition symbols that includes $A$, $B$, $C$.

Prove each of the following assertions:

1. $\alpha$ is valid if and only if ${True}{\models}\alpha$.

2. For any $\alpha$, ${False}{\models}\alpha$.

3. $\alpha{\models}\beta$ if and only if the sentence
$(\alpha {\:\;{\Rightarrow}\:\;}\beta)$ is valid.

4. $\alpha \equiv \beta$ if and only if the sentence
$(\alpha{\;\;{\Leftrightarrow}\;\;}\beta)$ is valid.

5. $\alpha{\models}\beta$ if and only if the sentence
$(\alpha \land \lnot \beta)$ is unsatisfiable.

Prove, or find a counterexample to, each of the following assertions:

1. If $\alpha\models\gamma$ or $\beta\models\gamma$ (or both) then
$(\alpha\land \beta)\models\gamma$

2. If $(\alpha\land \beta)\models\gamma$ then $\alpha\models\gamma$ or
$\beta\models\gamma$ (or both).

3. If $\alpha\models (\beta \lor \gamma)$ then $\alpha \models \beta$
or $\alpha \models \gamma$ (or both).

Prove, or find a counterexample to, each of the following assertions:

1. If $\alpha\models\gamma$ or $\beta\models\gamma$ (or both) then
$(\alpha\land \beta)\models\gamma$

2. If $\alpha\models (\beta \land \gamma)$ then $\alpha \models \beta$
and $\alpha \models \gamma$.

3. If $\alpha\models (\beta \lor \gamma)$ then $\alpha \models \beta$
or $\alpha \models \gamma$ (or both).

Consider a vocabulary with only four propositions, $A$, $B$, $C$, and
$D$. How many models are there for the following sentences?

1. $B\lor C$.

2. $\lnot A\lor \lnot B \lor \lnot C \lor \lnot D$.

3. $(A{\:\;{\Rightarrow}\:\;}B) \land A \land \lnot B \land C \land D$.
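
Answers to model-counting questions like this are easy to check by enumerating all $2^4$ assignments. A minimal sketch (representing a sentence as a Python function from a model dict to a boolean, which is an illustrative choice):

```python
from itertools import product

def count_models(sentence, symbols):
    """Count the models over exactly the given symbols that satisfy
    `sentence`, given as a function from a {symbol: bool} dict to bool."""
    return sum(bool(sentence(dict(zip(symbols, vals))))
               for vals in product([True, False], repeat=len(symbols)))
```

For example, `count_models(lambda m: m["B"] or m["C"], ["A", "B", "C", "D"])` enumerates all sixteen assignments of the four symbols.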

We have defined four binary logical connectives.

1. Are there any others that might be useful?

2. How many binary connectives can there be?

3. Why are some of them not very useful?
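
Since a binary connective is fully determined by its output on the four input pairs, the complete inventory for part 2 can be enumerated mechanically (the representation as `{(p, q): output}` dicts is an illustrative choice):

```python
from itertools import product

def binary_connectives():
    """Enumerate every possible binary connective as a truth table:
    one output bit for each of the four input pairs."""
    pairs = list(product([True, False], repeat=2))
    return [dict(zip(pairs, outs))
            for outs in product([True, False], repeat=len(pairs))]
```

Scanning the resulting tables for the familiar connectives, constants, and degenerate cases (outputs ignoring one or both inputs) is a direct route to parts 1 and 3.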

Using a method of your choice, verify each of the equivalences in Table logical-equivalence-table (page logical-equivalence-table).

Decide whether each of the following
sentences is valid, unsatisfiable, or neither. Verify your decisions
using truth tables or the equivalence rules of
Table logical-equivalence-table (page logical-equivalence-table).

1. ${Smoke} {\:\;{\Rightarrow}\:\;}{Smoke}$

2. ${Smoke} {\:\;{\Rightarrow}\:\;}{Fire}$

3. $({Smoke} {\:\;{\Rightarrow}\:\;}{Fire}) {\:\;{\Rightarrow}\:\;}(\lnot {Smoke} {\:\;{\Rightarrow}\:\;}\lnot {Fire})$

4. ${Smoke} \lor {Fire} \lor \lnot {Fire}$

5. $(({Smoke} \land {Heat}) {\:\;{\Rightarrow}\:\;}{Fire})
{\;\;{\Leftrightarrow}\;\;}(({Smoke} {\:\;{\Rightarrow}\:\;}{Fire}) \lor ({Heat} {\:\;{\Rightarrow}\:\;}{Fire}))$

6. $({Smoke} {\:\;{\Rightarrow}\:\;}{Fire}) {\:\;{\Rightarrow}\:\;}(({Smoke} \land {Heat}) {\:\;{\Rightarrow}\:\;}{Fire}) $

7. ${Big} \lor {Dumb} \lor ({Big} {\:\;{\Rightarrow}\:\;}{Dumb})$

Decide whether each of the following
sentences is valid, unsatisfiable, or neither. Verify your decisions
using truth tables or the equivalence rules of
Table logical-equivalence-table (page logical-equivalence-table).

1. ${Smoke} {\:\;{\Rightarrow}\:\;}{Smoke}$

2. ${Smoke} {\:\;{\Rightarrow}\:\;}{Fire}$

3. $({Smoke} {\:\;{\Rightarrow}\:\;}{Fire}) {\:\;{\Rightarrow}\:\;}(\lnot {Smoke} {\:\;{\Rightarrow}\:\;}\lnot {Fire})$

4. ${Smoke} \lor {Fire} \lor \lnot {Fire}$

5. $(({Smoke} \land {Heat}) {\:\;{\Rightarrow}\:\;}{Fire})
{\;\;{\Leftrightarrow}\;\;}(({Smoke} {\:\;{\Rightarrow}\:\;}{Fire}) \lor ({Heat} {\:\;{\Rightarrow}\:\;}{Fire}))$

6. ${Big} \lor {Dumb} \lor ({Big} {\:\;{\Rightarrow}\:\;}{Dumb})$

7. $({Big} \land {Dumb}) \lor \lnot {Dumb}$

Any propositional logic sentence is logically equivalent to the assertion that each possible world in which it would be false is not the case. From this observation, prove that any sentence can be written in CNF.
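
The construction in this exercise is itself executable: for each falsifying model, emit the one clause that is false in exactly that model. A sketch (sentences as boolean functions and literals as `(symbol, wanted_value)` pairs are illustrative choices):

```python
from itertools import product

def cnf_of(sentence, symbols):
    """Build a CNF equivalent of `sentence`: one clause per falsifying
    model, ruling exactly that model out. Clauses are frozensets of
    (symbol, wanted_truth_value) literals."""
    clauses = []
    for vals in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, vals))
        if not sentence(model):
            clauses.append(frozenset((s, not v) for s, v in model.items()))
    return clauses

def holds(clauses, model):
    """Evaluate a CNF under a complete model."""
    return all(any(model[s] == want for s, want in c) for c in clauses)
```

The price of this direct construction is exponential size, which is why practical CNF conversion uses the equivalence rules instead.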

Use resolution to prove the sentence $\lnot A \land \lnot B$ from the clauses in Exercise convert-clausal-exercise.

This exercise looks into the relationship between
clauses and implication sentences.

1. Show that the clause $(\lnot P_1 \lor \cdots \lor \lnot P_m \lor Q)$
is logically equivalent to the implication sentence
$(P_1 \land \cdots \land P_m) {\;{\Rightarrow}\;}Q$.

2. Show that every clause (regardless of the number of
positive literals) can be written in the form
$(P_1 \land \cdots \land P_m) {\;{\Rightarrow}\;}(Q_1 \lor \cdots \lor Q_n)$,
where the $P$s and $Q$s are proposition symbols. A knowledge base
consisting of such sentences is in implicative normal form or **Kowalski
form** Kowalski:1979.

3. Write down the full resolution rule for sentences in implicative
normal form.

According to some political pundits, a person who is radical ($R$) is
electable ($E$) if he/she is conservative ($C$), but otherwise is not
electable.

1. Which of the following are correct representations of this
assertion?

1. $(R\land E)\iff C$

2. $R{\:\;{\Rightarrow}\:\;}(E\iff C)$

3. $R{\:\;{\Rightarrow}\:\;}((C{\:\;{\Rightarrow}\:\;}E) \lor \lnot E)$

2. Which of the sentences in (a) can be expressed in Horn form?

This question considers representing satisfiability (SAT) problems as
CSPs.

1. Draw the constraint graph corresponding to the SAT problem
$$(\lnot X_1 \lor X_2) \land (\lnot X_2 \lor X_3) \land \ldots \land (\lnot X_{n-1} \lor X_n)$$
for the particular case $n=5$.

2. How many solutions are there for this general SAT problem as a
function of $n$?

3. Suppose we apply {Backtracking-Search} (page backtracking-search-algorithm) to find *all*
solutions to a SAT CSP of the type given in (a). (To find
*all* solutions to a CSP, we simply modify the basic
algorithm so it continues searching after each solution is found.)
Assume that variables are ordered $X_1,\ldots,X_n$ and ${false}$
is ordered before ${true}$. How much time will the algorithm take
to terminate? (Write an $O(\cdot)$ expression as a function of $n$.)

4. We know that SAT problems in Horn form can be solved in linear time
by forward chaining (unit propagation). We also know that every
tree-structured binary CSP with discrete, finite domains can be
solved in time linear in the number of variables
(Section csp-structure-section). Are these two
facts connected? Discuss.

This question considers representing satisfiability (SAT) problems as
CSPs.

1. Draw the constraint graph corresponding to the SAT problem
$$(\lnot X_1 \lor X_2) \land (\lnot X_2 \lor X_3) \land \ldots \land (\lnot X_{n-1} \lor X_n)$$
for the particular case $n=4$.

2. How many solutions are there for this general SAT problem as a
function of $n$?

3. Suppose we apply {Backtracking-Search} (page backtracking-search-algorithm) to find *all*
solutions to a SAT CSP of the type given in (a). (To find
*all* solutions to a CSP, we simply modify the basic
algorithm so it continues searching after each solution is found.)
Assume that variables are ordered $X_1,\ldots,X_n$ and ${false}$
is ordered before ${true}$. How much time will the algorithm take
to terminate? (Write an $O(\cdot)$ expression as a function of $n$.)

4. We know that SAT problems in Horn form can be solved in linear time
by forward chaining (unit propagation). We also know that every
tree-structured binary CSP with discrete, finite domains can be
solved in time linear in the number of variables
(Section csp-structure-section). Are these two
facts connected? Discuss.

Explain why every nonempty propositional clause, by itself, is satisfiable. Prove rigorously that every set of five 3-SAT clauses is satisfiable, provided that each clause mentions exactly three distinct variables. What is the smallest set of such clauses that is unsatisfiable? Construct such a set.

A propositional *2-CNF* expression is a conjunction of
clauses, each containing *exactly 2* literals, e.g.,
$$(A\lor B) \land (\lnot A \lor C) \land (\lnot B \lor D) \land (\lnot
C \lor G) \land (\lnot D \lor G)\ .$$

1. Prove using resolution that the above sentence entails $G$.

2. Two clauses are *semantically distinct* if they are not
logically equivalent. How many semantically distinct 2-CNF clauses
can be constructed from $n$ proposition symbols?

3. Using your answer to (b), prove that propositional resolution always
terminates in time polynomial in $n$ given a 2-CNF sentence
containing no more than $n$ distinct symbols.

4. Explain why your argument in (c) does not apply to 3-CNF.

Prove each of the following assertions:

1. Every pair of propositional clauses either has no resolvents, or all
their resolvents are logically equivalent.

2. There is no clause that, when resolved with itself, yields
(after factoring) the clause $(\lnot P \lor \lnot Q)$.

3. If a propositional clause $C$ can be resolved with a copy of itself,
it must be logically equivalent to $ True $.

Consider the following sentence:

$$[ ({Food} {\:\;{\Rightarrow}\:\;}{Party}) \lor ({Drinks} {\:\;{\Rightarrow}\:\;}{Party}) ] {\:\;{\Rightarrow}\:\;}[ ( {Food} \land {Drinks} ) {\:\;{\Rightarrow}\:\;}{Party}]\ .$$

1. Determine, using enumeration, whether this sentence is valid,
satisfiable (but not valid), or unsatisfiable.

2. Convert the left-hand and right-hand sides of the main implication
into CNF, showing each step, and explain how the results confirm
your answer to (a).

3. Prove your answer to (a) using resolution.

A sentence is in disjunctive normal form(DNF) if it is the disjunction of
conjunctions of literals. For example, the sentence
$(A \land B \land \lnot C) \lor (\lnot A \land C) \lor (B \land \lnot C)$
is in DNF.

1. Any propositional logic sentence is logically equivalent to the
assertion that some possible world in which it would be true is in
fact the case. From this observation, prove that any sentence can be
written in DNF.

2. Construct an algorithm that converts any sentence in propositional
logic into DNF. (*Hint*: The algorithm is similar to
the algorithm for conversion to CNF given in
Section pl-resolution-section.)

3. Construct a simple algorithm that takes as input a sentence in DNF
and returns a satisfying assignment if one exists, or reports that
no satisfying assignment exists.

4. Apply the algorithms in (b) and (c) to the following set of
sentences:

$A {\Rightarrow} B$

$B {\Rightarrow} C$

$C {\Rightarrow} A$

5. Since the algorithm in (b) is very similar to the algorithm for
conversion to CNF, and since the algorithm in (c) is much simpler
than any algorithm for solving a set of sentences in CNF, why is
this technique not used in automated reasoning?
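
The algorithm for part (c) is strikingly short, which is part of the point of part (e). A sketch, with a DNF sentence as a list of conjuncts and each conjunct as a list of `(symbol, sign)` literals (an illustrative encoding):

```python
def dnf_model(dnf):
    """A DNF sentence is satisfiable iff some conjunct contains no pair
    of complementary literals. Returns a satisfying partial model, or
    None if every conjunct is self-contradictory (or the list is empty)."""
    for conjunct in dnf:
        model = {}
        for s, sign in conjunct:
            if model.get(s, sign) != sign:
                break                          # both s and not-s appear
            model[s] = sign
        else:
            return model
    return None
```

The catch, of course, lies in part (b): converting to DNF can blow up the sentence exponentially before this easy test ever runs.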

Convert the following set of sentences to
clausal form.

1. S1: $A {\;\;{\Leftrightarrow}\;\;}(B \lor E)$.

2. S2: $E {\:\;{\Rightarrow}\:\;}D$.

3. S3: $C \land F {\:\;{\Rightarrow}\:\;}\lnot B$.

4. S4: $E {\:\;{\Rightarrow}\:\;}B$.

5. S5: $B {\:\;{\Rightarrow}\:\;}F$.

6. S6: $B {\:\;{\Rightarrow}\:\;}C$

Give a trace of the execution of DPLL on the conjunction of these
clauses.

Convert the following set of sentences to
clausal form.

1. S1: $A {\;\;{\Leftrightarrow}\;\;}(B \lor E)$.

2. S2: $E {\:\;{\Rightarrow}\:\;}D$.

3. S3: $C \land F {\:\;{\Rightarrow}\:\;}\lnot B$.

4. S4: $E {\:\;{\Rightarrow}\:\;}B$.

5. S5: $B {\:\;{\Rightarrow}\:\;}F$.

6. S6: $B {\:\;{\Rightarrow}\:\;}C$

Give a trace of the execution of DPLL on the conjunction of these
clauses.

Is a randomly generated 4-CNF sentence with $n$ symbols and $m$ clauses more or less likely to be solvable than a randomly generated 3-CNF sentence with $n$ symbols and $m$ clauses? Explain.
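
An empirical answer can be gathered with a small harness like the one below; the generator conventions (symbols named `X0..X{n-1}`, each clause over $k$ distinct symbols with independent random signs) are illustrative assumptions:

```python
import random
from itertools import product

def random_kcnf(k, n, m, rng):
    """Random k-CNF: m clauses, each over k distinct symbols with random
    signs; clauses are frozensets of (symbol, sign) literals."""
    syms = [f"X{i}" for i in range(n)]
    return [frozenset((s, rng.random() < 0.5) for s in rng.sample(syms, k))
            for _ in range(m)]

def solvable(clauses, n):
    """Brute-force satisfiability over n symbols X0..X{n-1}."""
    syms = [f"X{i}" for i in range(n)]
    return any(all(any(model[s] == sign for s, sign in c) for c in clauses)
               for model in (dict(zip(syms, vs))
                             for vs in product([True, False], repeat=n)))
```

Comparing the fraction of solvable instances for $k=3$ versus $k=4$ at the same $n$ and $m$ gives the empirical side of the answer; the analytical side follows from how many models each clause excludes.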

Minesweeper, the well-known computer game, is
closely related to the wumpus world. A minesweeper world is
a rectangular grid of $N$ squares with $M$ invisible mines scattered
among them. Any square may be probed by the agent; instant death follows
if a mine is probed. Minesweeper indicates the presence of mines by
revealing, in each probed square, the *number* of mines
that are directly or diagonally adjacent. The goal is to probe every
unmined square.
1. Let $X_{i,j}$ be true iff square $[i,j]$ contains a mine. Write down
the assertion that exactly two mines are adjacent to [1,1] as a
sentence involving some logical combination of
$X_{i,j}$ propositions.
2. Generalize your assertion from (a) by explaining how to construct a
CNF sentence asserting that $k$ of $n$ neighbors contain mines.
3. Explain precisely how an agent can use {DPLL} to prove that a given square
does (or does not) contain a mine, ignoring the global constraint
that there are exactly $M$ mines in all.
4. Suppose that the global constraint is constructed from your method
from part (b). How does the number of clauses depend on $M$ and $N$?
Suggest a way to modify {DPLL} so that the global constraint does not need
to be represented explicitly.
5. Are any conclusions derived by the method in part (c) invalidated
when the global constraint is taken into account?
6. Give examples of configurations of probe values that induce
*long-range dependencies* such that the contents of a
given unprobed square would give information about the contents of a
far-distant square. (*Hint*: consider an
$N\times 1$ board.)

How long does it take to prove
${KB}{\models}\alpha$ using {DPLL} when $\alpha$ is a literal *already
contained in* ${KB}$? Explain.

Trace the behavior of {DPLL} on the knowledge base in Figure pl-horn-example-figure when trying to prove $Q$, and compare this behavior with that of the forward-chaining algorithm.

Write a successor-state axiom for the ${Locked}$ predicate, which applies to doors, assuming the only actions available are ${Lock}$ and ${Unlock}$.

Discuss what is meant by *optimal* behavior in the wumpus
world. Show that the {Hybrid-Wumpus-Agent} is not optimal, and suggest ways to improve it.

Suppose an agent inhabits a world with two states, $S$ and $\lnot S$,
and can do exactly one of two actions, $a$ and $b$. Action $a$ does
nothing and action $b$ flips from one state to the other. Let $S^t$ be
the proposition that the agent is in state $S$ at time $t$, and let
$a^t$ be the proposition that the agent does action $a$ at time $t$
(similarly for $b^t$).

1. Write a successor-state axiom for $S^{t+1}$.

2. Convert the sentence in (a) into CNF.

3. Show a resolution refutation proof that if the agent is in $\lnot S$
at time $t$ and does $a$, it will still be in $\lnot S$ at time
$t+1$.

Section successor-state-section provides some of the successor-state axioms required for the wumpus world. Write down axioms for all remaining fluent symbols.

Modify the {Hybrid-Wumpus-Agent} to use the 1-CNF logical state estimation method described on page 1cnf-belief-state-page. We noted on that page that such an agent will not be able to acquire, maintain, and use more complex beliefs such as the disjunction $P_{3,1}\lor P_{2,2}$. Suggest a method for overcoming this problem by defining additional proposition symbols, and try it out in the wumpus world. Does it improve the performance of the agent?

A logical knowledge base represents the world using a set of sentences
with no explicit structure. An **analogical**
representation, on the other hand, has physical structure that
corresponds directly to the structure of the thing represented. Consider
a road map of your country as an analogical representation of facts
about the country—it represents facts with a map language. The
two-dimensional structure of the map corresponds to the two-dimensional
surface of the area.

1. Give five examples of *symbols* in the map language.

2. An *explicit* sentence is a sentence that the creator
of the representation actually writes down. An
*implicit* sentence is a sentence that results from
explicit sentences because of properties of the analogical
representation. Give three examples each of *implicit*
and *explicit* sentences in the map language.

3. Give three examples of facts about the physical structure of your
country that cannot be represented in the map language.

4. Give two examples of facts that are much easier to express in the
map language than in first-order logic.

5. Give two other examples of useful analogical representations. What
are the advantages and disadvantages of each of these languages?

Consider a knowledge base containing just two sentences: $P(a)$ and $P(b)$. Does this knowledge base entail $\forall\,x\ P(x)$? Explain your answer in terms of models.

Is the sentence ${\exists\,x,y\;\;} x=y$ valid? Explain.

Write down a logical sentence such that every world in which it is true contains exactly one object.

Write down a logical sentence such that every world in which it is true contains exactly two objects.

Consider a symbol vocabulary that contains $c$ constant symbols, $p_k$ predicate symbols of each arity $k$, and $f_k$ function symbols of each arity $k$, where $1\leq k\leq A$. Let the domain size be fixed at $D$. For any given model, each predicate or function symbol is mapped onto a relation or function, respectively, of the same arity. You may assume that the functions in the model allow some input tuples to have no value for the function (i.e., the value is the invisible object). Derive a formula for the number of possible models for a domain with $D$ elements. Don’t worry about eliminating redundant combinations.

Which of the following are valid (necessarily true) sentences?

1. $(\exists x\ x=x) {\:\;{\Rightarrow}\:\;}({\forall\,y\;\;} \exists z\ y=z)$.

2. ${\forall\,x\;\;} P(x) \lor \lnot P(x)$.

3. ${\forall\,x\;\;} {Smart}(x) \lor (x=x)$.

Consider a version of the semantics for first-order logic in which models with empty domains are allowed. Give at least two examples of sentences that are valid according to the standard semantics but not according to the new semantics. Discuss which outcome makes more intuitive sense for your examples.

Does the fact $\lnot {Spouse}({George},{Laura})$ follow from the facts ${Jim}\neq {George}$ and ${Spouse}({Jim},{Laura})$? If so, give a proof; if not, supply additional axioms as needed. What happens if we use ${Spouse}$ as a unary function symbol instead of a binary predicate?

This exercise uses the function ${MapColor}$ and predicates
${In}(x,y)$, ${Borders}(x,y)$, and ${Country}(x)$, whose arguments
are geographical regions, along with constant symbols for various
regions. In each of the following we give an English sentence and a
number of candidate logical expressions. For each of the logical
expressions, state whether it (1) correctly expresses the English
sentence; (2) is syntactically invalid and therefore meaningless; or (3)
is syntactically valid but does not express the meaning of the English
sentence.

1. Paris and Marseilles are both in France.

1. ${In}({Paris} \land {Marseilles}, {France})$.

2. ${In}({Paris},{France}) \land {In}({Marseilles},{France})$.

3. ${In}({Paris},{France}) \lor {In}({Marseilles},{France})$.

2. There is a country that borders both Iraq and Pakistan.

1. ${\exists\,c\;\;}$
${Country}(c) \land {Borders}(c,{Iraq}) \land {Borders}(c,{Pakistan})$.

2. ${\exists\,c\;\;}$
${Country}(c) {\:\;{\Rightarrow}\:\;}[{Borders}(c,{Iraq}) \land {Borders}(c,{Pakistan})]$.

3. $[{\exists\,c\;\;}$
${Country}(c)] {\:\;{\Rightarrow}\:\;}[{Borders}(c,{Iraq}) \land {Borders}(c,{Pakistan})]$.

4. ${\exists\,c\;\;}$
${Borders}({Country}(c),{Iraq} \land {Pakistan})$.

3. All countries that border Ecuador are in South America.

1. ${\forall\,c\;\;} Country(c) \land {Border}(c,{Ecuador}) {\:\;{\Rightarrow}\:\;}{In}(c,{SouthAmerica})$.

2. ${\forall\,c\;\;} {Country}(c) {\:\;{\Rightarrow}\:\;}[{Border}(c,{Ecuador}) {\:\;{\Rightarrow}\:\;}{In}(c,{SouthAmerica})]$.

3. ${\forall\,c\;\;} [{Country}(c) {\:\;{\Rightarrow}\:\;}{Border}(c,{Ecuador})] {\:\;{\Rightarrow}\:\;}{In}(c,{SouthAmerica})$.

4. ${\forall\,c\;\;} Country(c) \land {Border}(c,{Ecuador}) \land {In}(c,{SouthAmerica})$.

4. No region in South America borders any region in Europe.

1. $\lnot [{\exists\,c,d\;\;} {In}(c,{SouthAmerica}) \land {In}(d,{Europe}) \land {Borders}(c,d)]$.

2. ${\forall\,c,d\;\;} [{In}(c,{SouthAmerica}) \land {In}(d,{Europe})] {\:\;{\Rightarrow}\:\;}\lnot {Borders}(c,d)$.

3. $\lnot {\forall\,c\;\;} {In}(c,{SouthAmerica}) {\:\;{\Rightarrow}\:\;}{\exists\,d\;\;} {In}(d,{Europe}) \land \lnot {Borders}(c,d)$.

4. ${\forall\,c\;\;} {In}(c,{SouthAmerica}) {\:\;{\Rightarrow}\:\;}{\forall\,d\;\;} {In}(d,{Europe}) {\:\;{\Rightarrow}\:\;}\lnot {Borders}(c,d)$.

5. No two adjacent countries have the same map color.

1. ${\forall\,x,y\;\;} \lnot {Country}(x) \lor \lnot {Country}(y) \lor \lnot {Borders}(x,y) \lor {}$\
$\lnot ({MapColor}(x) = {MapColor}(y))$.

2. ${\forall\,x,y\;\;} ({Country}(x) \land {Country}(y) \land {Borders}(x,y) \land \lnot(x=y)) {\:\;{\Rightarrow}\:\;}{}$\
$\lnot ({MapColor}(x) = {MapColor}(y))$.

3. ${\forall\,x,y\;\;} {Country}(x) \land {Country}(y) \land {Borders}(x,y) \land {}$\
$\lnot ({MapColor}(x) = {MapColor}(y))$.

4. ${\forall\,x,y\;\;} ({Country}(x) \land {Country}(y) \land {Borders}(x,y) ) {\:\;{\Rightarrow}\:\;}{MapColor}(x\neq y)$.

Consider a vocabulary with the following symbols:

> ${Occupation}(p,o)$: Predicate. Person $p$ has occupation $o$.
> ${Customer}(p1,p2)$: Predicate. Person $p1$ is a customer of person $p2$.
> ${Boss}(p1,p2)$: Predicate. Person $p1$ is a boss of person $p2$.
> ${Doctor}$, $ {Surgeon}$, $ {Lawyer}$, $ {Actor}$: Constants denoting occupations.
> ${Emily}$, $ {Joe}$: Constants denoting people.

Use these symbols to write the following assertions in first-order
logic:

1. Emily is either a surgeon or a lawyer.

2. Joe is an actor, but he also holds another job.

3. All surgeons are doctors.

4. Joe does not have a lawyer (i.e., is not a customer of any lawyer).

5. Emily has a boss who is a lawyer.

6. There exists a lawyer all of whose customers are doctors.

7. Every surgeon has a lawyer.

In each of the following we give an English sentence and a number of
candidate logical expressions. For each of the logical expressions,
state whether it (1) correctly expresses the English sentence; (2) is
syntactically invalid and therefore meaningless; or (3) is syntactically
valid but does not express the meaning of the English sentence.

1. Every cat loves its mother or father.

1. ${\forall\,x\;\;} {Cat}(x) {\:\;{\Rightarrow}\:\;}{Loves}(x,{Mother}(x)\lor {Father}(x))$.

2. ${\forall\,x\;\;} \lnot {Cat}(x) \lor {Loves}(x,{Mother}(x)) \lor {Loves}(x,{Father}(x))$.

3. ${\forall\,x\;\;} {Cat}(x) \land ({Loves}(x,{Mother}(x))\lor {Loves}(x,{Father}(x)))$.

2. Every dog who loves one of its brothers is happy.

1. ${\forall\,x\;\;} {Dog}(x) \land (\exists y\ {Brother}(y,x) \land {Loves}(x,y)) {\:\;{\Rightarrow}\:\;}{Happy}(x)$.

2. ${\forall\,x,y\;\;} {Dog}(x) \land {Brother}(y,x) \land {Loves}(x,y) {\:\;{\Rightarrow}\:\;}{Happy}(x)$.

3. ${\forall\,x\;\;} {Dog}(x) \land [{\forall\,y\;\;} {Brother}(y,x) {\;\;{\Leftrightarrow}\;\;}{Loves}(x,y)] {\:\;{\Rightarrow}\:\;}{Happy}(x)$.

3. No dog bites a child of its owner.

1. ${\forall\,x\;\;} {Dog}(x) {\:\;{\Rightarrow}\:\;}\lnot {Bites}(x,{Child}({Owner}(x)))$.

2. $\lnot {\exists\,x,y\;\;} {Dog}(x) \land {Child}(y,{Owner}(x)) \land {Bites}(x,y)$.

3. ${\forall\,x\;\;} {Dog}(x) {\:\;{\Rightarrow}\:\;}({\forall\,y\;\;} {Child}(y,{Owner}(x)) {\:\;{\Rightarrow}\:\;}\lnot {Bites}(x,y))$.

4. $\lnot {\exists\,x\;\;} {Dog}(x) {\:\;{\Rightarrow}\:\;}({\exists\,y\;\;} {Child}(y,{Owner}(x)) \land {Bites}(x,y))$.

4. Everyone’s zip code within a state has the same first digit.

1. ${\forall\,x,s,z_1\;\;} [{State}(s) \land {LivesIn}(x,s) \land {Zip}(x) = z_1] {\:\;{\Rightarrow}\:\;}{}$\
$[{\forall\,y,z_2\;\;} {LivesIn}(y,s) \land {Zip}(y) = z_2 {\:\;{\Rightarrow}\:\;}{Digit}(1,z_1) = {Digit}(1,z_2) ]$.

2. ${\forall\,x,s\;\;} [{State}(s) \land {LivesIn}(x,s) \land {\exists\,z_1\;\;} {Zip}(x) = z_1] {\:\;{\Rightarrow}\:\;}{}$\
$ [{\forall\,y,z_2\;\;} {LivesIn}(y,s) \land {Zip}(y) = z_2 \land {Digit}(1,z_1) = {Digit}(1,z_2) ]$.

3. ${\forall\,x,y,s\;\;} {State}(s) \land {LivesIn}(x,s) \land {LivesIn}(y,s) {\:\;{\Rightarrow}\:\;}{Digit}(1,{Zip}(x) = {Zip}(y))$.

4. ${\forall\,x,y,s\;\;} {State}(s) \land {LivesIn}(x,s) \land {LivesIn}(y,s) {\:\;{\Rightarrow}\:\;}{}$\
${Digit}(1,{Zip}(x)) = {Digit}(1,{Zip}(y))$.

Complete the following exercises
about logical sentences:

1. Translate into *good, natural* English (no $x$s or $y$s!):

$$
{\forall\,x,y,l\;\;} SpeaksLanguage(x, l) \land SpeaksLanguage(y, l)
\implies Understands(x, y) \land Understands(y,x).

$$
2. Explain why this sentence is entailed by the sentence

$$
{\forall\,x,y,l\;\;} SpeaksLanguage(x, l) \land SpeaksLanguage(y, l)
\implies Understands(x, y).

$$
3. Translate into first-order logic the following sentences:

1. Understanding leads to friendship.

2. Friendship is transitive.

Remember to define all predicates, functions, and constants you use.

True or false? Explain.

1. ${\exists\,x\;\;} x = {Rumpelstiltskin}$ is a valid
(necessarily true) sentence of first-order logic.

2. Every existentially quantified sentence in first-order logic is true
in any model that contains exactly one object.

3. ${\forall\,x,y\;\;} x = y$ is satisfiable.

Rewrite the first two Peano axioms in Section Peano-section as a single axiom that defines ${NatNum}(x)$ so as to exclude the possibility of natural numbers except for those generated by the successor function.

Equation (pit-biconditional-equation) on
page pit-biconditional-equation defines the conditions under which a square is
breezy. Here we consider two other ways to describe this aspect of the
wumpus world.

1. We can write [diagnostic rule] leading from observed effects to hidden causes. For
finding pits, the obvious diagnostic rules say that if a square is
breezy, some adjacent square must contain a pit; and if a square is
not breezy, then no adjacent square contains a pit. Write these two
rules in first-order logic and show that their conjunction is
logically equivalent to
Equation (pit-biconditional-equation).

2. We can write [causal rule] leading from cause to effect. One obvious causal rule
is that a pit causes all adjacent squares to be breezy. Write this
rule in first-order logic, explain why it is incomplete compared to
Equation (pit-biconditional-equation), and supply
the missing axiom.

Write axioms describing the predicates
${Grandchild}$, ${Greatgrandparent}$, ${Ancestor}$, ${Brother}$,
${Sister}$, ${Daughter}$, ${Son}$, ${FirstCousin}$,
${BrotherInLaw}$, ${SisterInLaw}$, ${Aunt}$, and ${Uncle}$. Find
out the proper definition of $m$th cousin $n$ times removed, and write
the definition in first-order logic. Now write down the basic facts
depicted in the family tree in Figure family1-figure.
Using a suitable logical reasoning system, tell it all the sentences you have
written down, and ask it who are Elizabeth’s grandchildren, Diana’s
brothers-in-law, Zara’s great-grandparents, and Eugenie’s ancestors.

Write down a sentence asserting that + is a commutative function. Does your sentence follow from the Peano axioms? If so, explain why; if not, give a model in which the axioms are true and your sentence is false.

Explain what is wrong with the following proposed definition of the set
membership predicate

$$ {\forall\,x,s\;\;} x \in \{x|s\} $$ $$ {\forall\,x,s\;\;} x \in s \implies {\forall\,y\;\;} x \in \{y|s\} $$

Using the set axioms as examples, write axioms for the list domain, including all the constants, functions, and predicates mentioned in the chapter.

Explain what is wrong with the following proposed definition of adjacent squares in the wumpus world: $${\forall\,x,y\;\;} {Adjacent}([x,y], [x+1, y]) \land {Adjacent}([x,y], [x, y+1])\ .$$

Write out the axioms required for reasoning about the wumpus’s location, using a constant symbol ${Wumpus}$ and a binary predicate ${At}({Wumpus}, {Location})$. Remember that there is only one wumpus.

Assuming predicates ${Parent}(p,q)$ and ${Female}(p)$ and constants
${Joan}$ and ${Kevin}$, with the obvious meanings, express each of
the following sentences in first-order logic. (You may use the
abbreviation $\exists^{1}$ to mean “there exists exactly one.”)
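
For reference, the $\exists^{1}$ abbreviation can be unpacked into the basic quantifiers; for any formula $P(x)$:

$$
\exists^{1} x\;\; P(x) \;\;\equiv\;\; {\exists\,x\;\;} P(x) \land [{\forall\,y\;\;} P(y) {\:\;{\Rightarrow}\:\;} y = x]\ .
$$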

1. Joan has a daughter (possibly more than one, and possibly sons
as well).

2. Joan has exactly one daughter (but may have sons as well).

3. Joan has exactly one child, a daughter.

4. Joan and Kevin have exactly one child together.

5. Joan has at least one child with Kevin, and no children with
anyone else.

Arithmetic assertions can be written in first-order logic with the
predicate symbol $<$, the function symbols ${+}$ and ${\times}$, and the
constant symbols 0 and 1. Additional predicates can also be defined with
biconditionals.

1. Represent the property “$x$ is an even number.”

2. Represent the property “$x$ is prime.”

3. Goldbach’s conjecture is the conjecture (unproven as yet) that every
even number is equal to the sum of two primes. Represent this
conjecture as a logical sentence.
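
As a quick sanity check on the conjecture itself (separate from the logical representation the exercise asks for), a short Python sketch can search for a witness pair for a given even number; the helper names `is_prime` and `goldbach_pair` are our own:

```python
def is_prime(n):
    """True iff n is a prime number (trial division)."""
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def goldbach_pair(n):
    """Return primes (p, q) with p + q == n, for an even n >= 4."""
    return next((p, n - p) for p in range(2, n)
                if is_prime(p) and is_prime(n - p))

print(goldbach_pair(28))  # prints (5, 23)
```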

In Chapter csp-chapter, we used equality to indicate the relation between a variable and its value. For instance, we wrote ${WA} = {red}$ to mean that Western Australia is colored red. Representing this in first-order logic, we must write more verbosely ${ColorOf}({WA}) = {red}$. What incorrect inference could be drawn if we wrote sentences such as ${WA} = {red}$ directly as logical assertions?

Write in first-order logic the assertion that every key and at least one of every pair of socks will eventually be lost forever, using only the following vocabulary: ${Key}(x)$, $x$ is a key; ${Sock}(x)$, $x$ is a sock; ${Pair}(x,y)$, $x$ and $y$ are a pair; ${Now}$, the current time; ${Before}(t_1,t_2)$, time $t_1$ comes before time $t_2$; ${Lost}(x,t)$, object $x$ is lost at time $t$.

For each of the following sentences in English, decide if the
accompanying first-order logic sentence is a good translation. If not,
explain why not and correct it. (Some sentences may have more than one
error!)

1. No two people have the same social security number.
$$\lnot {\exists\,x,y,n\;\;} {Person}(x) \land {Person}(y) {\:\;{\Rightarrow}\:\;}[{HasSS}\#(x,n) \land {HasSS}\#(y,n)].$$

2. John’s social security number is the same as Mary’s.
$${\exists\,n\;\;} {HasSS}\#({John},n) \land {HasSS}\#({Mary},n).$$

3. Everyone’s social security number has nine digits.

$${\forall\,x,n\;\;} {Person}(x) {\:\;{\Rightarrow}\:\;}[{HasSS}\#(x,n) \land {Digits}(n,9)].$$

4. Rewrite each of the above (uncorrected) sentences using a function
symbol ${SS}\#$ instead of the predicate ${HasSS}\#$.

Translate into first-order logic the sentence “Everyone’s DNA is unique and is derived from their parents’ DNA.” You must specify the precise intended meaning of your vocabulary terms. (*Hint*: Do not use the predicate ${Unique}(x)$, since uniqueness is not really a property of an object in itself!)

For each of the following sentences in English, decide if the
accompanying first-order logic sentence is a good translation. If not,
explain why not and correct it.

1. Any apartment in London has lower rent than some apartments
in Paris.

$$
\forall {x} [{Apt}(x) \land {In}(x,{London})]
\implies \exists {y} ([{Apt}(y) \land {In}(y,{Paris})] \implies ({Rent}(x) < {Rent}(y)))
$$
2. There is exactly one apartment in Paris with rent below \$1000.

$$
\exists {x} {Apt}(x) \land {In}(x,{Paris}) \land \forall{y} [{Apt}(y) \land {In}(y,{Paris}) \land ({Rent}(y) < {Dollars}(1000))] \implies (y = x)
$$
3. If an apartment is more expensive than all apartments in London, it
must be in Moscow.

$$
\forall{x} {Apt}(x) \land [\forall{y} {Apt}(y) \land {In}(y,{London}) \land ({Rent}(x) > {Rent}(y))] \implies
{In}(x,{Moscow}).
$$

Represent the following sentences in first-order logic, using a
consistent vocabulary (which you must define):

1. Some students took French in spring 2001.

2. Every student who takes French passes it.

3. Only one student took Greek in spring 2001.

4. The best score in Greek is always higher than the best score
in French.

5. Every person who buys a policy is smart.

6. No person buys an expensive policy.

7. There is an agent who sells policies only to people who are
not insured.

8. There is a barber who shaves all men in town who do not
shave themselves.

9. A person born in the UK, each of whose parents is a UK citizen or a
UK resident, is a UK citizen by birth.

10. A person born outside the UK, one of whose parents is a UK citizen
by birth, is a UK citizen by descent.

11. Politicians can fool some of the people all of the time, and they
can fool all of the people some of the time, but they can’t fool all
of the people all of the time.

12. All Greeks speak the same language. (Use ${Speaks}(x,l)$ to mean
that person $x$ speaks language $l$.)

Write a general set of facts and axioms to represent the assertion “Wellington heard about Napoleon’s death” and to correctly answer the question “Did Napoleon hear about Wellington’s death?”

Extend the vocabulary from Section circuits-section to define addition for $n$-bit binary numbers. Then encode the description of the four-bit adder in Figure 4bit-adder-figure, and pose the queries needed to verify that it is in fact correct.

The circuit representation in the chapter is more detailed than necessary if we care only about circuit functionality. A simpler formulation describes any $m$-input, $n$-output gate or circuit using a predicate with $m+n$ arguments, such that the predicate is true exactly when the inputs and outputs are consistent. For example, NOT gates are described by the binary predicate ${NOT}(i,o)$, for which ${NOT}(0,1)$ and ${NOT}(1,0)$ are known. Compositions of gates are defined by conjunctions of gate predicates in which shared variables indicate direct connections. For example, a NAND circuit can be composed from ${AND}$s and ${NOT}$s: $${\forall\,i_1,i_2,o_a,o\;\;} {AND}(i_1,i_2,o_a) \land {NOT}(o_a,o) {\:\;{\Rightarrow}\:\;}{NAND}(i_1,i_2,o)\ .$$ Using this representation, define the one-bit adder in Figure adder-figure and the four-bit adder in Figure adder-figure, and explain what queries you would use to verify the designs. What kinds of queries are *not* supported by this representation that *are* supported by the representation in Section circuits-section?
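
The relational view of gates can be prototyped directly. In the following Python sketch (our own encoding, not from the chapter), each gate is the set of input/output tuples for which the predicate holds, and NAND is composed exactly as in the rule above, with the shared variable $o_a$ enforced by the join condition:

```python
# Each gate predicate is the set of consistent input/output tuples.
NOT = {(0, 1), (1, 0)}
AND = {(a, b, a & b) for a in (0, 1) for b in (0, 1)}

# NAND(i1,i2,o) := AND(i1,i2,oa) AND NOT(oa,o), joining on the shared wire oa.
NAND = {(i1, i2, o)
        for (i1, i2, oa) in AND
        for (oa2, o) in NOT
        if oa == oa2}

print(sorted(NAND))  # prints [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```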

Obtain a passport application for your country, identify the rules determining eligibility for a passport, and translate them into first-order logic, following the steps outlined in Section circuits-section.

Consider a first-order logical knowledge base that describes worlds
containing people, songs, albums (e.g., “Meet the Beatles”) and disks
(i.e., particular physical instances of CDs). The vocabulary contains
the following symbols:

> ${CopyOf}(d,a)$: Predicate. Disk $d$ is a copy of album $a$.
> ${Owns}(p,d)$: Predicate. Person $p$ owns disk $d$.
> ${Sings}(p,s,a)$: Album $a$ includes a recording of song $s$ sung by person $p$.
> ${Wrote}(p,s)$: Person $p$ wrote song $s$.
> ${McCartney}$, ${Gershwin}$, ${BHoliday}$, ${Joe}$, ${EleanorRigby}$, ${TheManILove}$, ${Revolver}$: Constants with the obvious meanings.
Express the following statements in first-order logic:

1. Gershwin wrote “The Man I Love.”

2. Gershwin did not write “Eleanor Rigby.”

3. Either Gershwin or McCartney wrote “The Man I Love.”

4. Joe has written at least one song.

5. Joe owns a copy of *Revolver*.

6. Every song that McCartney sings on *Revolver* was
written by McCartney.

7. Gershwin did not write any of the songs on *Revolver*.

8. Every song that Gershwin wrote has been recorded on some album.
(Possibly different songs are recorded on different albums.)

9. There is a single album that contains every song that Joe
has written.

10. Joe owns a copy of an album that has Billie Holiday singing “The Man
I Love.”

11. Joe owns a copy of every album that has a song sung by McCartney.
(Of course, each different album is instantiated in a different
physical CD.)

12. Joe owns a copy of every album on which all the songs are sung by
Billie Holiday.

Prove that Universal Instantiation is sound and that Existential Instantiation produces an inferentially equivalent knowledge base.

From ${Likes}({Jerry},{IceCream})$ it seems reasonable to infer ${\exists\,x\;\;} {Likes}(x,{IceCream})$. Write down a general inference rule, Existential Introduction, that sanctions this inference. State carefully the conditions that must be satisfied by the variables and terms involved.

Suppose a knowledge base contains just one sentence,
$\exists\,x\ {AsHighAs}(x,{Everest})$. Which of the following are
legitimate results of applying Existential Instantiation?

1. ${AsHighAs}({Everest},{Everest})$.

2. ${AsHighAs}({Kilimanjaro},{Everest})$.

3. ${AsHighAs}({Kilimanjaro},{Everest}) \land {AsHighAs}({BenNevis},{Everest})$\
(after two applications).

For each pair of atomic sentences, give the most general unifier if it
exists:

1. $P(A,B,B)$, $P(x,y,z)$.

2. $Q(y,G(A,B))$, $Q(G(x,x),y)$.

3. ${Older}({Father}(y),y)$, ${Older}({Father}(x),{John})$.

4. ${Knows}({Father}(y),y)$, ${Knows}(x,x)$.
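
For experimenting with pairs like these, here is a minimal Python sketch of unification with the occurs check. The term encoding is our own: variables are lowercase strings, constants are capitalized strings, and compound terms are tuples such as `('Father', 'y')`.

```python
def is_var(t):
    """Variables are lowercase strings in this encoding."""
    return isinstance(t, str) and t[0].islower()

def substitute(t, s):
    """Apply substitution s (a dict), chasing variable bindings."""
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(a, s) for a in t)
    return t

def occurs(v, t, s):
    """True if variable v occurs in term t under substitution s."""
    t = substitute(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t)

def unify(x, y, s=None):
    """Return a most general unifier of x and y, or None if none exists."""
    s = {} if s is None else s
    x, y = substitute(x, s), substitute(y, s)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None
```

For example, `unify(('Knows', ('Father', 'y'), 'y'), ('Knows', 'x', 'x'))` fails precisely because of the occurs check.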

Consider the subsumption lattices shown
in Figure subsumption-lattice-figure
(page subsumption-lattice-figure).

1. Construct the lattice for the sentence
${Employs}({Mother}({John}),{Father}({Richard}))$.

2. Construct the lattice for the sentence ${Employs}({IBM},y)$
(“Everyone works for IBM”). Remember to include every kind of query
that unifies with the sentence.

3. Assume that ${Store}$ indexes each sentence under every node in its
subsumption lattice. Explain how ${Fetch}$ should work when some of these
sentences contain variables; use as examples the sentences in (a)
and (b) and the query ${Employs}(x,{Father}(x))$.

Write down logical representations for the
following sentences, suitable for use with Generalized Modus Ponens:

1. Horses, cows, and pigs are mammals.

2. An offspring of a horse is a horse.

3. Bluebeard is a horse.

4. Bluebeard is Charlie’s parent.

5. Offspring and parent are inverse relations.

6. Every mammal has a parent.

These questions concern issues with substitution and
Skolemization.

1. Given the premise ${\forall\,x\;\;} {\exists\,y\;\;} P(x,y)$, it is
not valid to conclude that ${\exists\,q\;\;} P(q,q)$. Give an
example of a predicate $P$ where the first is true but the second
is false.

2. Suppose that an inference engine is incorrectly written with the
occurs check omitted, so that it allows a literal like $P(x,F(x))$
to be unified with $P(q,q)$. (As mentioned, most standard
implementations of Prolog actually do allow this.) Show that such an
inference engine will allow the conclusion ${\exists\,q\;\;} P(q,q)$
to be inferred from the premise
${\forall\,x\;\;} {\exists\,y\;\;} P(x,y)$.

3. Suppose that a procedure that converts first-order logic to clausal
form incorrectly Skolemizes
${\forall\,x\;\;} {\exists\,y\;\;} P(x,y)$ to $P(x,Sk0)$—that is, it
replaces $y$ by a Skolem constant rather than by a Skolem function
of $x$. Show that an inference engine that uses such a procedure
will likewise allow ${\exists\,q\;\;} P(q,q)$ to be inferred from
the premise ${\forall\,x\;\;} {\exists\,y\;\;} P(x,y)$.

4. A common error among students is to suppose that, in unification,
one is allowed to substitute a term for a Skolem constant instead of
for a variable. For instance, they will say that the formulas
$P(Sk1)$ and $P(A)$ can be unified under the substitution
$\{ Sk1/A \}$. Give an example where this leads to an
invalid inference.

This question considers Horn KBs, such as the following:
$$\begin{array}{l}
P(F(x)) {\:\;{\Rightarrow}\:\;}P(x)\\
Q(x) {\:\;{\Rightarrow}\:\;}P(F(x))\\
P(A)\\
Q(B)
\end{array}$$ Let FC be a breadth-first forward-chaining algorithm that
repeatedly adds all consequences of currently satisfied rules; let BC be
a depth-first left-to-right backward-chaining algorithm that tries
clauses in the order given in the KB. Which of the following are true?

1. FC will infer the literal $Q(A)$.

2. FC will infer the literal $P(B)$.

3. If FC has failed to infer a given literal, then it is not entailed
by the KB.

4. BC will return ${true}$ given the query $P(B)$.

5. If BC does not return ${true}$ given a query literal, then it is
not entailed by the KB.
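
To see what breadth-first forward chaining actually derives from this KB, a small Python sketch (our own encoding, with compound terms as nested tuples) iterates the two rules to a fixed point:

```python
# Ground facts of the KB; F is the function symbol, A and B constants.
facts = {('P', 'A'), ('Q', 'B')}

def step(facts):
    """One breadth-first pass: add all immediate consequences."""
    new = set(facts)
    for pred, arg in facts:
        # Rule: Q(x) => P(F(x))
        if pred == 'Q':
            new.add(('P', ('F', arg)))
        # Rule: P(F(x)) => P(x)
        if pred == 'P' and isinstance(arg, tuple) and arg[0] == 'F':
            new.add(('P', arg[1]))
    return new

while True:
    nxt = step(facts)
    if nxt == facts:          # fixed point reached
        break
    facts = nxt
# The fixed point contains P(B) and P(F(B)), but never Q(A).
```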

Explain how to write any given 3-SAT problem of arbitrary size using a single first-order definite clause and no more than 30 ground facts.

Suppose you are given the following axioms:

1. $0 \leq 3$.

2. $7 \leq 9$.

3. ${\forall\,x\;\;} \; \; x \leq x$.

4. ${\forall\,x\;\;} \; \; x \leq x+0$.

5. ${\forall\,x\;\;} \; \; x+0 \leq x$.

6. ${\forall\,x,y\;\;} \; \; x+y \leq y+x$.

7. ${\forall\,w,x,y,z\;\;} \; \; w \leq y$ $\wedge$ $x \leq z$ ${\:\;{\Rightarrow}\:\;}$ $w+x \leq y+z$.

8. ${\forall\,x,y,z\;\;} \; \; x \leq y \wedge y \leq z \: {\:\;{\Rightarrow}\:\;}\: x \leq z$

1. Give a backward-chaining proof of the sentence $7 \leq 3+9$. (Be
sure, of course, to use only the axioms given here, not anything
else you may know about arithmetic.) Show only the steps that lead
to success, not the irrelevant steps.

2. Give a forward-chaining proof of the sentence $7 \leq 3+9$. Again,
show only the steps that lead to success.

Suppose you are given the following axioms:

> 1. $0 \leq 4$.

> 2. $5 \leq 9$.

> 3. ${\forall\,x\;\;} \; \; x \leq x$.

> 4. ${\forall\,x\;\;} \; \; x \leq x+0$.

> 5. ${\forall\,x\;\;} \; \; x+0 \leq x$.

> 6. ${\forall\,x,y\;\;} \; \; x+y \leq y+x$.

> 7. ${\forall\,w,x,y,z\;\;} \; \; w \leq y$ $\wedge$ $x \leq z {\:\;{\Rightarrow}\:\;}$ $w+x \leq y+z$.

> 8. ${\forall\,x,y,z\;\;} \; \; x \leq y \wedge y \leq z \: {\:\;{\Rightarrow}\:\;}\: x \leq z$

1. Give a backward-chaining proof of the sentence $5 \leq 4+9$. (Be
sure, of course, to use only the axioms given here, not anything
else you may know about arithmetic.) Show only the steps that lead
to success, not the irrelevant steps.

2. Give a forward-chaining proof of the sentence $5 \leq 4+9$. Again,
show only the steps that lead to success.

A popular children’s riddle is “Brothers and sisters have I none, but that man’s father is my father’s son.” Use the rules of the family domain (Section kinship-domain-section on page kinship-domain-section) to show who that man is. You may apply any of the inference methods described in this chapter. Why do you think that this riddle is difficult?

Suppose we put into a logical knowledge base a segment of the
U.S. census data listing the age, city of residence, date of birth, and
mother of every person, using social security numbers as identifying
constants for each person. Thus, George’s age is given by
${Age}(\mbox{443-65-1282}, 56)$. Which of the following
indexing schemes S1–S5 enable an efficient solution for which of the
queries Q1–Q4 (assuming normal backward chaining)?

- **S1**: an index for each atom in each position.

- **S2**: an index for each first argument.

- **S3**: an index for each predicate atom.

- **S4**: an index for each *combination* of predicate and first argument.

- **S5**: an index for each *combination* of predicate and second argument and an index for each first argument.

- **Q1**: ${Age}(\mbox{443-44-4321},x)$

- **Q2**: ${ResidesIn}(x,{Houston})$

- **Q3**: ${Mother}(x,y)$

- **Q4**: ${Age}(x,{34}) \land {ResidesIn}(x,{TinyTownUSA})$

One might suppose that we can avoid the
problem of variable conflict in unification during backward chaining by
standardizing apart all of the sentences in the knowledge base once and
for all. Show that, for some sentences, this approach cannot work.
(*Hint*: Consider a sentence in which one part unifies with
another.)

In this exercise, use the sentences you wrote in
Exercise fol-horses-exercise to answer a question by
using a backward-chaining algorithm.

1. Draw the proof tree generated by an exhaustive backward-chaining
algorithm for the query ${\exists\,h\;\;}{Horse}(h)$, where
clauses are matched in the order given.

2. What do you notice about this domain?

3. How many solutions for $h$ actually follow from your sentences?

4. Can you think of a way to find all of them? (*Hint*:
See Smith+al:1986.)

Trace the execution of the backward-chaining algorithm in Figure backward-chaining-algorithm (page backward-chaining-algorithm) when it is applied to solve the crime problem (page west-problem-page). Show the sequence of values taken on by the ${goals}$ variable, and arrange them into a tree.

The following Prolog code defines a predicate P. (Remember
that uppercase terms are variables, not constants, in Prolog.)

P(X,[X|Y]).

P(X,[Y|Z]) :- P(X,Z).

1. Show proof trees and solutions for the queries
P(A,[2,1,3]) and P(2,[1,A,3]).

2. What standard list operation does P represent?
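
The backtracking behavior of the two clauses on a ground list can be mimicked in Python. This sketch (our own, covering only the case where the first argument is the variable) yields bindings in the same order Prolog's depth-first clause selection would find them:

```python
def solve_p(lst):
    """Yield each X such that P(X, lst) holds, in Prolog's clause order."""
    if lst:
        yield lst[0]                  # clause 1: P(X,[X|Y]).
        yield from solve_p(lst[1:])   # clause 2: P(X,[Y|Z]) :- P(X,Z).

print(list(solve_p([2, 1, 3])))  # prints [2, 1, 3]
```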

The following Prolog code defines a predicate P. (Remember
that uppercase terms are variables, not constants, in Prolog.)

P(X,[X|Y]).

P(X,[Y|Z]) :- P(X,Z).

1. Show proof trees and solutions for the queries
P(A,[1,2,3]) and P(2,[1,A,3]).

2. What standard list operation does P represent?

This exercise looks at sorting in Prolog.

1. Write Prolog clauses that define the predicate
sorted(L), which is true if and only if list
L is sorted in ascending order.

2. Write a Prolog definition for the predicate perm(L,M),
which is true if and only if L is a permutation of
M.

3. Define sort(L,M) (M is a sorted version of
L) using perm and sorted.

4. Run sort on longer and longer lists until you lose
patience. What is the time complexity of your program?

5. Write a faster sorting algorithm, such as insertion sort or
quicksort, in Prolog.
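
Part 4's complexity question can be previewed in Python: the `sort(L,M) :- perm(L,M), sorted(M)` strategy is generate-and-test over all $n!$ permutations, as this sketch (with our own helper names) makes explicit:

```python
from itertools import permutations

def is_sorted(l):
    """Mirror of sorted(L): true iff l is in ascending order."""
    return all(a <= b for a, b in zip(l, l[1:]))

def perm_sort(l):
    """Mirror of sort(L,M) :- perm(L,M), sorted(M): test permutations in turn."""
    return next(list(p) for p in permutations(l) if is_sorted(p))

print(perm_sort([3, 1, 2]))  # prints [1, 2, 3]
```

In the worst case this examines all $n!$ permutations, which is why the naive Prolog program slows down so quickly on longer lists.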

This exercise looks at the recursive
application of rewrite rules, using logic programming. A rewrite rule
(or **demodulator**, in OTTER terminology) is an
equation with a specified direction. For example, the rewrite rule
$x+0 \rightarrow x$ suggests replacing any expression that matches $x+0$
with the expression $x$. Rewrite rules are a key component of equational
reasoning systems. Use the predicate rewrite(X,Y) to
represent rewrite rules. For example, the earlier rewrite rule is
written as rewrite(X+0,X). Some terms are
*primitive* and cannot be further simplified; thus, we
write primitive(0) to say that 0 is a primitive term.

1. Write a definition of a predicate simplify(X,Y) that
is true when Y is a simplified version of
X—that is, when no further rewrite rules apply to any
subexpression of Y.

2. Write a collection of rules for the simplification of expressions
involving arithmetic operators, and apply your simplification
algorithm to some sample expressions.

3. Write a collection of rewrite rules for symbolic differentiation,
and use them along with your simplification rules to differentiate
and simplify expressions involving arithmetic expressions,
including exponentiation.

This exercise considers the implementation of search algorithms in Prolog. Suppose that successor(X,Y) is true when state Y is a successor of state X; and that goal(X) is true when X is a goal state. Write a definition for solve(X,P), which means that P is a path (list of states) beginning with X, ending in a goal state, and consisting of a sequence of legal steps as defined by successor. You will find that depth-first search is the easiest way to do this. How easy would it be to add heuristic search control?
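
A Python transcription of the depth-first `solve(X,P)` idea (with `successor` and `goal` passed in as functions; the names and toy graph below are our own):

```python
def solve(x, successor, goal, path=None):
    """Return a list of states from x to a goal state, or None (depth-first)."""
    path = (path or []) + [x]
    if goal(x):
        return path
    for y in successor(x):
        if y not in path:                 # avoid revisiting states on this path
            found = solve(y, successor, goal, path)
            if found:
                return found
    return None

# Toy state graph: 1 -> 2 -> 4, 1 -> 3.
succ = {1: [2, 3], 2: [4], 3: [], 4: []}
print(solve(1, lambda s: succ[s], lambda s: s == 4))  # prints [1, 2, 4]
```

Adding heuristic control would mean ordering the successors before the loop, e.g. sorting them by a heuristic estimate, which is a one-line change here but much harder to express in pure Prolog's fixed clause-selection order.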

Suppose a knowledge base contains just the following first-order Horn
clauses:

$$
Ancestor(Mother(x),x)
$$
$$
Ancestor(x,y) \land Ancestor(y,z) \implies Ancestor(x,z)
$$
Consider a forward chaining algorithm that, on the $j$th iteration,
terminates if the KB contains a sentence that unifies with the query,
else adds to the KB every atomic sentence that can be inferred from the
sentences already in the KB after iteration $j-1$.

1. For each of the following queries, say whether the algorithm
will (1) give an answer (if so, write down that answer); or (2)
terminate with no answer; or (3) never terminate.

1. $Ancestor(Mother(y),John)$

2. $Ancestor(Mother(Mother(y)),John)$

3. $Ancestor(Mother(Mother(Mother(y))),Mother(y))$

4. $Ancestor(Mother(John),Mother(Mother(John)))$

2. Can a resolution algorithm prove the sentence
$\lnot Ancestor(John,John)$ from the original knowledge base?
Explain how, or why not.

3. Suppose we add the assertion that $\lnot(Mother(x) = x)$ and
augment the resolution algorithm with inference rules for equality.
Now what is the answer to (b)?

Let $\cal L$ be the first-order language with a single predicate
$S(p,q)$, meaning “$p$ shaves $q$.” Assume a domain of people.

1. Consider the sentence “There exists a person $P$ who shaves every
one who does not shave themselves, and only people that do not
shave themselves.” Express this in $\cal L$.

2. Convert the sentence in (a) to clausal form.

3. Construct a resolution proof to show that the clauses in (b) are
inherently inconsistent. (Note: you do not need any
additional axioms.)

How can resolution be used to show that a sentence is valid? Unsatisfiable?

Construct an example of two clauses that can be resolved together in two different ways giving two different outcomes.

From “Horses are animals,” it follows that “The head of a horse is the
head of an animal.” Demonstrate that this inference is valid by carrying
out the following steps:

1. Translate the premise and the conclusion into the language of
first-order logic. Use three predicates: ${HeadOf}(h,x)$ (meaning
“$h$ is the head of $x$”), ${Horse}(x)$, and ${Animal}(x)$.

2. Negate the conclusion, and convert the premise and the negated
conclusion into conjunctive normal form.

3. Use resolution to show that the conclusion follows from the premise.

From “Sheep are animals,” it follows that “The head of a sheep is the
head of an animal.” Demonstrate that this inference is valid by carrying
out the following steps:

1. Translate the premise and the conclusion into the language of
first-order logic. Use three predicates: ${HeadOf}(h,x)$ (meaning
“$h$ is the head of $x$”), ${Sheep}(x)$, and ${Animal}(x)$.

2. Negate the conclusion, and convert the premise and the negated
conclusion into conjunctive normal form.

3. Use resolution to show that the conclusion follows from the premise.

Here are two sentences in the language of
first-order logic:

- **(A)**
${\forall\,x\;\;} {\exists\,y\;\;} ( x \geq y )$
- **(B)**
${\exists\,y\;\;} {\forall\,x\;\;} ( x \geq y )$
1. Assume that the variables range over all the natural numbers
$0,1,2,\ldots, \infty$ and that the “$\geq$” predicate means “is
greater than or equal to.” Under this interpretation, translate (A)
and (B) into English.

2. Is (A) true under this interpretation?

3. Is (B) true under this interpretation?

4. Does (A) logically entail (B)?

5. Does (B) logically entail (A)?

6. Using resolution, try to prove that (A) follows from (B). Do this
even if you think that (B) does not logically entail (A); continue
until the proof breaks down and you cannot proceed (if it does
break down). Show the unifying substitution for each resolution
step. If the proof fails, explain exactly where, how, and why it
breaks down.

7. Now try to prove that (B) follows from (A).

Resolution can produce nonconstructive proofs for queries with variables, so we had to introduce special mechanisms to extract definite answers. Explain why this issue does not arise with knowledge bases containing only definite clauses.

We said in this chapter that resolution cannot be used to generate all logical consequences of a set of sentences. Can any algorithm do this?

Consider a robot whose operation is described by the following PDDL
operators:

$$
Op({Go(x,y)},{At(Robot,x)},{\lnot At(Robot,x) \land At(Robot,y)})
$$
$$
Op({Pick(o)},{At(Robot,x)\land At(o,x)},{\lnot At(o,x) \land Holding(o)})
$$
$$
Op({Drop(o)},{At(Robot,x)\land Holding(o)},{At(o,x) \land \lnot Holding(o)})
$$
1. The operators allow the robot to hold more than one object. Show how
to modify them with an $EmptyHand$ predicate for a robot that can
hold only one object.

2. Assuming that these are the only actions in the world, write a
successor-state axiom for $EmptyHand$.

Describe the differences and similarities between problem solving and planning.

Given the action schemas and initial state
from Figure airport-pddl-algorithm, what are all the
applicable concrete instances of ${Fly}(p,{from},{to})$ in the
state described by

$$
At(P_1,JFK) \land At(P_2,SFO) \land Plane(P_1) \land Plane(P_2) \land Airport(JFK) \land Airport(SFO)?
$$
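An answer can be checked mechanically with a short Python sketch (the string encoding of literals is an assumption for illustration) that enumerates ground instances of ${Fly}$ and tests each precondition against the state above. Whether the ${from}={to}$ instances count as applicable depends on whether the schema requires the two airports to differ:

```python
from itertools import product

state = {"At(P1,JFK)", "At(P2,SFO)", "Plane(P1)", "Plane(P2)",
         "Airport(JFK)", "Airport(SFO)"}
planes, airports = ["P1", "P2"], ["JFK", "SFO"]

def applicable(p, frm, to):
    # Precondition: At(p, frm) ∧ Plane(p) ∧ Airport(frm) ∧ Airport(to)
    pre = {f"At({p},{frm})", f"Plane({p})", f"Airport({frm})", f"Airport({to})"}
    return pre <= state

insts = [f"Fly({p},{f},{t})"
         for p, f, t in product(planes, airports, airports)
         if applicable(p, f, t)]
print(insts)
# ['Fly(P1,JFK,JFK)', 'Fly(P1,JFK,SFO)', 'Fly(P2,SFO,JFK)', 'Fly(P2,SFO,SFO)']
```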

The monkey-and-bananas problem is faced by a monkey in a laboratory with
some bananas hanging out of reach from the ceiling. A box is available
that will enable the monkey to reach the bananas if he climbs on it.
Initially, the monkey is at $A$, the bananas at $B$, and the box at $C$.
The monkey and box have height ${Low}$, but if the monkey climbs onto
the box he will have height ${High}$, the same as the bananas. The
actions available to the monkey include ${Go}$ from one place to
another, ${Push}$ an object from one place to another, ${ClimbUp}$
onto or ${ClimbDown}$ from an object, and ${Grasp}$ or ${Ungrasp}$
an object. The result of a ${Grasp}$ is that the monkey holds the
object if the monkey and object are in the same place at the same
height.

1. Write down the initial state description.

2. Write the six action schemas.

3. Suppose the monkey wants to fool the scientists, who are off to tea,
by grabbing the bananas, but leaving the box in its original place.
Write this as a general goal (i.e., not assuming that the box is
necessarily at $C$) in the language of situation calculus. Can this
goal be solved by a classical planning system?

4. Your schema for pushing is probably incorrect, because if the object
is too heavy, its position will remain the same when the ${Push}$
schema is applied. Fix your action schema to account for
heavy objects.

The original {Strips} planner was designed to control Shakey the robot.
Figure shakey-figure shows a version of Shakey’s world
consisting of four rooms lined up along a corridor, where each room has
a door and a light switch. The actions in Shakey’s world include moving from place to place,
pushing movable objects (such as boxes), climbing onto and down from
rigid objects (such as boxes), and turning light switches on and off.
The robot itself could not climb on a box or toggle a switch, but the
planner was capable of finding and printing out plans that were beyond
the robot’s abilities. Shakey’s six actions are the following:

- ${Go}(x,y,r)$, which requires that Shakey be ${At}$ $x$ and that
$x$ and $y$ are locations ${In}$ the same room $r$. By convention
a door between two rooms is in both of them.

- Push a box $b$ from location $x$ to location $y$ within the same
room: ${Push}(b,x,y,r)$. You will need the predicate ${Box}$ and
constants for the boxes.

- Climb onto a box from position $x$: ${ClimbUp}(x, b)$; climb down
from a box to position $x$: ${ClimbDown}(b, x)$. We will need the
predicate ${On}$ and the constant ${Floor}$.

- Turn a light switch on or off: ${TurnOn}(s,b)$;
${TurnOff}(s,b)$. To turn a light on or off, Shakey must be on top
of a box at the light switch’s location.

Write PDDL sentences for Shakey’s six actions and the initial state from
Figure shakey-figure. Construct a plan for Shakey to
get ${Box}{}_2$ into ${Room}{}_2$.

A finite Turing machine has a finite one-dimensional tape of cells, each
cell containing one of a finite number of symbols. One cell has a read
and write head above it. There is a finite set of states the machine can
be in, one of which is the accept state. At each time step, depending on
the symbol on the cell under the head and the machine’s current state,
there is a set of actions to choose from. Each action involves
writing a symbol to the cell under the head, transitioning the machine
to a state, and optionally moving the head left or right. The mapping
that determines which actions are allowed is the Turing machine’s
program. Your goal is to control the machine into the accept state.

Represent the Turing machine acceptance problem as a planning problem.
If you can do this, it demonstrates that determining whether a planning
problem has a solution is at least as hard as the Turing acceptance
problem, which is PSPACE-hard.
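One plausible shape for such an encoding (a sketch, not the required answer; the fluents and the per-entry action are assumptions, and head movement would in practice use explicit successor facts rather than the arithmetic $i+d$): use fluents ${State}(q)$, ${Head}(i)$, and ${Contains}(i,c)$, with one action for each program entry $(q,c) \mapsto (q',c',d)$:

$$
Op({Step_{q,c,q',c',d}},\;{State(q) \land Head(i) \land Contains(i,c)},\;{\lnot State(q) \land State(q') \land \lnot Contains(i,c) \land Contains(i,c') \land \lnot Head(i) \land Head(i+d)})
$$

The goal is then simply ${State}(q_{accept})$, and a plan exists exactly when the machine can be driven into the accept state.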

Explain why dropping negative effects from every action schema results in a relaxed problem, provided that preconditions and goals contain only positive literals.

Figure sussman-anomaly-figure (page sussman-anomaly-figure) shows a blocks-world problem that is known as the {Sussman anomaly}. The problem was considered anomalous because the noninterleaved planners of the early 1970s could not solve it. Write a definition of the problem and solve it, either by hand or with a planning program. A noninterleaved planner is a planner that, when given two subgoals $G_{1}$ and $G_{2}$, produces either a plan for $G_{1}$ concatenated with a plan for $G_{2}$, or vice versa. Can a noninterleaved planner solve this problem? How, or why not?
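For checking a hand-written solution, here is a brute-force breadth-first search over blocks-world states (a sketch under the usual assumptions: a state maps each block to its support, and a block may move only when clear; note this is ordinary state-space search, not a noninterleaved planner):

```python
from collections import deque

# Sussman anomaly: initially C is on A; goal is A on B on C.
init = {"A": "table", "B": "table", "C": "A"}
goal = {"A": "B", "B": "C"}

def clear(state, b):
    return all(on != b for on in state.values())

def moves(state):
    for b in state:
        if not clear(state, b):
            continue
        for dest in list(state) + ["table"]:
            if dest != b and dest != state[b] and \
               (dest == "table" or clear(state, dest)):
                new = dict(state)
                new[b] = dest
                yield (b, dest), new

def bfs(init, goal):
    frontier = deque([(init, [])])
    seen = {tuple(sorted(init.items()))}
    while frontier:
        state, plan = frontier.popleft()
        if all(state[b] == on for b, on in goal.items()):
            return plan
        for act, new in moves(state):
            key = tuple(sorted(new.items()))
            if key not in seen:
                seen.add(key)
                frontier.append((new, plan + [act]))

print(bfs(init, goal))  # [('C', 'table'), ('B', 'C'), ('A', 'B')]
```

The three-step plan interleaves work on the two subgoals, which is precisely what a noninterleaved planner cannot do.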

Prove that backward search with PDDL problems is complete.

Construct levels 0, 1, and 2 of the planning graph for the problem in Figure airport-pddl-algorithm.

Prove the following assertions about
planning graphs:

1. A literal that does not appear in the final level of the graph
cannot be achieved.

2. The level cost of a literal in a serial graph is no greater than the
actual cost of an optimal plan for achieving it.

We saw that planning graphs can handle only propositional actions. What if we want to use planning graphs for a problem with variables in the goal, such as ${At}(P_{1}, x) \land {At}(P_{2}, x)$, where $x$ is assumed to be bound by an existential quantifier that ranges over a finite domain of locations? How could you encode such a problem to work with planning graphs?

The set-level heuristic (see page set-level-page) uses a planning graph to estimate the cost of achieving a conjunctive goal from the current state. What relaxed problem is the set-level heuristic the solution to?

Examine the definition of **bidirectional search** in Chapter search-chapter.

1. Would bidirectional state-space search be a good idea for planning?

2. What about bidirectional search in the space of partial-order plans?

3. Devise a version of partial-order planning in which an action can be
added to a plan if its preconditions can be achieved by the effects
of actions already in the plan. Explain how to deal with conflicts
and ordering constraints. Is the algorithm essentially identical to
forward state-space search?

We contrasted forward and backward state-space searchers with partial-order planners, saying that the latter is a plan-space searcher. Explain how forward and backward state-space search can also be considered plan-space searchers, and say what the plan refinement operators are.

Up to now we have assumed that the
plans we create always make sure that an action’s preconditions are
satisfied. Let us now investigate what propositional successor-state
axioms such as
${HaveArrow}^{t+1} \Leftrightarrow ({HaveArrow}^t \land \lnot {Shoot}^t)$
have to say about actions whose preconditions
are not satisfied.

1. Show that the axioms predict that nothing will happen when an action
is executed in a state where its preconditions are not satisfied.

2. Consider a plan $p$ that contains the actions required to achieve a
goal but also includes illegal actions. Is it the case that
$$
\textit{initial state} \land \textit{successor-state axioms} \land p \models \textit{goal}\ ?
$$
3. With first-order successor-state axioms in situation calculus, is it
possible to prove that a plan containing illegal actions will
achieve the goal?

Consider how to translate a set of action
schemas into the successor-state axioms of situation calculus.

1. Consider the schema for ${Fly}(p,{from},{to})$. Write a
logical definition for the predicate
${Poss}({Fly}(p,{from},{to}),s)$, which is true if the
preconditions for ${Fly}(p,{from},{to})$ are satisfied in
situation $s$.

2. Next, assuming that ${Fly}(p,{from},{to})$ is the only action
schema available to the agent, write down a successor-state axiom
for ${At}(p,x,s)$ that captures the same information as the
action schema.

3. Now suppose there is an additional method of travel:
${Teleport}(p,{from},{to})$. It has the additional
precondition $\lnot {Warped}(p)$ and the additional effect
${Warped}(p)$. Explain how the situation calculus knowledge base
must be modified.

4. Finally, develop a general and precisely specified procedure for
carrying out the translation from a set of action schemas to a set
of successor-state axioms.

In the $SATPlan$ algorithm in
Figure satplan-agent-algorithm (page satplan-agent-algorithm),
each call to the satisfiability algorithm asserts a goal $g^T$, where
$T$ ranges from 0 to $T_{max}$. Suppose instead that the
satisfiability algorithm is called only once, with the goal
$g^0 \vee g^1 \vee \cdots \vee g^{T_{max}}$.

1. Will this always return a plan if one exists with length less than
or equal to $T_{max}$?

2. Does this approach introduce any new spurious “solutions”?

3. Discuss how one might modify a satisfiability algorithm such as $WalkSAT$ so
that it finds short solutions (if they exist) when given a
disjunctive goal of this form.

The goals we have considered so far all ask the planner to make the
world satisfy the goal at just one time step. Not all goals can be
expressed this way: you do not achieve the goal of suspending a
chandelier above the ground by throwing it in the air. More seriously,
you wouldn’t want your spacecraft life-support system to supply oxygen
one day but not the next. A *maintenance goal* is achieved
when the agent’s plan causes a condition to hold continuously from a
given state onward. Describe how to extend the formalism of this chapter
to support maintenance goals.

You have a number of trucks with which to deliver a set of packages. Each package starts at some location on a grid map, and has a destination somewhere else. Each truck is directly controlled by moving forward and turning. Construct a hierarchy of high-level actions for this problem. What knowledge about the solution does your hierarchy encode?

Suppose that a high-level action has exactly one implementation as a sequence of primitive actions. Give an algorithm for computing its preconditions and effects, given the complete refinement hierarchy and schemas for the primitive actions.

Suppose that the optimistic reachable set of a high-level plan is a superset of the goal set; can anything be concluded about whether the plan achieves the goal? What if the pessimistic reachable set doesn’t intersect the goal set? Explain.

Write an algorithm that takes an initial state (specified by a set of propositional literals) and a sequence of HLAs (each defined by preconditions and angelic specifications of optimistic and pessimistic reachable sets) and computes optimistic and pessimistic descriptions of the reachable set of the sequence.

In Figure jobshop-cpm-figure we showed how to describe actions in a scheduling problem by using separate fields for ${Duration}$, ${Use}$, and ${Consume}$. Now suppose we wanted to combine scheduling with nondeterministic planning, which requires nondeterministic and conditional effects. Consider each of the three fields and explain whether it should remain a separate field or become an effect of the action. Give an example for each of the three.

Some of the operations in standard programming languages can be modeled
as actions that change the state of the world. For example, the
assignment operation changes the contents of a memory location, and the
print operation changes the state of the output stream. A program
consisting of these operations can also be considered as a plan, whose
goal is given by the specification of the program. Therefore, planning
algorithms can be used to construct programs that achieve a given
specification.

1. Write an action schema for the assignment operator (assigning the
value of one variable to another). Remember that the original value
will be overwritten!

2. Show how object creation can be used by a planner to produce a plan
for exchanging the values of two variables by using a
temporary variable.

Consider the following argument: In a framework that allows uncertain
initial states, **nondeterministic effects**
are just a notational convenience, not a source of additional
representational power. For any action schema $a$ with nondeterministic
effect $P \lor Q$, we could always replace it with the conditional
effects $({when}~R{:}~P) \land ({when}~\lnot R{:}~Q)$, which in turn can be
reduced to two regular actions. The proposition $R$ stands for a random
proposition that is unknown in the initial state and for which there are
no sensing actions. Is this argument correct? Consider separately two
cases, one in which only one instance of action schema $a$ is in the
plan, the other in which more than one instance is.

Suppose the ${Flip}$ action always changes the truth value of variable $L$. Show how to define its effects by using an action schema with conditional effects. Show that, despite the use of conditional effects, a 1-CNF belief state representation remains in 1-CNF after a ${Flip}$.
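A 1-CNF belief state is a conjunction of literals, i.e., a known truth value (or “unknown”) for each variable. Under that reading, a small Python sketch (the dict encoding is an assumption for illustration) shows why ${Flip}$ preserves the 1-CNF form:

```python
# Belief state: {var: True/False/None}, where None means "unknown".
def flip(belief, var="L"):
    b = dict(belief)
    if b.get(var) is not None:
        b[var] = not b[var]   # conditional effects: when L: ¬L, when ¬L: L
    # If L is unknown it stays unknown; either way each variable is
    # still described by at most one literal, so the result is 1-CNF.
    return b

assert flip({"L": True}) == {"L": False}
assert flip({"L": None}) == {"L": None}
```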

In the blocks world we were forced to introduce two action schemas, ${Move}$ and ${MoveToTable}$, in order to maintain the ${Clear}$ predicate properly. Show how conditional effects can be used to represent both of these cases with a single action.

Conditional effects were illustrated for the
${Suck}$ action in the vacuum world—which square becomes clean depends
on which square the robot is in. Can you think of a new set of
propositional variables to define states of the vacuum world, such that
${Suck}$ has an *unconditional* description? Write out
the descriptions of ${Suck}$, ${Left}$, and ${Right}$, using your
propositions, and demonstrate that they suffice to describe all possible
states of the world.

Find a suitably dirty carpet, free of obstacles, and vacuum it. Draw the path taken by the vacuum cleaner as accurately as you can. Explain it, with reference to the forms of planning discussed in this chapter.

The following quotes are from the backs of shampoo bottles. Identify each as an unconditional, conditional, or execution-monitoring plan. (a) “Lather. Rinse. Repeat.” (b) “Apply shampoo to scalp and let it remain for several minutes. Rinse and repeat if necessary.” (c) “See a doctor if problems persist.”

Consider the following problem: A patient arrives at the doctor’s office with symptoms that could have been caused either by dehydration or by disease $D$ (but not both). There are two possible actions: ${Drink}$, which unconditionally cures dehydration, and ${Medicate}$, which cures disease $D$ but has an undesirable side effect if taken when the patient is dehydrated. Write the problem description, and diagram a sensorless plan that solves the problem, enumerating all relevant possible worlds.

To the medication problem in the previous exercise, add a ${Test}$ action that has the conditional effect ${CultureGrowth}$ when ${Disease}$ is true and in any case has the perceptual effect ${Known}({CultureGrowth})$. Diagram a conditional plan that solves the problem and minimizes the use of the ${Medicate}$ action.

Define an ontology in first-order logic for tic-tac-toe. The ontology should contain situations, actions, squares, players, marks (X, O, or blank), and the notion of winning, losing, or drawing a game. Also define the notion of a forced win (or draw): a position from which a player can force a win (or draw) with the right sequence of actions. Write axioms for the domain. (Note: The axioms that enumerate the different squares and that characterize the winning positions are rather long. You need not write these out in full, but indicate clearly what they look like.)

You are to create a system for advising computer science undergraduates
on what courses to take over an extended period in order to satisfy the
program requirements. (Use whatever requirements are appropriate for
your institution.) First, decide on a vocabulary for representing all
the information, and then represent it; then formulate a query to the
system that will return a legal program of study as a solution. You
should allow for some tailoring to individual students, in that your
system should ask what courses or equivalents the student has already
taken, and not generate programs that repeat those courses.
Suggest ways in which your system could be improved—for example to take
into account knowledge about student preferences, the workload, good and
bad instructors, and so on. For each kind of knowledge, explain how it
could be expressed logically. Could your system easily incorporate this
information to find all feasible programs of study for a student? Could
it find the *best* program?

Figure ontology-figure shows the top levels of a
hierarchy for everything. Extend it to include as many real categories
as possible. A good way to do this is to cover all the things in your
everyday life. This includes objects and events. Start with waking up,
and proceed in an orderly fashion noting everything that you see, touch,
do, and think about. For example, a random sampling produces music,
news, milk, walking, driving, gas, Soda Hall, carpet, talking, Professor
Fateman, chicken curry, tongue, \$ 7, sun, the daily newspaper, and so on.

You should produce both a single hierarchy chart (on a large sheet of
paper) and a listing of objects and categories with the relations
satisfied by members of each category. Every object should be in a
category, and every category should be in the hierarchy.

Develop a representational system for reasoning
about windows in a window-based computer interface. In particular, your
representation should be able to describe:

- The state of a window: minimized, displayed, or nonexistent.

- Which window (if any) is the active window.

- The position of every window at a given time.

- The order (front to back) of overlapping windows.

- The actions of creating, destroying, resizing, and moving windows;
changing the state of a window; and bringing a window to the front.
Treat these actions as atomic; that is, do not deal with the issue
of relating them to mouse actions. Give axioms describing the
effects of actions on fluents. You may use either event or
situation calculus.

Assume an ontology containing *situations,*
*actions,* *integers* (for $x$ and $y$
coordinates) and *windows*. Define a language over this
ontology; that is, a list of constants, function symbols, and predicates
with an English description of each. If you need to add more categories
to the ontology (e.g., pixels), you may do so, but be sure to specify
these in your write-up. You may (and should) use symbols defined in the
text, but be sure to list these explicitly.

State the following in the language you developed for the previous
exercise:

1. In situation $S_0$, window $W_1$ is behind $W_2$ but sticks out on
the top and bottom. Do *not* state exact coordinates
for these; describe the *general* situation.

2. If a window is displayed, then its top edge is higher than its
bottom edge.

3. After you create a window $w$, it is displayed.

4. A window can be minimized only if it is displayed.

(Adapted from an example by Doug Lenat.) Your mission is to capture, in
logical form, enough knowledge to answer a series of questions about the
following simple scenario:

> *Yesterday John went to the North Berkeley Safeway supermarket and bought two pounds of tomatoes and a pound of ground beef.*

Start by trying to represent the content of the sentence as a series of
assertions. You should write sentences that have straightforward logical
structure (e.g., statements that objects have certain properties, that
objects are related in certain ways, that all objects satisfying one
property satisfy another). The following might help you get started:

- Which classes, objects, and relations would you need? What are their
parents, siblings and so on? (You will need events and temporal
ordering, among other things.)

- Where would they fit in a more general hierarchy?

- What are the constraints and interrelationships among them?

- How detailed must you be about each of the various concepts?

To answer the questions below, your knowledge base must include
background knowledge. You’ll have to deal with what kind of things are
at a supermarket, what is involved with purchasing the things one
selects, what the purchases will be used for, and so on. Try to make
your representation as general as possible. To give a trivial example:
don’t say “People buy food from Safeway,” because that won’t help you
with those who shop at another supermarket. Also, don’t turn the
questions into answers; for example, question (c) asks “Did John buy any
meat?”—not “Did John buy a pound of ground beef?”

Sketch the chains of reasoning that would answer the questions. If
possible, use a logical reasoning system to demonstrate the sufficiency
of your knowledge base. Many of the things you write might be only
approximately correct in reality, but don’t worry too much; the idea is
to extract the common sense that lets you answer these questions at all.
A truly complete answer to this question is *extremely*
difficult, probably beyond the state of the art of current knowledge
representation. But you should be able to put together a consistent set
of axioms for the limited questions posed here.

1. Is John a child or an adult? [Adult]

2. Does John now have at least two tomatoes? [Yes]

3. Did John buy any meat? [Yes]

4. If Mary was buying tomatoes at the same time as John, did he see
her? [Yes]

5. Are the tomatoes made in the supermarket? [No]

6. What is John going to do with the tomatoes? [Eat them]

7. Does Safeway sell deodorant? [Yes]

8. Did John bring some money or a credit card to the supermarket?
[Yes]

9. Does John have less money after going to the supermarket? [Yes]

Make the necessary additions or changes to your knowledge base from the
previous exercise so that the questions that follow can be answered.
Include in your report a discussion of your changes, explaining why they
were needed, whether they were minor or major, and what kinds of
questions would necessitate further changes.

1. Are there other people in Safeway while John is there?
[Yes—staff!]

2. Is John a vegetarian? [No]

3. Who owns the deodorant in Safeway? [Safeway Corporation]

4. Did John have an ounce of ground beef? [Yes]

5. Does the Shell station next door have any gas? [Yes]

6. Do the tomatoes fit in John’s car trunk? [Yes]

Represent the following seven sentences using and extending the
representations developed in the chapter:

1. Water is a liquid between 0 and 100 degrees.

2. Water boils at 100 degrees.

3. The water in John’s water bottle is frozen.

4. Perrier is a kind of water.

5. John has Perrier in his water bottle.

6. All liquids have a freezing point.

7. A liter of water weighs more than a liter of alcohol.
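As a hedged illustration of the intended style (the symbols ${Temperature}$, ${Centigrade}$, and ${Liquid}$ are assumptions extending the chapter’s measure notation), the first two sentences might be rendered:

$$
\forall\, w\;\; \big(w \in {Water} \land {Centigrade}(0) < {Temperature}(w) < {Centigrade}(100)\big) \implies {Liquid}(w)
$$
$$
\forall\, w\;\; \big(w \in {Water} \land {Temperature}(w) = {Centigrade}(100)\big) \implies {Boiling}(w)
$$

The remaining sentences call for category–subcategory relations (Perrier is a kind of water), stuff versus things (the water in John’s bottle), and comparisons of measures (weights of equal volumes).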

Write definitions for the following:

1. ${ExhaustivePartDecomposition}$

2. ${PartPartition}$

3. ${PartwiseDisjoint}$

These should be analogous to the definitions for
${ExhaustiveDecomposition}$, ${Partition}$, and ${Disjoint}$. Is
it the case that ${PartPartition}(s,{BunchOf}(s))$? If so, prove it;
if not, give a counterexample and define sufficient conditions under
which it does hold.

An alternative scheme for representing measures involves applying the units function to an abstract length object. In such a scheme, one would write ${Inches}({Length}(L_1)) = {1.5}$. How does this scheme compare with the one in the chapter? Issues include conversion axioms, names for abstract quantities (such as “50 dollars”), and comparisons of abstract measures in different units (50 inches is more than 50 centimeters).

Write a set of sentences that allows one to calculate the price of an individual tomato (or other object), given the price per pound. Extend the theory to allow the price of a bag of tomatoes to be calculated.

Add sentences to extend the definition of the predicate ${Name}(s, c)$ so that a string such as “laptop computer” matches the appropriate category names from a variety of stores. Try to make your definition general. Test it by looking at ten online stores, and at the category names they give for three different categories. For example, for the category of laptops, we found the names “Notebooks,” “Laptops,” “Notebook Computers,” “Notebook,” “Laptops and Notebooks,” and “Notebook PCs.” Some of these can be covered by explicit ${Name}$ facts, while others could be covered by sentences for handling plurals, conjunctions, etc.
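To make the generality testable, here is a small Python sketch of a normalizer that could back a general ${Name}$ definition (the synonym table and the crude de-pluralizing rule are illustrative assumptions, not a complete solution):

```python
import re

# Illustrative synonym table (an assumption, not from the text).
SYNONYMS = {"notebook": "laptop", "pc": "computer"}

def canon(name):
    """Map a store's category name to a set of canonical keywords."""
    words = set()
    for w in re.findall(r"[a-z]+", name.lower()):
        if w.endswith("s") and len(w) > 2:   # crude plural stripping
            w = w[:-1]
        words.add(SYNONYMS.get(w, w))
    return words - {"and"}

def matches(query, store_name):
    # The store's name matches if it covers every canonical query keyword.
    return canon(query) <= canon(store_name)

for n in ["Notebooks", "Laptops", "Notebook Computers",
          "Laptops and Notebooks", "Notebook PCs"]:
    assert matches("laptop", n)
```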

Write event calculus axioms to describe the actions in the wumpus world.

State the interval-algebra relation that holds between every pair of the
following real-world events:

> $LK$: The life of President Kennedy.

> $IK$: The infancy of President Kennedy.

> $PK$: The presidency of President Kennedy.

> $LJ$: The life of President Johnson.

> $PJ$: The presidency of President Johnson.

> $LO$: The life of President Obama.

This exercise concerns the problem of planning a route for a robot to
take from one city to another. The basic action taken by the robot is
${Go}(x,y)$, which takes it from city $x$ to city $y$ if there is a
route between those cities. ${Road}(x, y)$ is true if and only if
there is a road connecting cities $x$ and $y$; if there is, then
${Distance}(x, y)$ gives the length of the road. See the map on
page romania-distances-figure for an example. The robot begins in Arad and must
reach Bucharest.

1. Write a suitable logical description of the initial situation of
the robot.

2. Write a suitable logical query whose solutions provide possible
paths to the goal.

3. Write a sentence describing the ${Go}$ action.

4. Now suppose that the robot consumes fuel at the rate of 0.02 gallons
per mile. The robot starts with 20 gallons of fuel. Augment your
representation to include these considerations.

5. Now suppose some of the cities have gas stations at which the robot
can fill its tank. Extend your representation and write all the
rules needed to describe gas stations, including the
${Fillup}$ action.

Investigate ways to extend the event calculus to handle
*simultaneous* events. Is it possible to avoid a
combinatorial explosion of axioms?

Construct a representation for exchange rates between currencies that allows for daily fluctuations.

Define the predicate ${Fixed}$, where ${Fixed}({Location}(x))$ means that the location of object $x$ is fixed over time.

Describe the event of trading something for something else. Describe buying as a kind of trading in which one of the objects traded is a sum of money.

The two preceding exercises assume a fairly primitive notion of
ownership. For example, the buyer starts by *owning* the
dollar bills. This picture begins to break down when, for example, one’s
money is in the bank, because there is no longer any specific collection
of dollar bills that one owns. The picture is complicated still further
by borrowing, leasing, renting, and bailment. Investigate the various
commonsense and legal concepts of ownership, and propose a scheme by
which they can be represented formally.

(Adapted from Fagin+al:1995.) Consider a game played
with a deck of just 8 cards, 4 aces and 4 kings. The three players,
Alice, Bob, and Carlos, are dealt two cards each. Without looking at
them, they place the cards on their foreheads so that the other players
can see them. Then the players take turns either announcing that they
know what cards are on their own forehead, thereby winning the game, or
saying “I don’t know.” Everyone knows the players are truthful and are
perfect at reasoning about beliefs.

1. Game 1. Alice and Bob have both said “I don’t know.” Carlos sees
that Alice has two aces (A-A) and Bob has two kings (K-K). What
should Carlos say? (*Hint*: consider all three possible
cases for Carlos: A-A, K-K, A-K.)

2. Describe each step of Game 1 using the notation of modal logic.

3. Game 2. Carlos, Alice, and Bob all said “I don’t know” on their
first turn. Alice holds K-K and Bob holds A-K. What should Carlos
say on his second turn?

4. Game 3. Alice, Carlos, and Bob all say “I don’t know” on their first
turn, as does Alice on her second turn. Alice and Bob both hold A-K.
What should Carlos say?

5. Prove that there will always be a winner to this game.
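Game 1 can be checked by brute force over possible worlds. In this Python sketch (the encoding of a hand by its number of aces is an assumption), a player “knows” their hand when every live world consistent with what they can see assigns them the same hand, and each “I don’t know” announcement filters the live worlds:

```python
from itertools import product

# Hands encoded by ace count: 2 = A-A, 1 = A-K, 0 = K-K.
# Worlds are (alice, bob, carlos); the 8-card deck (4 aces, 4 kings)
# forces between 2 and 4 aces among the six dealt cards.
worlds = [w for w in product(range(3), repeat=3) if 2 <= sum(w) <= 4]

def knows(i, w, live):
    """Player i knows their hand iff all live worlds matching what i sees agree on w[i]."""
    seen = [x for j, x in enumerate(w) if j != i]
    return all(v[i] == w[i] for v in live
               if [x for j, x in enumerate(v) if j != i] == seen)

live = worlds
for speaker in (0, 1):              # Alice, then Bob, announce "I don't know"
    live = [w for w in live if not knows(speaker, w, live)]

actual = (2, 0, 1)                  # Alice A-A, Bob K-K; Carlos in fact holds A-K
print(knows(2, actual, live))       # True: Carlos can now announce his hand
```

Before the announcements Carlos cannot distinguish A-A, A-K, and K-K for himself; the two announcements eliminate exactly the worlds in which he holds A-A or K-K.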

The assumption of *logical omniscience,* discussed on
page logical-omniscience, is of course not true of any actual reasoners.
Rather, it is an *idealization* of the reasoning process
that may be more or less acceptable depending on the applications.
Discuss the reasonableness of the assumption for each of the following
applications of reasoning about knowledge:

1. Partial knowledge adversary games, such as card games. Here one
player wants to reason about what his opponent knows about the state
of the game.

2. Chess with a clock. Here the player may wish to reason about the
limits of his opponent’s or his own ability to find the best move in
the time available. For instance, if player A has much more time
left than player B, then A will sometimes make a move that greatly
complicates the situation, in the hopes of gaining an advantage
because he has more time to work out the proper strategy.

3. A shopping agent in an environment in which there are costs of
gathering information.

4. Reasoning about public key cryptography, which rests on the
intractability of certain computational problems.

Translate the following description logic expression (from
page description-logic-ex) into first-order logic, and comment on the result:

$$
\begin{aligned}
&And(Man, AtLeast(3, Son), AtMost(2, Daughter),\\
&\quad All(Son, And(Unemployed, Married, All(Spouse, Doctor))),\\
&\quad All(Daughter, And(Professor, Fills(Department, Physics, Math))))
\end{aligned}
$$

Recall that inheritance information in semantic networks can be captured
logically by suitable implication sentences. This exercise investigates
the efficiency of using such sentences for inheritance.

1. Consider the information in a used-car catalog such as Kelly’s Blue
Book—for example, that 1973 Dodge vans are (or perhaps were once)
worth \$575. Suppose all this information (for 11,000 models) is
encoded as logical sentences, as suggested in the chapter. Write
down three such sentences, including that for 1973 Dodge vans. How
would you use the sentences to find the value of a
*particular* car, given a backward-chaining theorem
prover such as Prolog?

2. Compare the time efficiency of the backward-chaining method for
solving this problem with the inheritance method used in
semantic nets.

3. Explain how forward chaining allows a logic-based system to solve
the same problem efficiently, assuming that the KB contains only the
11,000 sentences about prices.

4. Describe a situation in which neither forward nor backward chaining
on the sentences will allow the price query for an individual car to
be handled efficiently.

5. Can you suggest a solution enabling this type of query to be solved
efficiently in all cases in logic systems? (*Hint:*
Remember that two cars of the same year and model have the
same price.)

One might suppose that the syntactic distinction between unboxed links and singly boxed links in semantic networks is unnecessary, because singly boxed links are always attached to categories; an inheritance algorithm could simply assume that an unboxed link attached to a category is intended to apply to all members of that category. Show that this argument is fallacious, giving examples of errors that would arise.

One part of the shopping process that was not covered in this chapter is checking for compatibility between items. For example, if a digital camera is ordered, what accessory batteries, memory cards, and cases are compatible with the camera? Write a knowledge base that can determine the compatibility of a set of items and suggest replacements or additional items if the shopper makes a choice that is not compatible. The knowledge base should work with at least one line of products and extend easily to other lines.

A complete solution to the problem of
inexact matches to the buyer’s description in shopping is very difficult
and requires a full array of natural language processing and information
retrieval techniques. (See Chapters nlp1-chapter
and nlp-english-chapter.) One small step is to allow the user to
specify minimum and maximum values for various attributes. The buyer
must use the following grammar for product descriptions:

$$
Description \rightarrow Category \space [Connector \space Modifier]*
$$
$$
Connector \rightarrow "with" \space | "and" | ","
$$
$$
Modifier \rightarrow Attribute \space |\space Attribute \space Op \space Value
$$
$$
Op \rightarrow "=" | "\gt" | "\lt"
$$
Here, ${Category}$ names a product category, ${Attribute}$ is some
feature such as “CPU” or “price,” and ${Value}$ is the target value
for the attribute. So the query “computer with at least a 2.5 GHz CPU
for under \$500” must be re-expressed as “computer with CPU $>$ 2.5 GHz
and price $<$ 500.” Implement a shopping agent that accepts descriptions
in this language.

Our description of Internet shopping omitted the
all-important step of actually *buying* the product.
Provide a formal logical description of buying, using event calculus.
That is, define the sequence of events that occurs when a buyer submits
a credit-card purchase and then eventually gets billed and receives the
product.

Show from first principles that $P(a \mid b \land a) = 1$.

Using the axioms of probability, prove that any probability distribution on a discrete random variable must sum to 1.

For each of the following statements, either prove it is true or give a
counterexample.

1. If $P(a \mid b, c) = P(b \mid a, c)$, then
$P(a \mid c) = P(b \mid c)$

2. If $P(a \mid b, c) = P(a)$, then $P(b \mid c) = P(b)$

3. If $P(a \mid b) = P(a)$, then
$P(a \mid b, c) = P(a \mid c)$

Would it be rational for an agent to hold the three beliefs
$P(A) = 0.4$, $P(B) = 0.3$, and
$P(A \lor B) = 0.5$? If so, what range of probabilities would
be rational for the agent to hold for $A \land B$? Make up a table like
the one in Figure de-finetti-table, and show how it
supports your argument about rationality. Then draw another version of
the table where $P(A \lor B)
= 0.7$. Explain why it is rational to have this probability,
even though the table shows one case that is a loss and three that just
break even. (*Hint:* what is Agent 1 committed to about the
probability of each of the four cases, especially the case that is a
loss?)

This question deals with the properties
of possible worlds, defined on page possible-worlds-page as assignments to all
random variables. We will work with propositions that correspond to
exactly one possible world because they pin down the assignments of all
the variables. In probability theory, such propositions are called **atomic events**. For
example, with Boolean variables $X_1$, $X_2$, $X_3$, the proposition
$x_1\land \lnot x_2 \land \lnot x_3$ fixes the assignment of the
variables; in the language of propositional logic, we would say it has
exactly one model.

1. Prove, for the case of $n$ Boolean variables, that any two distinct
atomic events are mutually exclusive; that is, their conjunction is
equivalent to ${false}$.

2. Prove that the disjunction of all possible atomic events is
logically equivalent to ${true}$.

3. Prove that any proposition is logically equivalent to the
disjunction of the atomic events that entail its truth.

Prove Equation (kolmogorov-disjunction-equation) from Equations (basic-probability-axiom-equation) and (proposition-probability-equation).

Consider the set of all possible five-card poker hands dealt fairly from
a standard deck of fifty-two cards.

1. How many atomic events are there in the joint probability
distribution (i.e., how many five-card hands are there)?

2. What is the probability of each atomic event?

3. What is the probability of being dealt a royal straight flush? Four
of a kind?
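
The counts can be verified in a few lines (a sketch using Python's `math.comb`; the hand counts for parts 1 and 3 are standard combinatorics):

```python
from math import comb

# Part 1: the number of atomic events is the number of distinct 5-card hands.
hands = comb(52, 5)

# Part 2: with a fair deal, every hand is equally likely.
p_hand = 1 / hands

# Part 3: count favorable hands directly.
royal_flushes = 4          # A-K-Q-J-10 in a single suit, one per suit
four_of_a_kind = 13 * 48   # choose the quad rank, then any fifth card

print(hands)                      # 2598960
print(royal_flushes / hands)
print(four_of_a_kind / hands)
```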

Given the full joint distribution shown in
Figure dentist-joint-table, calculate the following:

1. $\textbf{P}({toothache})$.

2. $\textbf{P}({Cavity})$.

3. $\textbf{P}({Toothache} \mid {cavity})$.

4. $\textbf{P}({Cavity} \mid {toothache}\lor {catch})$.

Given the full joint distribution shown in
Figure dentist-joint-table, calculate the following:

1. $\textbf{P}({toothache})$.

2. $\textbf{P}({Catch})$.

3. $\textbf{P}({Cavity} \mid {catch})$.

4. $\textbf{P}({Cavity} \mid {toothache}\lor {catch})$.

In his letter of August 24, 1654, Pascal
was trying to show how a pot of money should be allocated when a
gambling game must end prematurely. Imagine a game where each turn
consists of the roll of a die, player *E* gets a point when
the die is even, and player *O* gets a point when the die
is odd. The first player to get 7 points wins the pot. Suppose the game
is interrupted with *E* leading 4–2. How should the money
be fairly split in this case? What is the general formula? (Fermat and
Pascal made several errors before solving the problem, but you should be
able to get it right the first time.)
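
The fair split is each player's probability of winning the completed game, which the following recursion computes (a sketch; the function name and the uniform $p = 1/2$ per turn are my encoding of the die-roll game):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p_first_wins(a_needs, b_needs, p=Fraction(1, 2)):
    """Probability that the first player, needing a_needs more points,
    beats an opponent needing b_needs, winning each turn with prob. p."""
    if a_needs == 0:
        return Fraction(1)
    if b_needs == 0:
        return Fraction(0)
    return p * p_first_wins(a_needs - 1, b_needs, p) \
         + (1 - p) * p_first_wins(a_needs, b_needs - 1, p)

# E leads 4-2 in a race to 7: E needs 3 more points, O needs 5.
print(p_first_wins(3, 5))  # 99/128 of the pot goes to E
```

The same recursion, with the needed-points arguments generalized, is the general formula the exercise asks for.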

Deciding to put probability theory to good use, we encounter a slot
machine with three independent wheels, each producing one of the four
symbols bar, bell, lemon, or
cherry with equal probability. The slot machine has the
following payout scheme for a bet of 1 coin (where “?” denotes that we
don’t care what comes up for that wheel):

> bar/bar/bar pays 20 coins

> bell/bell/bell pays 15 coins

> lemon/lemon/lemon pays 5 coins

> cherry/cherry/cherry pays 3 coins

> cherry/cherry/? pays 2 coins

> cherry/?/? pays 1 coin

1. Compute the expected “payback” percentage of the machine. In other
words, for each coin played, what is the expected coin return?

2. Compute the probability that playing the slot machine once will
result in a win.

3. Estimate the mean and median number of plays you can expect to make
until you go broke, if you start with 10 coins. You can run a
simulation to estimate this, rather than trying to compute an
exact answer.
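
Parts 1 and 2 can be answered by enumerating the $4^3 = 64$ equally likely wheel outcomes (a sketch; my `payout` function encodes the table above, testing the most specific pattern first):

```python
from itertools import product
from fractions import Fraction

SYMBOLS = ["bar", "bell", "lemon", "cherry"]

def payout(w1, w2, w3):
    """Coins returned for one play, per the payout table above."""
    if w1 == w2 == w3:
        return {"bar": 20, "bell": 15, "lemon": 5, "cherry": 3}[w1]
    if w1 == w2 == "cherry":
        return 2
    if w1 == "cherry":
        return 1
    return 0

outcomes = list(product(SYMBOLS, repeat=3))
expected = sum(Fraction(payout(*o), len(outcomes)) for o in outcomes)
p_win = Fraction(sum(payout(*o) > 0 for o in outcomes), len(outcomes))
print(expected)  # 61/64, i.e. about a 95% payback percentage
print(p_win)     # 19/64
```

Part 3's mean and median can then be estimated by reusing `payout` in a simulation that plays until the bankroll reaches zero.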

We wish to transmit an $n$-bit message to a receiving agent. The bits in the message are independently corrupted (flipped) during transmission with $\epsilon$ probability each. With an extra parity bit sent along with the original information, a message can be corrected by the receiver if at most one bit in the entire message (including the parity bit) has been corrupted. Suppose we want to ensure that the correct message is received with probability at least $1-\delta$. What is the maximum feasible value of $n$? Calculate this value for the case $\epsilon = 0.001$, $\delta = 0.01$.

We wish to transmit an $n$-bit message to a receiving agent. The bits in the message are independently corrupted (flipped) during transmission with $\epsilon$ probability each. With an extra parity bit sent along with the original information, a message can be corrected by the receiver if at most one bit in the entire message (including the parity bit) has been corrupted. Suppose we want to ensure that the correct message is received with probability at least $1-\delta$. What is the maximum feasible value of $n$? Calculate this value for the case $\epsilon = 0.002$, $\delta = 0.01$.
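
Both variants come down to finding the largest $n$ for which the chance of at most one corrupted bit among the $n+1$ transmitted bits is at least $1-\delta$. A small search sketch (assuming independent bit flips, as stated; function names are mine):

```python
def prob_correctable(n, eps):
    """P(at most one of the n+1 bits is flipped), so the receiver
    can recover the message via the parity bit."""
    m = n + 1
    return (1 - eps) ** m + m * eps * (1 - eps) ** (m - 1)

def max_message_length(eps, delta):
    """Largest n meeting the reliability target; prob_correctable
    decreases in n, so a linear scan upward suffices."""
    n = 1
    while prob_correctable(n + 1, eps) >= 1 - delta:
        n += 1
    return n

for eps in (0.001, 0.002):
    print(eps, max_message_length(eps, 0.01))
```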

Show that the three forms of independence in Equation (independence-equation) are equivalent.

Consider two medical tests, A and B, for a virus. Test A is 95% effective at recognizing the virus when it is present, but has a 10% false positive rate (indicating that the virus is present, when it is not). Test B is 90% effective at recognizing the virus, but has a 5% false positive rate. The two tests use independent methods of identifying the virus. The virus is carried by 1% of all people. Say that a person is tested for the virus using only one of the tests, and that test comes back positive for carrying the virus. Which test returning positive is more indicative of someone really carrying the virus? Justify your answer mathematically.
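
A direct application of Bayes' rule settles the comparison (a sketch; `posterior` is my helper computing $P({virus} \mid {positive})$):

```python
def posterior(sensitivity, false_positive_rate, prior=0.01):
    """P(virus | positive test) by Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

post_a = posterior(0.95, 0.10)  # about 0.088
post_b = posterior(0.90, 0.05)  # about 0.154
print(post_a, post_b)
```

At a 1% prior, test B's lower false-positive rate outweighs its lower sensitivity, so a positive from B is more indicative.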

Suppose you are given a coin that lands ${heads}$ with probability $x$
and ${tails}$ with probability $1 - x$. Are the outcomes of successive
flips of the coin independent of each other given that you know the
value of $x$? Are the outcomes of successive flips of the coin
independent of each other if you do *not* know the value of
$x$? Justify your answer.

After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease and that the test is 99% accurate (i.e., the probability of testing positive when you do have the disease is 0.99, as is the probability of testing negative when you don’t have the disease). The good news is that this is a rare disease, striking only 1 in 10,000 people of your age. Why is it good news that the disease is rare? What are the chances that you actually have the disease?

After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease and that the test is 99% accurate (i.e., the probability of testing positive when you do have the disease is 0.99, as is the probability of testing negative when you don’t have the disease). The good news is that this is a rare disease, striking only 1 in 100,000 people of your age. Why is it good news that the disease is rare? What are the chances that you actually have the disease?
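
Both variants of this exercise yield to the same Bayes'-rule calculation (a sketch; the helper name is mine):

```python
def p_disease_given_positive(prior, accuracy=0.99):
    """P(disease | positive) for a test that is `accuracy` reliable
    in both directions."""
    true_pos = accuracy * prior
    false_pos = (1 - accuracy) * (1 - prior)
    return true_pos / (true_pos + false_pos)

print(p_disease_given_positive(1 / 10_000))   # about 0.0098
print(p_disease_given_positive(1 / 100_000))  # about 0.00099
```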

It is quite often useful to consider the
effect of some specific propositions in the context of some general
background evidence that remains fixed, rather than in the complete
absence of information. The following questions ask you to prove more
general versions of the product rule and Bayes’ rule, with respect to
some background evidence $\textbf{e}$:

1. Prove the conditionalized version of the general product rule:
$${\textbf{P}}(X,Y \mid \textbf{e}) = {\textbf{P}}(X \mid Y,\textbf{e})\ {\textbf{P}}(Y \mid \textbf{e})\ .$$

2. Prove the conditionalized version of Bayes’ rule in
Equation (conditional-bayes-equation).

Show that the statement of conditional independence $${\textbf{P}}(X,Y | Z) = {\textbf{P}}(X | Z) {\textbf{P}}(Y | Z)$$ is equivalent to each of the statements $${\textbf{P}}(X | Y,Z) = {\textbf{P}}(X | Z) \quad\mbox{and}\quad {\textbf{P}}(Y | X,Z) = {\textbf{P}}(Y | Z)\ .$$

Suppose you are given a bag containing $n$ unbiased coins. You are told
that $n-1$ of these coins are normal, with heads on one side and tails
on the other, whereas one coin is a fake, with heads on both sides.

1. Suppose you reach into the bag, pick out a coin at random, flip it,
and get a head. What is the (conditional) probability that the coin
you chose is the fake coin?

2. Suppose you continue flipping the coin for a total of $k$ times
after picking it and see $k$ heads. Now what is the conditional
probability that you picked the fake coin?

3. Suppose you wanted to decide whether the chosen coin was fake by
flipping it $k$ times. The decision procedure returns ${fake}$ if
all $k$ flips come up heads; otherwise it returns ${normal}$. What
is the (unconditional) probability that this procedure makes an
error?
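
The three parts can be checked numerically (a sketch using exact rational arithmetic; function names are mine):

```python
from fractions import Fraction

def p_fake_given_heads(n, k):
    """P(coin is the double-headed fake | k heads in k flips)."""
    prior_fake = Fraction(1, n)
    like_fake = Fraction(1)              # the fake always shows heads
    like_normal = Fraction(1, 2) ** k
    num = like_fake * prior_fake
    return num / (num + like_normal * (1 - prior_fake))

def p_error(n, k):
    """Unconditional error probability: a normal coin shows k straight
    heads and is misreported as fake. Picking the fake never errs,
    since the fake always passes the all-heads test."""
    return Fraction(n - 1, n) * Fraction(1, 2) ** k

print(p_fake_given_heads(10, 3))  # 8/17
print(p_error(10, 3))             # 9/80
```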

In this exercise, you will complete the normalization calculation for the meningitis example. First, make up a suitable value for $P(s \mid \lnot m)$, and use it to calculate unnormalized values for $P(m \mid s)$ and $P(\lnot m \mid s)$ (i.e., ignoring the $P(s)$ term in the Bayes’ rule expression, Equation (meningitis-bayes-equation)). Now normalize these values so that they add to 1.

This exercise investigates the way in which conditional independence
relationships affect the amount of information needed for probabilistic
calculations.

1. Suppose we wish to calculate $P(h \mid e_1,e_2)$ and we have no
conditional independence information. Which of the following sets of
numbers are sufficient for the calculation?

1. ${\textbf{P}}(E_1,E_2)$, ${\textbf{P}}(H)$,
${\textbf{P}}(E_1 \mid H)$,
${\textbf{P}}(E_2 \mid H)$

2. ${\textbf{P}}(E_1,E_2)$, ${\textbf{P}}(H)$,
${\textbf{P}}(E_1,E_2 \mid H)$

3. ${\textbf{P}}(H)$,
${\textbf{P}}(E_1 \mid H)$,
${\textbf{P}}(E_2 \mid H)$

2. Suppose we know that
${\textbf{P}}(E_1 \mid H,E_2)={\textbf{P}}(E_1 \mid H)$
for all values of $H$, $E_1$, $E_2$. Now which of the three sets are
sufficient?

Let $X$, $Y$, $Z$ be Boolean random variables. Label the eight entries
in the joint distribution ${\textbf{P}}(X,Y,Z)$ as $a$ through
$h$. Express the statement that $X$ and $Y$ are conditionally
independent given $Z$, as a set of equations relating $a$ through $h$.
How many *nonredundant* equations are there?

(Adapted from Pearl [Pearl:1988].) Suppose you are a witness to a
nighttime hit-and-run accident involving a taxi in Athens. All taxis in
Athens are blue or green. You swear, under oath, that the taxi was blue.
Extensive testing shows that, under the dim lighting conditions,
discrimination between blue and green is 75% reliable.

1. Is it possible to calculate the most likely color for the taxi?
(*Hint:* distinguish carefully between the proposition
that the taxi *is* blue and the proposition that it
*appears* blue.)

2. What if you know that 9 out of 10 Athenian taxis are green?
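
For part 2, the posterior follows from Bayes' rule; a sketch with exact arithmetic (the helper name is mine):

```python
from fractions import Fraction

def p_blue_given_looks_blue(prior_blue, reliability):
    """Bayes' rule: P(taxi is blue | witness says blue)."""
    says_blue_if_blue = reliability
    says_blue_if_green = 1 - reliability
    num = says_blue_if_blue * prior_blue
    return num / (num + says_blue_if_green * (1 - prior_blue))

# Part 2: prior P(blue) = 1/10, witness reliability 3/4.
print(p_blue_given_looks_blue(Fraction(1, 10), Fraction(3, 4)))  # 1/4
```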

Write out a general algorithm for answering queries of the form
${\textbf{P}}({Cause} \mid \textbf{e})$, using a naive Bayes
distribution. Assume that the evidence $\textbf{e}$ may assign values to
*any subset* of the effect variables.

Text categorization is the task of
assigning a given document to one of a fixed set of categories on the
basis of the text it contains. Naive Bayes models are often used for
this task. In these models, the query variable is the document category,
and the “effect” variables are the presence or absence of each word in
the language; the assumption is that words occur independently in
documents, with frequencies determined by the document category.

1. Explain precisely how such a model can be constructed, given as
“training data” a set of documents that have been assigned
to categories.

2. Explain precisely how to categorize a new document.

3. Is the conditional independence assumption reasonable? Discuss.
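
A minimal sketch of parts 1 and 2, assuming a multinomial word model with add-one smoothing (the exercise's presence/absence model differs slightly, and all names below are mine):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a naive Bayes text categorizer.
    docs: list of (category, word_list) pairs."""
    n_docs = len(docs)
    cat_counts = Counter(c for c, _ in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for c, words in docs:
        word_counts[c].update(words)
        vocab.update(words)

    def classify(words):
        # Pick the category maximizing log prior + sum of log likelihoods.
        def log_post(c):
            total = sum(word_counts[c].values())
            lp = math.log(cat_counts[c] / n_docs)
            for w in words:
                lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            return lp
        return max(cat_counts, key=log_post)
    return classify

classify = train_nb([("spam", ["buy", "cheap", "buy"]),
                     ("ham", ["meeting", "notes"])])
print(classify(["buy", "cheap"]))  # spam
```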

In our analysis of the wumpus world, we used the fact that each square contains a pit with probability 0.2, independently of the contents of the other squares. Suppose instead that exactly $N/5$ pits are scattered at random among the $N$ squares other than [1,1]. Are the variables $P_{i,j}$ and $P_{k,l}$ still independent? What is the joint distribution ${\textbf{P}}(P_{1,1},\ldots,P_{4,4})$ now? Redo the calculation for the probabilities of pits in [1,3] and [2,2].

Redo the probability calculation for pits in [1,3] and [2,2], assuming that each square contains a pit with probability 0.01, independent of the other squares. What can you say about the relative performance of a logical versus a probabilistic agent in this case?

Implement a hybrid probabilistic agent for the wumpus world, based on the hybrid agent in Figure hybrid-wumpus-agent-algorithm and the probabilistic inference procedure outlined in this chapter.

We have a bag of three biased coins $a$, $b$, and $c$ with probabilities
of coming up heads of 20%, 60%, and 80%, respectively. One coin is drawn
randomly from the bag (with equal likelihood of drawing each of the
three coins), and then the coin is flipped three times to generate the
outcomes $X_1$, $X_2$, and $X_3$.

1. Draw the Bayesian network corresponding to this setup and define the
necessary CPTs.

2. Calculate which coin was most likely to have been drawn from the bag
if the observed flips come out heads twice and tails once.
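
Part 2 reduces to comparing likelihoods under a uniform prior; a sketch with exact arithmetic (the variant exercise with 30%/60%/75% coins works the same way, by changing `head_probs`):

```python
from fractions import Fraction

def posterior_over_coins(head_probs, n_heads, n_flips):
    """Posterior over the drawn coin given the flip outcomes, with a
    uniform prior. The binomial coefficient is the same for every coin,
    so the likelihood of any one sequence suffices."""
    likes = {c: p ** n_heads * (1 - p) ** (n_flips - n_heads)
             for c, p in head_probs.items()}
    total = sum(likes.values())
    return {c: l / total for c, l in likes.items()}

post = posterior_over_coins(
    {"a": Fraction(1, 5), "b": Fraction(3, 5), "c": Fraction(4, 5)},
    n_heads=2, n_flips=3)
print(max(post, key=post.get))  # b is the most likely coin
```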

We have a bag of three biased coins $a$, $b$, and $c$ with probabilities
of coming up heads of 30%, 60%, and 75%, respectively. One coin is drawn
randomly from the bag (with equal likelihood of drawing each of the
three coins), and then the coin is flipped three times to generate the
outcomes $X_1$, $X_2$, and $X_3$.

1. Draw the Bayesian network corresponding to this setup and define the
necessary CPTs.

2. Calculate which coin was most likely to have been drawn from the bag
if the observed flips come out heads twice and tails once.

Equation (parameter-joint-repn-equation) on
page parameter-joint-repn-equation defines the joint distribution represented by a
Bayesian network in terms of the parameters
$\theta(X_i \mid {Parents}(X_i))$. This exercise asks you to derive
the equivalence between the parameters and the conditional probabilities
${\textbf{P}}(X_i \mid {Parents}(X_i))$ from this definition.

1. Consider a simple network $X\rightarrow Y\rightarrow Z$ with three
Boolean variables. Use
Equations (conditional-probability-equation) and (marginalization-equation)
(pages conditional-probability-equation and marginalization-equation)
to express the conditional probability $P(z \mid y)$ as the ratio of two sums, each over entries in the
joint distribution ${\textbf{P}}(X,Y,Z)$.

2. Now use Equation (parameter-joint-repn-equation) to
write this expression in terms of the network parameters
$\theta(X)$, $\theta(Y \mid X)$, and $\theta(Z \mid Y)$.

3. Next, expand out the summations in your expression from part (b),
writing out explicitly the terms for the true and false values of
each summed variable. Assuming that all network parameters satisfy
the constraint
$\sum_{x_i} \theta(x_i \mid {parents}(X_i)) = 1$, show
that the resulting expression reduces to $\theta(z \mid y)$.

4. Generalize this derivation to show that
$\theta(X_i \mid {Parents}(X_i)) = {\textbf{P}}(X_i \mid {Parents}(X_i))$
for any Bayesian network.

The **arc reversal** operation in a Bayesian network allows us to change the direction
of an arc $X\rightarrow Y$ while preserving the joint probability
distribution that the network represents [Shachter:1986]. Arc reversal
may require introducing new arcs: all the parents of $X$ also become
parents of $Y$, and all parents of $Y$ also become parents of $X$.

1. Assume that $X$ and $Y$ start with $m$ and $n$ parents,
respectively, and that all variables have $k$ values. By calculating
the change in size for the CPTs of $X$ and $Y$, show that the total
number of parameters in the network cannot decrease during
arc reversal. (*Hint*: the parents of $X$ and $Y$ need
not be disjoint.)

2. Under what circumstances can the total number remain constant?

3. Let the parents of $X$ be $\textbf{U} \cup \textbf{V}$ and the parents of $Y$ be
$\textbf{V} \cup \textbf{W}$, where $\textbf{U}$ and $\textbf{W}$ are disjoint. The formulas for the
new CPTs after arc reversal are as follows: $$\begin{aligned}
{\textbf{P}}(Y | \textbf{U},\textbf{V},\textbf{W}) &= \sum_x {\textbf{P}}(Y | \textbf{V},\textbf{W}, x) {\textbf{P}}(x | \textbf{U}, \textbf{V}) \\
{\textbf{P}}(X | \textbf{U},\textbf{V},\textbf{W}, Y) &= {\textbf{P}}(Y | X, \textbf{V}, \textbf{W}) {\textbf{P}}(X | \textbf{U}, \textbf{V}) / {\textbf{P}}(Y | \textbf{U},\textbf{V},\textbf{W})\ .\end{aligned}$$
Prove that the new network expresses the same joint distribution
over all variables as the original network.

Consider the Bayesian network in
Figure burglary-figure.

1. If no evidence is observed, are ${Burglary}$ and ${Earthquake}$
independent? Prove this from the numerical semantics and from the
topological semantics.

2. If we observe ${Alarm} = {true}$, are ${Burglary}$ and
${Earthquake}$ independent? Justify your answer by calculating
whether the probabilities involved satisfy the definition of
conditional independence.

Suppose that in a Bayesian network containing an unobserved variable
$Y$, all the variables in the Markov blanket ${MB}(Y)$ have been
observed.

1. Prove that removing the node $Y$ from the network will not affect
the posterior distribution for any other unobserved variable in
the network.

2. Discuss whether we can remove $Y$ if we are planning to use (i)
rejection sampling and (ii) likelihood weighting.

Let $H_x$ be a random variable denoting the
handedness of an individual $x$, with possible values $l$ or $r$. A
common hypothesis is that left- or right-handedness is inherited by a
simple mechanism; that is, perhaps there is a gene $G_x$, also with
values $l$ or $r$, and perhaps actual handedness turns out mostly the
same (with some probability $s$) as the gene an individual possesses.
Furthermore, perhaps the gene itself is equally likely to be inherited
from either of an individual’s parents, with a small nonzero probability
$m$ of a random mutation flipping the handedness.

1. Which of the three networks in
Figure handedness-figure claim that
$ {\textbf{P}}(G_{father},G_{mother},G_{child}) = {\textbf{P}}(G_{father}){\textbf{P}}(G_{mother}){\textbf{P}}(G_{child})$?

2. Which of the three networks make independence claims that are
consistent with the hypothesis about the inheritance of handedness?

3. Which of the three networks is the best description of the
hypothesis?

4. Write down the CPT for the $G_{child}$ node in network (a), in
terms of $s$ and $m$.

5. Suppose that
$P(G_{father} = l)=P(G_{mother} = l)=q$. In
network (a), derive an expression for $P(G_{child} = l)$
in terms of $m$ and $q$ only, by conditioning on its parent nodes.

6. Under conditions of genetic equilibrium, we expect the distribution
of genes to be the same across generations. Use this to calculate
the value of $q$, and, given what you know about handedness in
humans, explain why the hypothesis described at the beginning of
this question must be wrong.

The **Markov
blanket** of a variable is defined on page markov-blanket-page.
Prove that a variable is independent of all other variables in the
network, given its Markov blanket, and derive
Equation (markov-blanket-equation)
(page markov-blanket-equation).

Consider the network for car diagnosis shown in
Figure car-starts-figure.

1. Extend the network with the Boolean variables ${IcyWeather}$ and
${StarterMotor}$.

2. Give reasonable conditional probability tables for all the nodes.

3. How many independent values are contained in the joint probability
distribution for eight Boolean nodes, assuming that no conditional
independence relations are known to hold among them?

4. How many independent probability values do your network tables
contain?

5. The conditional distribution for ${Starts}$ could be described as
a **noisy-AND** distribution. Define this
family in general and relate it to the noisy-OR distribution.

Consider a simple Bayesian network with root variables ${Cold}$, ${Flu}$, and ${Malaria}$ and child variable ${Fever}$, with a noisy-OR conditional distribution for ${Fever}$ as described in Section canonical-distribution-section. By adding appropriate auxiliary variables for inhibition events and fever-inducing events, construct an equivalent Bayesian network whose CPTs (except for root variables) are deterministic. Define the CPTs and prove equivalence.

Consider the family of linear Gaussian networks, as
defined on page LG-network-page.

1. In a two-variable network, let $X_1$ be the parent of $X_2$, let
$X_1$ have a Gaussian prior, and let
${\textbf{P}}(X_2 \mid X_1)$ be a linear
Gaussian distribution. Show that the joint distribution $P(X_1,X_2)$
is a multivariate Gaussian, and calculate its covariance matrix.

2. Prove by induction that the joint distribution for a general linear
Gaussian network on $X_1,\ldots,X_n$ is also a
multivariate Gaussian.

The probit distribution defined on
page probit-page describes the probability distribution for a Boolean
child, given a single continuous parent.

1. How might the definition be extended to cover multiple continuous
parents?

2. How might it be extended to handle a *multivalued*
child variable? Consider both cases where the child’s values are
ordered (as in selecting a gear while driving, depending on speed,
slope, desired acceleration, etc.) and cases where they are
unordered (as in selecting bus, train, or car to get to work).
(*Hint*: Consider ways to divide the possible values
into two sets, to mimic a Boolean variable.)

In your local nuclear power station, there is an alarm that senses when
a temperature gauge exceeds a given threshold. The gauge measures the
temperature of the core. Consider the Boolean variables $A$ (alarm
sounds), $F_A$ (alarm is faulty), and $F_G$ (gauge is faulty) and the
multivalued nodes $G$ (gauge reading) and $T$ (actual core temperature).

1. Draw a Bayesian network for this domain, given that the gauge is
more likely to fail when the core temperature gets too high.

2. Is your network a polytree? Why or why not?

3. Suppose there are just two possible actual and measured
temperatures, normal and high; the probability that the gauge gives
the correct temperature is $x$ when it is working, but $y$ when it
is faulty. Give the conditional probability table associated with
$G$.

4. Suppose the alarm works correctly unless it is faulty, in which case
it never sounds. Give the conditional probability table associated
with $A$.

5. Suppose the alarm and gauge are working and the alarm sounds.
Calculate an expression for the probability that the temperature of
the core is too high, in terms of the various conditional
probabilities in the network.

Two astronomers in different parts of the world
make measurements $M_1$ and $M_2$ of the number of stars $N$ in some
small region of the sky, using their telescopes. Normally, there is a
small possibility $e$ of error by up to one star in each direction. Each
telescope can also (with a much smaller probability $f$) be badly out of
focus (events $F_1$ and $F_2$), in which case the scientist will
undercount by three or more stars (or if $N$ is less than 3, fail to
detect any stars at all). Consider the three networks shown in
Figure telescope-nets-figure.

1. Which of these Bayesian networks are correct (but not
necessarily efficient) representations of the preceding information?

2. Which is the best network? Explain.

3. Write out a conditional distribution for
${\textbf{P}}(M_1 \mid N)$, for the case where
$N \in \{1,2,3\}$ and $M_1 \in \{0,1,2,3,4\}$. Each
entry in the conditional distribution should be expressed as a
function of the parameters $e$ and/or $f$.

4. Suppose $M_1 = 1$ and $M_2 = 3$. What are the
*possible* numbers of stars if you assume no prior
constraint on the values of $N$?

5. What is the *most likely* number of stars, given these
observations? Explain how to compute this, or if it is not possible
to compute, explain what additional information is needed and how it
would affect the result.

Consider the network shown in
Figure telescope-nets-figure(ii), and assume that the
two telescopes work identically. $N \in \{1,2,3\}$ and
$M_1,M_2 \in \{0,1,2,3,4\}$, with the symbolic CPTs as described
in Exercise telescope-exercise. Using the enumeration
algorithm (Figure enumeration-algorithm on
page enumeration-algorithm), calculate the probability distribution
${\textbf{P}}(N \mid M_1 = 2, M_2 = 2)$.

Consider the Bayes net shown in Figure politics-figure.

1. Which of the following are asserted by the network
*structure*?

1. ${\textbf{P}}(B,I,M) = {\textbf{P}}(B){\textbf{P}}(I){\textbf{P}}(M)$.

2. ${\textbf{P}}(J|G) = {\textbf{P}}(J|G,I)$.

3. ${\textbf{P}}(M|G,B,I) = {\textbf{P}}(M|G,B,I,J)$.

2. Calculate the value of $P(b,i,\lnot m,g,j)$.

3. Calculate the probability that someone goes to jail given that they
broke the law, have been indicted, and face a politically
motivated prosecutor.

4. A **context-specific independence** (see
page CSI-page) allows a variable to be independent of some of
its parents given certain values of others. In addition to the usual
conditional independences given by the graph structure, what
context-specific independences exist in the Bayes net in
Figure politics-figure?

5. Suppose we want to add the variable
$P={PresidentialPardon}$ to the network; draw the new
network and briefly explain any links you add.

Consider the variable elimination algorithm in
Figure elimination-ask-algorithm (page elimination-ask-algorithm).

1. Section exact-inference-section applies variable
elimination to the query
$${\textbf{P}}({Burglary} \mid {JohnCalls} = {true},{MaryCalls} = {true})\ .$$
Perform the calculations indicated and check that the answer
is correct.

2. Count the number of arithmetic operations performed, and compare it
with the number performed by the enumeration algorithm.

3. Suppose a network has the form of a *chain*: a sequence
of Boolean variables $X_1,\ldots, X_n$ where
${Parents}(X_i) = \{X_{i-1}\}$ for $i = 2,\ldots,n$.
What is the complexity of computing
${\textbf{P}}(X_1 \mid X_n = {true})$ using
enumeration? Using variable elimination?

4. Prove that the complexity of running variable elimination on a
polytree network is linear in the size of the tree for any variable
ordering consistent with the network structure.

Investigate the complexity of exact inference
in general Bayesian networks:

1. Prove that any 3-SAT problem can be reduced to exact inference in a
Bayesian network constructed to represent the particular problem and
hence that exact inference is NP-hard. (*Hint*:
Consider a network with one variable for each proposition symbol,
one for each clause, and one for the conjunction of clauses.)

2. The problem of counting the number of satisfying assignments for a
3-SAT problem is \#P-complete. Show that exact inference is at least
as hard as this.

Consider the problem of generating a
random sample from a specified distribution on a single variable. Assume
you have a random number generator that returns a random number
uniformly distributed between 0 and 1.

1. Let $X$ be a discrete variable with
$P(X=x_i)=p_i$ for
$i\in\{1,\ldots,k\}$. The **cumulative distribution** of $X$ gives the probability
that $X\in\{x_1,\ldots,x_j\}$ for each possible $j$. (See
also Appendix [math-appendix].) Explain how to
calculate the cumulative distribution in $O(k)$ time and how to
generate a single sample of $X$ from it. Can the latter be done in
less than $O(k)$ time?

2. Now suppose we want to generate $N$ samples of $X$, where $N\gg k$.
Explain how to do this with an expected run time per sample that is
*constant* (i.e., independent of $k$).

3. Now consider a continuous-valued variable with a parameterized
distribution (e.g., Gaussian). How can samples be generated from
such a distribution?

4. Suppose you want to query a continuous-valued variable and you are
using a sampling algorithm such as LIKELIHOOD-WEIGHTING to do the inference. How would
you have to modify the query-answering process?
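As a starting point for parts 1 and 2, here is a minimal Python sketch of inverse-transform sampling from a cumulative table; the distribution `[0.2, 0.5, 0.3]` is made up for illustration. Building the table is $O(k)$, and binary search makes each sample $O(\log k)$ (the alias method can achieve expected $O(1)$ per sample).

```python
import bisect
import random

def make_sampler(probs):
    """Precompute the cumulative distribution in O(k) time."""
    cdf, total = [], 0.0
    for p in probs:
        total += p
        cdf.append(total)
    def sample():
        # Inverse-transform sampling: find the first index whose
        # cumulative probability reaches a uniform random number.
        # min() guards against floating-point round-off in the last bin.
        return min(bisect.bisect_left(cdf, random.random()), len(cdf) - 1)
    return sample

random.seed(0)
sample = make_sampler([0.2, 0.5, 0.3])
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sample()] += 1
print([c / 100_000 for c in counts])  # roughly [0.2, 0.5, 0.3]
```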

Consider the query
${\textbf{P}}({Rain}\mid{Sprinkler}={true},{WetGrass}={true})$
in Figure rain-clustering-figure(a)
(page rain-clustering-figure) and how Gibbs sampling can answer it.

1. How many states does the Markov chain have?

2. Calculate the **transition matrix**
${\textbf{Q}}$ containing
$q({\textbf{y}} \rightarrow {\textbf{y}}')$
for all ${\textbf{y}}$, ${\textbf{y}}'$.

3. What does ${\textbf{Q}}^2$, the square of the
transition matrix, represent?

4. What about ${\textbf{Q}}^n$ as $n\to \infty$?

5. Explain how to do probabilistic inference in Bayesian networks,
assuming that ${\textbf{Q}}^n$ is available. Is this a
practical way to do inference?
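As a toy illustration of parts 3 and 4 (using a made-up two-state chain, not the rain network of the figure, whose CPTs are not given here): for an ergodic chain, every row of ${\textbf{Q}}^n$ converges to the stationary distribution as $n$ grows.

```python
# Hypothetical 2-state Markov chain; its stationary distribution is (0.8, 0.2).
Q = [[0.9, 0.1],
     [0.4, 0.6]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Qn = Q
for _ in range(100):   # compute Q^101, effectively the large-n limit
    Qn = matmul(Qn, Q)

print(Qn)  # both rows are approximately (0.8, 0.2)
```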

This exercise explores the stationary
distribution for Gibbs sampling methods.

1. The convex composition $[\alpha, q_1; 1-\alpha, q_2]$ of $q_1$ and
$q_2$ is a transition probability distribution that first chooses
one of $q_1$ and $q_2$ with probabilities $\alpha$ and $1-\alpha$,
respectively, and then applies whichever is chosen. Prove that if
$q_1$ and $q_2$ are in detailed balance with $\pi$, then their
convex composition is also in detailed balance with $\pi$.
(*Note*: this result justifies a variant of GIBBS-ASK in which
variables are chosen at random rather than sampled in a
fixed sequence.)

2. Prove that if each of $q_1$ and $q_2$ has $\pi$ as its stationary
distribution, then the sequential composition
$q = q_1 \circ q_2$ also has $\pi$ as its
stationary distribution.
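A numeric sanity check of part 1, with made-up numbers: two 2-state kernels are constructed to be in detailed balance with $\pi = (0.25, 0.75)$, and their convex composition is verified to satisfy $\pi_i\, q(i \rightarrow j) = \pi_j\, q(j \rightarrow i)$ as well. (This checks the claim on an example; it is not a proof.)

```python
pi = [0.25, 0.75]

def two_state_kernel(p01):
    # Detailed balance fixes the reverse rate: pi[0]*p01 = pi[1]*p10.
    p10 = pi[0] * p01 / pi[1]
    return [[1 - p01, p01], [p10, 1 - p10]]

def in_detailed_balance(q):
    return abs(pi[0] * q[0][1] - pi[1] * q[1][0]) < 1e-12

q1, q2 = two_state_kernel(0.3), two_state_kernel(0.6)
alpha = 0.4
q_mix = [[alpha * q1[i][j] + (1 - alpha) * q2[i][j] for j in range(2)]
         for i in range(2)]

print(in_detailed_balance(q1), in_detailed_balance(q2),
      in_detailed_balance(q_mix))  # True True True
```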

The **Metropolis--Hastings** algorithm is a member of the MCMC family; as such,
it is designed to generate samples $\textbf{x}$ (eventually) according to target
probabilities $\pi(\textbf{x})$. (Typically we are interested in sampling from
$\pi(\textbf{x})=P(\textbf{x}\mid\textbf{e})$.) Like simulated annealing,
Metropolis–Hastings operates in two stages. First, it samples a new
state $\textbf{x'}$ from a **proposal distribution** $q(\textbf{x'}\mid\textbf{x})$, given the current state $\textbf{x}$.
Then, it probabilistically accepts or rejects $\textbf{x'}$ according to the **acceptance probability**
$$\alpha(\textbf{x'}\mid\textbf{x}) = \min \left(1,\frac{\pi(\textbf{x'})\,q(\textbf{x}\mid\textbf{x'})}{\pi(\textbf{x})\,q(\textbf{x'}\mid\textbf{x})} \right)\ .$$
If the proposal is rejected, the state remains at $\textbf{x}$.

1. Consider an ordinary Gibbs sampling step for a specific variable
$X_i$. Show that this step, considered as a proposal, is guaranteed
to be accepted by Metropolis–Hastings. (Hence, Gibbs sampling is a
special case of Metropolis–Hastings.)

2. Show that the two-step process above, viewed as a transition
probability distribution, is in detailed balance with $\pi$.
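To make the two-stage process concrete, here is a minimal Metropolis–Hastings sampler on a small discrete target (the target probabilities and the uniform proposal are made up for illustration). Because the proposal is symmetric, the $q$ terms cancel in the acceptance ratio.

```python
import random

def metropolis_hastings(pi, steps, seed=1):
    """Sample from target pi over {0,...,k-1} with a uniform proposal."""
    random.seed(seed)
    k = len(pi)
    x = 0
    counts = [0] * k
    for _ in range(steps):
        x_new = random.randrange(k)           # symmetric proposal: q cancels
        alpha = min(1.0, pi[x_new] / pi[x])   # acceptance probability
        if random.random() < alpha:
            x = x_new                         # accept; otherwise stay at x
        counts[x] += 1
    return [c / steps for c in counts]

freqs = metropolis_hastings([0.1, 0.2, 0.3, 0.4], 200_000)
print(freqs)  # approximately [0.1, 0.2, 0.3, 0.4]
```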

Three soccer teams $A$, $B$, and $C$, play each
other once. Each match is between two teams, and can be won, drawn, or
lost. Each team has a fixed, unknown degree of quality—an integer
ranging from 0 to 3—and the outcome of a match depends probabilistically
on the difference in quality between the two teams.

1. Construct a relational probability model to describe this domain,
and suggest numerical values for all the necessary
probability distributions.

2. Construct the equivalent Bayesian network for the three matches.

3. Suppose that in the first two matches $A$ beats $B$ and draws with
$C$. Using an exact inference algorithm of your choice, compute the
posterior distribution for the outcome of the third match.

4. Suppose there are $n$ teams in the league and we have the results
for all but the last match. How does the complexity of predicting
the last game vary with $n$?

5. Investigate the application of MCMC to this problem. How quickly
does it converge in practice and how well does it scale?

Show that any second-order Markov
process can be rewritten as a first-order Markov process with an
augmented set of state variables. Can this always be done
*parsimoniously*, i.e., without increasing the number of
parameters needed to specify the transition model?

In this exercise, we examine what
happens to the probabilities in the umbrella world in the limit of long
time sequences.

1. Suppose we observe an unending sequence of days on which the
umbrella appears. Show that, as the days go by, the probability of
rain on the current day increases monotonically toward a
fixed point. Calculate this fixed point.

2. Now consider *forecasting* further and further into the
future, given just the first two umbrella observations. First,
compute the probability $P(r_{2+k}|u_1,u_2)$ for
$k=1 \ldots 20$ and plot the results. You should see that
the probability converges towards a fixed point. Prove that the
exact value of this fixed point is 0.5.
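A numeric sketch against which hand calculations can be checked, assuming the chapter's usual umbrella-world parameters ($P(r_t \mid r_{t-1}) = 0.7$, $P(u_t \mid r_t) = 0.9$, $P(u_t \mid \lnot r_t) = 0.2$): repeated filtering under endless umbrella observations converges to a fixed point, while pure prediction converges to the stationary distribution, 0.5.

```python
# Umbrella-world parameters assumed from the chapter.
def predict(p):                       # one transition step
    return 0.7 * p + 0.3 * (1 - p)

def filter_umbrella(p):               # predict, then condition on umbrella
    q = predict(p)
    return 0.9 * q / (0.9 * q + 0.2 * (1 - q))

p = 0.5
for _ in range(100):                  # endless umbrella observations
    p = filter_umbrella(p)
print(round(p, 4))                    # filtering fixed point, about 0.897

q = p
for _ in range(100):                  # pure prediction from any start
    q = predict(q)
print(round(q, 4))                    # 0.5, the stationary distribution
```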

This exercise develops a space-efficient variant of
the forward–backward algorithm described in
Figure forward-backward-algorithm (page forward-backward-algorithm).
We wish to compute $\textbf{P} (\textbf{X}_k|\textbf{e}_{1:t})$ for
$k=1,\ldots ,t$. This will be done with a divide-and-conquer
approach.

1. Suppose, for simplicity, that $t$ is odd, and let the halfway point
be $h=(t+1)/2$. Show that $\textbf{P} (\textbf{X}_k|\textbf{e}_{1:t}) $
can be computed for
$k=1,\ldots ,h$ given just the initial forward message
$\textbf{f}_{1:0}$, the backward message $\textbf{b}_{h+1:t}$, and the evidence
$\textbf{e}_{1:h}$.

2. Show a similar result for the second half of the sequence.

3. Given the results of (a) and (b), a recursive divide-and-conquer
algorithm can be constructed by first running forward along the
sequence and then backward from the end, storing just the required
messages at the middle and the ends. Then the algorithm is called on
each half. Write out the algorithm in detail.

4. Compute the time and space complexity of the algorithm as a function
of $t$, the length of the sequence. How does this change if we
divide the input into more than two pieces?

On page flawed-viterbi-page, we outlined a flawed procedure for finding the most likely state sequence, given an observation sequence. The procedure involves finding the most likely state at each time step, using smoothing, and returning the sequence composed of these states. Show that, for some temporal probability models and observation sequences, this procedure returns an impossible state sequence (i.e., the posterior probability of the sequence is zero).

Equation (matrix-filtering-equation) describes the filtering process for the matrix formulation of HMMs. Give a similar equation for the calculation of likelihoods, which was described generically in Equation (forward-likelihood-equation).

Consider the vacuum worlds of Figure vacuum-maze-ch4-figure (perfect sensing) and Figure vacuum-maze-hmm2-figure (noisy sensing). Suppose that the robot receives an observation sequence such that, with perfect sensing, there is exactly one possible location it could be in. Is this location necessarily the most probable location under noisy sensing for sufficiently small noise probability $\epsilon$? Prove your claim or find a counterexample.

In Section hmm-localization-section, the prior distribution over locations is uniform and the transition model assumes an equal probability of moving to any neighboring square. What if those assumptions are wrong? Suppose that the initial location is actually chosen uniformly from the northwest quadrant of the room and the action actually tends to move southeast. Keeping the HMM model fixed, explore the effect on localization and path accuracy as the southeasterly tendency increases, for different values of $\epsilon$.

Consider a version of the vacuum robot
(page vacuum-maze-hmm2-figure) that has the policy of going straight for as long
as it can; only when it encounters an obstacle does it change to a new
(randomly selected) heading. To model this robot, each state in the
model consists of a *(location, heading)* pair. Implement
this model and see how well the Viterbi algorithm can track a robot with
this model. The robot’s policy is more constrained than the random-walk
robot; does that mean that predictions of the most likely path are more
accurate?

We have described three policies for the vacuum robot: (1) a uniform random walk, (2) a bias for wandering southeast, as described in Exercise hmm-robust-exercise, and (3) the policy described in Exercise roomba-viterbi-exercise. Suppose an observer is given the observation sequence from a vacuum robot, but is not sure which of the three policies the robot is following. What approach should the observer use to find the most likely path, given the observations? Implement the approach and test it. How much does the localization accuracy suffer, compared to the case in which the observer knows which policy the robot is following?

This exercise is concerned with filtering in an environment with no landmarks. Consider a vacuum robot in an empty room, represented by an $n \times m$ rectangular grid. The robot’s location is hidden; the only evidence available to the observer is a noisy location sensor that gives an approximation to the robot’s location. If the robot is at location $(x, y)$ then with probability .1 the sensor gives the correct location, with probability .05 each it reports one of the 8 locations immediately surrounding $(x, y)$, with probability .025 each it reports one of the 16 locations that surround those 8, and with the remaining probability of .1 it reports “no reading.” The robot’s policy is to pick a direction and follow it with probability .8 on each step; the robot switches to a randomly selected new heading with probability .2 (or with probability 1 if it encounters a wall). Implement this as an HMM and do filtering to track the robot. How accurately can we track the robot’s path?

This exercise is concerned with filtering in an environment with no landmarks. Consider a vacuum robot in an empty room, represented by an $n \times m$ rectangular grid. The robot’s location is hidden; the only evidence available to the observer is a noisy location sensor that gives an approximation to the robot’s location. If the robot is at location $(x, y)$ then with probability .1 the sensor gives the correct location, with probability .05 each it reports one of the 8 locations immediately surrounding $(x, y)$, with probability .025 each it reports one of the 16 locations that surround those 8, and with the remaining probability of .1 it reports “no reading.” The robot’s policy is to pick a direction and follow it with probability .7 on each step; the robot switches to a randomly selected new heading with probability .3 (or with probability 1 if it encounters a wall). Implement this as an HMM and do filtering to track the robot. How accurately can we track the robot’s path?

Often, we wish to monitor a continuous-state
system whose behavior switches unpredictably among a set of $k$ distinct
“modes.” For example, an aircraft trying to evade a missile can execute
a series of distinct maneuvers that the missile may attempt to track. A
Bayesian network representation of such a **switching Kalman
filter** model is shown in
Figure switching-kf-figure.

1. Suppose that the discrete state $S_t$ has $k$ possible values and
that the prior continuous state estimate
${\textbf{P}}(\textbf{X}_0)$ is a multivariate
Gaussian distribution. Show that the prediction
${\textbf{P}}(\textbf{X}_1)$ is a **mixture of
Gaussians**—that is, a weighted sum of Gaussians such
that the weights sum to 1.

2. Show that if the current continuous state estimate
${\textbf{P}}(\textbf{X}_t|\textbf{e}_{1:t})$ is a mixture of $m$ Gaussians,
then in the general case the updated state estimate
${\textbf{P}}(\textbf{X}_{t+1}|\textbf{e}_{1:t+1})$ will be a mixture of
$km$ Gaussians.

3. What aspect of the temporal process do the weights in the Gaussian
mixture represent?

The results in (a) and (b) show that the representation of the posterior
grows without limit even for switching Kalman filters, which are among
the simplest hybrid dynamic models.

Complete the missing step in the derivation of Equation (kalman-one-step-equation) on page kalman-one-step-equation, the first update step for the one-dimensional Kalman filter.

Let us examine the behavior of the variance
update in Equation (kalman-univariate-equation)
(page kalman-univariate-equation).

1. Plot the value of $\sigma_t^2$ as a function of $t$, given various
values for $\sigma_x^2$ and $\sigma_z^2$.

2. Show that the update has a fixed point $\sigma^2$ such that
$\sigma_t^2 \rightarrow \sigma^2$ as $t \rightarrow \infty$, and
calculate the value of $\sigma^2$.

3. Give a qualitative explanation for what happens as
$\sigma_x^2\rightarrow 0$ and as $\sigma_z^2\rightarrow 0$.
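A numeric sketch for parts 2 and 3, assuming the update has the standard one-dimensional form $\sigma_{t+1}^2 = (\sigma_t^2 + \sigma_x^2)\,\sigma_z^2 / (\sigma_t^2 + \sigma_x^2 + \sigma_z^2)$; the parameter values below are made up. Iterating the update converges to the positive root of the corresponding quadratic.

```python
import math

def variance_update(s2, s2x, s2z):
    # One step of the one-dimensional Kalman variance update:
    # prediction adds s2x, then the measurement with noise s2z shrinks it.
    return (s2 + s2x) * s2z / (s2 + s2x + s2z)

s2x, s2z = 0.1, 1.0       # illustrative transition and sensor variances
s2 = 5.0                  # arbitrary starting variance
for _ in range(200):
    s2 = variance_update(s2, s2x, s2z)

# Closed-form fixed point: solve s2 = (s2 + s2x) * s2z / (s2 + s2x + s2z),
# i.e. s2^2 + s2x*s2 - s2x*s2z = 0.
fixed = (-s2x + math.sqrt(s2x ** 2 + 4 * s2x * s2z)) / 2
print(round(s2, 6), round(fixed, 6))  # the two values agree
```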

A professor wants to know if students are getting
enough sleep. Each day, the professor observes whether the students
sleep in class, and whether they have red eyes. The professor has the
following domain theory:

- The prior probability of getting enough sleep, with no observations,
is 0.7.

- The probability of getting enough sleep on night $t$ is 0.8 given
that the student got enough sleep the previous night, and 0.3
if not.

- The probability of having red eyes is 0.2 if the student got enough
sleep, and 0.7 if not.

- The probability of sleeping in class is 0.1 if the student got
enough sleep, and 0.3 if not.

Formulate this information as a dynamic Bayesian network that the
professor could use to filter or predict from a sequence of
observations. Then reformulate it as a hidden Markov model that has only
a single observation variable. Give the complete probability tables for
the model.

For the DBN specified in Exercise sleep1-exercise and
for the evidence values

$\textbf{e}_1 = \textit{not red eyes, not sleeping in class}$

$\textbf{e}_2 = \textit{red eyes, not sleeping in class}$

$\textbf{e}_3 = \textit{red eyes, sleeping in class}$

perform the following computations:

1. State estimation: Compute $P({EnoughSleep}_t | \textbf{e}_{1:t})$ for each
of $t = 1,2,3$.

2. Smoothing: Compute $P({EnoughSleep}_t | \textbf{e}_{1:3})$ for each of
$t = 1,2,3$.

3. Compare the filtered and smoothed probabilities for $t=1$ and $t=2$.
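A filtering sketch for part 1, against which hand calculations can be checked. It treats the 0.7 prior as $P({EnoughSleep}_0)$ (transitioning before the first observation) and treats red eyes and sleeping in class as conditionally independent given the state, as the DBN structure implies.

```python
def filter_step(p, lik_s, lik_not_s):
    # Predict: P(sleep_t) = 0.8*p + 0.3*(1-p), then condition on evidence.
    q = 0.8 * p + 0.3 * (1 - p)
    num = q * lik_s
    return num / (num + (1 - q) * lik_not_s)

# Evidence likelihoods (P(e | EnoughSleep), P(e | not EnoughSleep)):
e = [(0.8 * 0.9, 0.3 * 0.7),   # e1: not red eyes, not sleeping in class
     (0.2 * 0.9, 0.7 * 0.7),   # e2: red eyes, not sleeping in class
     (0.2 * 0.1, 0.7 * 0.3)]   # e3: red eyes, sleeping in class

p = 0.7                         # prior P(EnoughSleep_0)
for t, (ls, lns) in enumerate(e, 1):
    p = filter_step(p, ls, lns)
    print(f"P(EnoughSleep_{t} | e_1:{t}) = {p:.4f}")
```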

Suppose that a particular student shows up with red eyes and sleeps in class every day. Given the model described in Exercise sleep1-exercise, explain why the probability that the student had enough sleep the previous night converges to a fixed point rather than continuing to go down as we gather more days of evidence. What is the fixed point? Answer this both numerically (by computation) and analytically.

This exercise analyzes in more detail the
persistent-failure model for the battery sensor in
Figure battery-persistence-figure(a)
(page battery-persistence-figure).

1. Figure battery-persistence-figure(b) stops at
$t=32$. Describe qualitatively what should happen as
$t\to\infty$ if the sensor continues to read 0.

2. Suppose that the external temperature affects the battery sensor in
such a way that transient failures become more likely as
temperature increases. Show how to augment the DBN structure in
Figure battery-persistence-figure(a), and explain
any required changes to the CPTs.

3. Given the new network structure, can battery readings be used by the
robot to infer the current temperature?

Consider applying the variable elimination algorithm to the umbrella DBN unrolled for three slices, where the query is ${\textbf{P}}(R_3|u_1,u_2,u_3)$. Show that the space complexity of the algorithm—the size of the largest factor—is the same, regardless of whether the rain variables are eliminated in forward or backward order.

(Adapted from David Heckerman.) This exercise concerns
the **Almanac Game**, which is used by
decision analysts to calibrate numeric estimation. For each of the
questions that follow, give your best guess of the answer, that is, a
number that you think is as likely to be too high as it is to be too
low. Also give your guess at a 25th percentile estimate, that is, a
number that you think has a 25% chance of being too high, and a 75%
chance of being too low. Do the same for the 75th percentile. (Thus, you
should give three estimates in all—low, median, and high—for each
question.)

1. Number of passengers who flew between New York and Los Angeles
in 1989.

2. Population of Warsaw in 1992.

3. Year in which Coronado discovered the Mississippi River.

4. Number of votes received by Jimmy Carter in the 1976
presidential election.

5. Age of the oldest living tree, as of 2002.

6. Height of the Hoover Dam in feet.

7. Number of eggs produced in Oregon in 1985.

8. Number of Buddhists in the world in 1992.

9. Number of deaths due to AIDS in the United States
in 1981.

10. Number of U.S. patents granted in 1901.

The correct answers appear after the last exercise of this chapter. From
the point of view of decision analysis, the interesting thing is not how
close your median guesses came to the real answers, but rather how often
the real answer came within your 25% and 75% bounds. If it was about
half the time, then your bounds are accurate. But if you’re like most
people, you will be more sure of yourself than you should be, and fewer
than half the answers will fall within the bounds. With practice, you
can calibrate yourself to give realistic bounds, and thus be more useful
in supplying information for decision making. Try this second set of
questions and see if there is any improvement:

1. Year of birth of Zsa Zsa Gabor.

2. Maximum distance from Mars to the sun in miles.

3. Value in dollars of exports of wheat from the United States in 1992.

4. Tons handled by the port of Honolulu in 1991.

5. Annual salary in dollars of the governor of California in 1993.

6. Population of San Diego in 1990.

7. Year in which Roger Williams founded Providence, Rhode Island.

8. Height of Mt. Kilimanjaro in feet.

9. Length of the Brooklyn Bridge in feet.

10. Number of deaths due to automobile accidents in the United States
in 1992.

Chris considers four used cars before buying the one with maximum expected utility. Pat considers ten cars and does the same. All other things being equal, which one is more likely to have the better car? Which is more likely to be disappointed with their car’s quality? By how much (in terms of standard deviations of expected quality)?

Chris considers five used cars before buying the one with maximum expected utility. Pat considers eleven cars and does the same. All other things being equal, which one is more likely to have the better car? Which is more likely to be disappointed with their car’s quality? By how much (in terms of standard deviations of expected quality)?

In 1713, Nicolas Bernoulli stated a puzzle,
now called the St. Petersburg paradox, which works as follows. You have
the opportunity to play a game in which a fair coin is tossed repeatedly
until it comes up heads. If the first heads appears on the $n$th toss,
you win $2^n$ dollars.

1. Show that the expected monetary value of this game is infinite.

2. How much would you, personally, pay to play the game?

3. Nicolas’s cousin Daniel Bernoulli resolved the apparent paradox in
1738 by suggesting that the utility of money is measured on a
logarithmic scale (i.e., $U(S_{n}) = a\log_2 n +b$, where $S_n$ is
the state of having $\$n$). What is the expected utility of the game
under this assumption?

4. What is the maximum amount that it would be rational to pay to play
the game, assuming that one’s initial wealth is $\$k$?
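The divergence in part 1 and the convergence in part 3 can both be seen numerically. The sketch below uses illustrative constants $a = 1$, $b = 0$; under Bernoulli's utility the expected utility is $\sum_n 2^{-n}(an + b) = 2a + b$.

```python
# Part 1: each term of the EMV is (1/2**n) * 2**n = 1, so partial sums
# grow without bound.
emv_partial = sum((0.5 ** n) * (2 ** n) for n in range(1, 101))
print(emv_partial)  # 100.0 after 100 terms; diverges as n -> infinity

# Part 3: expected utility with U(S_n) = a*log2(2**n) + b = a*n + b,
# using illustrative a = 1, b = 0.
a, b = 1.0, 0.0
eu = sum((0.5 ** n) * (a * n + b) for n in range(1, 200))
print(eu)  # converges to 2a + b, here approximately 2
```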

Write a computer program to automate the process in Exercise assessment-exercise. Try your program out on several people of different net worth and political outlook. Comment on the consistency of your results, both for an individual and across individuals.

The Surprise Candy Company makes candy in
two flavors: 75% are strawberry flavor and 25% are anchovy flavor. Each
new piece of candy starts out with a round shape; as it moves along the
production line, a machine randomly selects a certain percentage to be
trimmed into a square; then, each piece is wrapped in a wrapper whose
color is chosen randomly to be red or brown. 70% of the strawberry
candies are round and 70% have a red wrapper, while 90% of the anchovy
candies are square and 90% have a brown wrapper. All candies are sold
individually in sealed, identical, black boxes.

Now you, the customer, have just bought a Surprise candy at the store
but have not yet opened the box. Consider the three Bayes nets in
Figure 3candy-figure.

1. Which network(s) can correctly represent
${\textbf{P}}(Flavor,Wrapper,Shape)$?

2. Which network is the best representation for this problem?

3. Does network (i) assert that
${\textbf{P}}(Wrapper|Shape)={\textbf{P}}(Wrapper)$?

4. What is the probability that your candy has a red wrapper?

5. In the box is a round candy with a red wrapper. What is the
probability that its flavor is strawberry?

6. An unwrapped strawberry candy is worth $s$ on the open market and an
unwrapped anchovy candy is worth $a$. Write an expression for the
value of an unopened candy box.

7. A new law prohibits trading of unwrapped candies, but it is still
legal to trade wrapped candies (out of the box). Is an unopened
candy box now worth more than, less than, or the same as before?

The Surprise Candy Company makes candy in
two flavors: 70% are strawberry flavor and 30% are anchovy flavor. Each
new piece of candy starts out with a round shape; as it moves along the
production line, a machine randomly selects a certain percentage to be
trimmed into a square; then, each piece is wrapped in a wrapper whose
color is chosen randomly to be red or brown. 80% of the strawberry
candies are round and 80% have a red wrapper, while 90% of the anchovy
candies are square and 90% have a brown wrapper. All candies are sold
individually in sealed, identical, black boxes.

Now you, the customer, have just bought a Surprise candy at the store
but have not yet opened the box. Consider the three Bayes nets in
Figure 3candy-figure.

1. Which network(s) can correctly represent
${\textbf{P}}(Flavor,Wrapper,Shape)$?

2. Which network is the best representation for this problem?

3. Does network (i) assert that
${\textbf{P}}(Wrapper|Shape)={\textbf{P}}(Wrapper)$?

4. What is the probability that your candy has a red wrapper?

5. In the box is a round candy with a red wrapper. What is the
probability that its flavor is strawberry?

6. An unwrapped strawberry candy is worth $s$ on the open market and an
unwrapped anchovy candy is worth $a$. Write an expression for the
value of an unopened candy box.

7. A new law prohibits trading of unwrapped candies, but it is still
legal to trade wrapped candies (out of the box). Is an unopened
candy box now worth more than, less than, or the same as before?

Prove that the judgments $B \succ A$ and $C \succ D$ in the Allais paradox (page allais-page) violate the axiom of substitutability.

Consider the Allais paradox described on page allais-page: an agent who prefers $B$ over $A$ (taking the sure thing), and $C$ over $D$ (taking the higher EMV) is not acting rationally, according to utility theory. Do you think this indicates a problem for the agent, a problem for the theory, or no problem at all? Explain.

Tickets to a lottery cost \$1. There are two possible prizes: a \$10 payoff with probability 1/50, and a \$1,000,000 payoff with probability 1/2,000,000. What is the expected monetary value of a lottery ticket? When (if ever) is it rational to buy a ticket? Be precise—show an equation involving utilities. You may assume current wealth of $\$k$ and that $U(S_k)=0$. You may also assume that $U(S_{k+{10}}) = {10}\times U(S_{k+1})$, but you may not make any assumptions about $U(S_{k+1,{000},{000}})$. Sociological studies show that people with lower income buy a disproportionate number of lottery tickets. Do you think this is because they are worse decision makers or because they have a different utility function? Consider the value of contemplating the possibility of winning the lottery versus the value of contemplating becoming an action hero while watching an adventure movie.

Assess your own utility for different incremental amounts of money by running a series of preference tests between some definite amount $M_1$ and a lottery $[p,M_2; (1-p), 0]$. Choose different values of $M_1$ and $M_2$, and vary $p$ until you are indifferent between the two choices. Plot the resulting utility function.

How much is a micromort worth to you? Devise a protocol to determine this. Ask questions based both on paying to avoid risk and being paid to accept risk.

Let continuous variables $X_1,\ldots,X_k$ be independently distributed according to the same probability density function $f(x)$. Prove that the density function for $\max\{X_1,\ldots,X_k\}$ is given by $kf(x)(F(x))^{k-1}$, where $F$ is the cumulative distribution for $f$.

Economists often make use of an exponential utility function for money:
$U(x) = -e^{-x/R}$, where $R$ is a positive constant representing an
individual’s risk tolerance. Risk tolerance reflects how likely an
individual is to accept a lottery with a particular expected monetary
value (EMV) versus some certain payoff. As $R$ (which is measured in the
same units as $x$) becomes larger, the individual becomes less
risk-averse.

1. Assume Mary has an exponential utility function with $R = \$500$.
Mary is given the choice between receiving $\$500$ with certainty
(probability 1) or participating in a lottery which has a 60%
probability of winning \$5000 and a 40% probability of
winning nothing. Assuming Mary acts rationally, which option would
she choose? Show how you derived your answer.

2. Consider the choice between receiving $\$100$ with certainty
(probability 1) or participating in a lottery which has a 50%
probability of winning $\$500$ and a 50% probability of winning
nothing. Approximate the value of $R$ (to 3 significant digits) in an
exponential utility function that would cause an individual to be
indifferent to these two alternatives. (You might find it helpful to
write a short program to help you solve this problem.)
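A short program along the lines the exercise suggests, assuming the form of $U$ given above. Part 1 reduces to comparing two expected utilities; for part 2, bisection finds the value of $R$ at which the agent is indifferent.

```python
import math

def U(x, R):
    return -math.exp(-x / R)

# Part 1: R = 500; certain $500 vs. 60% chance of $5000.
R = 500
eu_certain = U(500, R)
eu_lottery = 0.6 * U(5000, R) + 0.4 * U(0, R)
print(eu_certain > eu_lottery)   # True: Mary takes the certain $500

# Part 2: find R making the agent indifferent between a certain $100
# and a 50/50 lottery on $500 vs. $0.
def gap(R):
    return U(100, R) - (0.5 * U(500, R) + 0.5 * U(0, R))

lo, hi = 1.0, 10_000.0           # gap > 0 at lo, gap < 0 at hi
for _ in range(100):
    mid = (lo + hi) / 2
    if gap(mid) > 0:             # still too risk-averse: root is above mid
        lo = mid
    else:
        hi = mid
R_indiff = (lo + hi) / 2
print(round(R_indiff, 1))        # approximately 152
```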

Economists often make use of an exponential utility function for money:
$U(x) = -e^{-x/R}$, where $R$ is a positive constant representing an
individual’s risk tolerance. Risk tolerance reflects how likely an
individual is to accept a lottery with a particular expected monetary
value (EMV) versus some certain payoff. As $R$ (which is measured in the
same units as $x$) becomes larger, the individual becomes less
risk-averse.

1. Assume Mary has an exponential utility function with $R = \$400$.
Mary is given the choice between receiving $\$400$ with certainty
(probability 1) or participating in a lottery which has a 60%
probability of winning \$5000 and a 40% probability of
winning nothing. Assuming Mary acts rationally, which option would
she choose? Show how you derived your answer.

2. Consider the choice between receiving $\$100$ with certainty
(probability 1) or participating in a lottery which has a 50%
probability of winning \$500 and a 50% probability of winning
nothing. Approximate the value of $R$ (to 3 significant digits) in an
exponential utility function that would cause an individual to be
indifferent to these two alternatives. (You might find it helpful to
write a short program to help you solve this problem.)

Alex is given the choice between two games. In Game 1, a fair coin is
flipped and if it comes up heads, Alex receives $\$100$. If the coin comes
up tails, Alex receives nothing. In Game 2, a fair coin is flipped
twice. Each time the coin comes up heads, Alex receives $\$50$, and Alex
receives nothing for each coin flip that comes up tails. Assuming that
Alex has a monotonically increasing utility function for money in the
range \[\$0, \$100\], show mathematically that if Alex prefers Game 2 to
Game 1, then Alex is risk averse (at least with respect to this range of
monetary amounts).

Show that if $X_1$ and $X_2$ are preferentially independent of $X_3$,
and $X_2$ and $X_3$ are preferentially independent of $X_1$, then $X_3$
and $X_1$ are preferentially independent of $X_2$.

Repeat Exercise airport-id-exercise, using the action-utility representation shown in Figure airport-au-id-figure.

For either of the airport-siting diagrams from Exercises airport-id-exercise and airport-au-id-exercise, to which conditional probability table entry is the utility most sensitive, given the available evidence?

Modify and extend the Bayesian network code in the code repository to provide for creation and evaluation of decision networks and the calculation of information value.

Consider a student who has the choice to buy or not buy a textbook for a
course. We’ll model this as a decision problem with one Boolean decision
node, $B$, indicating whether the agent chooses to buy the book, and two
Boolean chance nodes, $M$, indicating whether the student has mastered
the material in the book, and $P$, indicating whether the student passes
the course. Of course, there is also a utility node, $U$. A certain
student, Sam, has an additive utility function: 0 for not buying the
book and -\$100 for buying it; and \$2000 for passing the course and 0
for not passing. Sam’s conditional probability estimates are as follows:
$$\begin{array}{ll}
P(p|b,m) = 0.9 & P(m|b) = 0.9 \\
P(p|b, \lnot m) = 0.5 & P(m|\lnot b) = 0.7 \\
P(p|\lnot b, m) = 0.8 & \\
P(p|\lnot b, \lnot m) = 0.3 & \\
\end{array}$$

You might think that $P$ would be independent of $B$ given
$M$. But this course has an open-book final, so having the book helps.

1. Draw the decision network for this problem.

2. Compute the expected utility of buying the book and of not
buying it.

3. What should Sam do?
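For parts 2 and 3, the expected utilities follow directly from the numbers above; a sketch:

```python
# Sam's model, from the exercise statement:
P_m = {'b': 0.9, 'not_b': 0.7}                    # P(M | B)
P_p = {('b', True): 0.9, ('b', False): 0.5,       # P(P | B, M)
       ('not_b', True): 0.8, ('not_b', False): 0.3}

def expected_utility(buy):
    cost = -100 if buy == 'b' else 0              # additive utility of buying
    pm = P_m[buy]
    p_pass = pm * P_p[(buy, True)] + (1 - pm) * P_p[(buy, False)]
    return cost + 2000 * p_pass

eu_buy, eu_skip = expected_utility('b'), expected_utility('not_b')
print(eu_buy, eu_skip)   # approximately 1620 vs. 1300: Sam should buy
```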

This exercise completes the analysis of the
airport-siting problem in Figure airport-id-figure.

1. Provide reasonable variable domains, probabilities, and utilities
for the network, assuming that there are three possible sites.

2. Solve the decision problem.

3. What happens if changes in technology mean that each aircraft
generates half the noise?

4. What if noise avoidance becomes three times more important?

5. Calculate the VPI for ${AirTraffic}$, ${Litigation}$, and
${Construction}$ in your model.

(Adapted from Pearl [Pearl:1988].) A used-car
buyer can decide to carry out various tests with various costs (e.g.,
kick the tires, take the car to a qualified mechanic) and then,
depending on the outcome of the tests, decide which car to buy. We will
assume that the buyer is deciding whether to buy car $c_1$, that there
is time to carry out at most one test, and that $t_1$ is the test of
$c_1$ and costs \$50.

A car can be in good shape (quality $q^+$) or bad shape (quality $q^-$),
and the tests might help indicate what shape the car is in. Car $c_1$
costs \$1,500, and its market value is \$2,000 if it is in good shape; if
not, \$700 in repairs will be needed to put it in good shape. The buyer’s
estimate is that $c_1$ has a 70% chance of being in good shape.

1. Draw the decision network that represents this problem.

2. Calculate the expected net gain from buying $c_1$, given no test.

3. Tests can be described by the probability that the car will pass or
fail the test given that the car is in good or bad shape. We have
the following information:

$P({pass}(c_1,t_1) | q^+(c_1)) = {0.8}$

$P({pass}(c_1,t_1) | q^-(c_1)) = {0.35}$

Use Bayes’ theorem to calculate the probability that the car will pass (or fail) its test and hence the probability that it is in good (or bad) shape given each possible test outcome.

4. Calculate the optimal decisions given either a pass or a fail, and
their expected utilities.

5. Calculate the value of information of the test, and derive an
optimal conditional plan for the buyer.
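One way to organize the Bayes and value-of-information arithmetic for parts 2–5 is sketched below (it assumes net gain is measured as market value minus total outlay, and that the \$50 test cost is compared against the VPI at the end):

```python
# Priors and test characteristics from the exercise statement.
P_GOOD = 0.7
P_PASS_GOOD, P_PASS_BAD = 0.8, 0.35
GAIN_GOOD = 2000 - 1500          # buy a good car: +500
GAIN_BAD = 2000 - 700 - 1500     # buy a bad car, pay for repairs: -200

def net_gain(p_good):
    """Expected net gain of the better action (buy vs. don't buy, which gains 0)."""
    return max(0.0, p_good * GAIN_GOOD + (1 - p_good) * GAIN_BAD)

p_pass = P_PASS_GOOD * P_GOOD + P_PASS_BAD * (1 - P_GOOD)
p_good_pass = P_PASS_GOOD * P_GOOD / p_pass                  # Bayes' theorem
p_good_fail = (1 - P_PASS_GOOD) * P_GOOD / (1 - p_pass)

value_with_test = (p_pass * net_gain(p_good_pass)
                   + (1 - p_pass) * net_gain(p_good_fail))
vpi = value_with_test - net_gain(P_GOOD)
```

Comparing `vpi` with the \$50 cost of $t_1$ gives the optimal conditional plan.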

Recall the definition of *value of
information* in Section VPI-section.

1. Prove that the value of information is nonnegative and
order independent.

2. Explain why it is that some people would prefer not to get some
information—for example, not wanting to know the sex of their baby
when an ultrasound is done.

3. A function $f$ on sets is **submodular** if, for any element $x$ and any sets $A$
and $B$ such that $A\subseteq B$, adding $x$ to $A$ gives at least as great an
increase in $f$ as adding $x$ to $B$:
$$A\subseteq B \Rightarrow (f(A \cup \{x\}) - f(A)) \geq (f(B\cup \{x\}) - f(B))\ .$$
Submodularity captures the intuitive notion of *diminishing
returns*. Is the value of information, viewed as a function
$f$ on sets of possible observations, submodular? Prove this or find
a counterexample.

For the $4\times 3$ world shown in Figure sequential-decision-world-figure, calculate which squares can be reached from (1,1) by the action sequence $[{Up},{Up},{Right},{Right},{Right}]$ and with what probabilities. Explain how this computation is related to the prediction task (see Section general-filtering-section) for a hidden Markov model.

For the $4\times 3$ world shown in Figure sequential-decision-world-figure, calculate which squares can be reached from (1,1) by the action sequence $[{Right},{Right},{Right},{Up},{Up}]$ and with what probabilities. Explain how this computation is related to the prediction task (see Section general-filtering-section) for a hidden Markov model.
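Both computations are exactly the HMM prediction step: the belief state is pushed through the transition model with no evidence to condition on. A sketch for the second action sequence (it assumes the usual layout with a wall at (2,2) and terminal states at (4,2) and (4,3)):

```python
from collections import defaultdict

COLS, ROWS = 4, 3
WALLS = {(2, 2)}
TERMINALS = {(4, 2), (4, 3)}
MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
PERP = {'Up': ('Left', 'Right'), 'Down': ('Left', 'Right'),
        'Left': ('Up', 'Down'), 'Right': ('Up', 'Down')}

def step(s, d):
    """Deterministic move; bumping into a wall or the grid edge stays put."""
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    if nxt in WALLS or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return s
    return nxt

def predict(belief, action):
    """One prediction step: 0.8 intended direction, 0.1 each perpendicular."""
    b2 = defaultdict(float)
    for s, p in belief.items():
        if s in TERMINALS:            # terminal states absorb probability mass
            b2[s] += p
            continue
        b2[step(s, action)] += 0.8 * p
        for side in PERP[action]:
            b2[step(s, side)] += 0.1 * p
    return dict(b2)

belief = {(1, 1): 1.0}
for a in ['Right', 'Right', 'Right', 'Up', 'Up']:
    belief = predict(belief, a)
```

The resulting `belief` dictionary lists exactly the reachable squares with their probabilities.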

Select a specific member of the set of policies that are optimal for
$R(s)>0$ as shown in
Figure sequential-decision-policies-figure(b), and
calculate the fraction of time the agent spends in each state, in the
limit, if the policy is executed forever. (*Hint*:
Construct the state-to-state transition probability matrix corresponding
to the policy and see
Exercise markov-convergence-exercise.)

Suppose that we define the utility of a state
sequence to be the *maximum* reward obtained in any state
in the sequence. Show that this utility function does not result in
stationary preferences between state sequences. Is it still possible to
define a utility function on states such that MEU decision making gives
optimal behavior?

Can any finite search problem be translated exactly into a Markov
decision problem such that an optimal solution of the latter is also an
optimal solution of the former? If so, explain *precisely*
how to translate the problem and how to translate the solution back; if
not, explain *precisely* why not (i.e., give a
counterexample).

Sometimes MDPs are formulated with a
reward function $R(s,a)$ that depends on the action taken or with a
reward function $R(s,a,s')$ that also depends on the outcome state.

1. Write the Bellman equations for these formulations.

2. Show how an MDP with reward function $R(s,a,s')$ can be transformed
into a different MDP with reward function $R(s,a)$, such that
optimal policies in the new MDP correspond exactly to optimal
policies in the original MDP.

3. Now do the same to convert MDPs with $R(s,a)$ into MDPs with $R(s)$.
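For part 1, the equations take a standard shape; as a sketch (written with an explicit discount factor $\gamma$, to be adapted to the chapter's exact notation):

$$U(s) = \max_{a} \left[ R(s,a) + \gamma \sum_{s'} P(s' | s,a)\, U(s') \right]$$

$$U(s) = \max_{a} \sum_{s'} P(s' | s,a) \left[ R(s,a,s') + \gamma\, U(s') \right]$$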

For the environment shown in
Figure sequential-decision-world-figure, find all the
threshold values for $R(s)$ such that the optimal policy changes when
the threshold is crossed. You will need a way to calculate the optimal
policy and its value for fixed $R(s)$. (*Hint*: Prove that
the value of any fixed policy varies linearly with $R(s)$.)

Equation (vi-contraction-equation) on
page vi-contraction-equation states that the Bellman operator is a contraction.

1. Show that, for any functions $f$ and $g$,
$$|\max_a f(a) - \max_a g(a)| \leq \max_a |f(a) - g(a)|\ .$$

2. Write out an expression for $$|(B\,U_i - B\,U'_i)(s)|$$ and then apply
the result from (1) to complete the proof that the Bellman operator
is a contraction.

This exercise considers two-player MDPs that correspond to zero-sum,
turn-taking games like those in
Chapter game-playing-chapter. Let the players be $A$
and $B$, and let $R(s)$ be the reward for player $A$ in state $s$. (The
reward for $B$ is always equal and opposite.)

1. Let $U_A(s)$ be the utility of state $s$ when it is $A$’s turn to
move in $s$, and let $U_B(s)$ be the utility of state $s$ when it is
$B$’s turn to move in $s$. All rewards and utilities are calculated
from $A$’s point of view (just as in a minimax game tree). Write
down Bellman equations defining $U_A(s)$ and $U_B(s)$.

2. Explain how to do two-player value iteration with these equations,
and define a suitable termination criterion.

3. Consider the game described in
Figure line-game4-figure on page line-game4-figure.
Draw the state space (rather than the game tree), showing the moves
by $A$ as solid lines and moves by $B$ as dashed lines. Mark each
state with $R(s)$. You will find it helpful to arrange the states
$(s_A,s_B)$ on a two-dimensional grid, using $s_A$ and $s_B$ as
“coordinates.”

4. Now apply two-player value iteration to solve this game, and derive
the optimal policy.

Consider the $3 \times 3$ world shown in
Figure grid-mdp-figure(a). The transition model is the
same as in the $4\times 3$
Figure sequential-decision-world-figure: 80% of the
time the agent goes in the direction it selects; the rest of the time it
moves at right angles to the intended direction.

Implement value iteration for this world for each value of $r$ below.
Use discounted rewards with a discount factor of 0.99. Show the policy
obtained in each case. Explain intuitively why the value of $r$ leads to
each policy.

1. $r = -100$

2. $r = -3$

3. $r = 0$

4. $r = +3$
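A minimal value-iteration skeleton you could adapt for this exercise (a sketch: `states`, `actions`, `T`, and `R` must be filled in from the figure, which is not reproduced here; the 2-state chain below is only a smoke test of the skeleton itself):

```python
def value_iteration(states, actions, T, R, gamma=0.99, eps=1e-8):
    """T(s, a) -> list of (next_state, probability); R(s) -> reward.
    Repeat the Bellman update until the largest change is below eps."""
    U = {s: 0.0 for s in states}
    while True:
        delta, U_new = 0.0, {}
        for s in states:
            qs = [sum(p * U[s2] for s2, p in T(s, a)) for a in actions(s)]
            U_new[s] = R(s) + gamma * max(qs) if qs else R(s)
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta <= eps:
            return U

def best_policy(states, actions, T, U):
    return {s: max(actions(s), key=lambda a: sum(p * U[s2] for s2, p in T(s, a)))
            for s in states if actions(s)}

# Smoke test: a 2-state chain where state 1 is terminal with reward 1.
states = [0, 1]
actions = lambda s: ['go'] if s == 0 else []
T = lambda s, a: [(1, 1.0)]
R = lambda s: 1.0 if s == 1 else 0.0
U = value_iteration(states, actions, T, R)    # U[0] = 0.99, U[1] = 1.0
```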

Consider the $101 \times 3$ world shown in
Figure grid-mdp-figure(b). In the start state the agent
has a choice of two deterministic actions, *Up* or
*Down*, but in the other states the agent has one
deterministic action, *Right*. Assuming a discounted reward
function, for what values of the discount $\gamma$ should the agent
choose *Up* and for which *Down*? Compute the
utility of each action as a function of $\gamma$. (Note that this simple
example actually reflects many real-world situations in which one must
weigh the value of an immediate action versus the potential continual
long-term consequences, such as choosing to dump pollutants into a
lake.)

Consider an undiscounted MDP having three states, (1, 2, 3), with
rewards $-1$, $-2$, $0$, respectively. State 3 is a terminal state. In
states 1 and 2 there are two possible actions: $a$ and $b$. The
transition model is as follows:

- In state 1, action $a$ moves the agent to state 2 with probability
0.8 and makes the agent stay put with probability 0.2.

- In state 2, action $a$ moves the agent to state 1 with probability
0.8 and makes the agent stay put with probability 0.2.

- In either state 1 or state 2, action $b$ moves the agent to state 3
with probability 0.1 and makes the agent stay put with
probability 0.9.

Answer the following questions:

1. What can be determined *qualitatively* about the
optimal policy in states 1 and 2?

2. Apply policy iteration, showing each step in full, to determine the
optimal policy and the values of states 1 and 2. Assume that the
initial policy has action $b$ in both states.

3. What happens to policy iteration if the initial policy has action
$a$ in both states? Does discounting help? Does the optimal policy
depend on the discount factor?
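A numerical cross-check for part 2 is sketched below; it is not a substitute for showing each policy-iteration step by hand, but it confirms the fixed point (state 3 is modeled as absorbing with reward 0, and policy evaluation is done iteratively rather than by solving the linear system):

```python
# Transition model: T[(state, action)] -> [(next_state, probability), ...]
R = {1: -1.0, 2: -2.0, 3: 0.0}
T = {(1, 'a'): [(2, 0.8), (1, 0.2)],
     (2, 'a'): [(1, 0.8), (2, 0.2)],
     (1, 'b'): [(3, 0.1), (1, 0.9)],
     (2, 'b'): [(3, 0.1), (2, 0.9)]}

def q(s, a, U):
    return R[s] + sum(p * U[s2] for s2, p in T[(s, a)])

def evaluate(policy, sweeps=3000):
    """Iterative policy evaluation; the terminal state stays pinned at 0."""
    U = {1: 0.0, 2: 0.0, 3: 0.0}
    for _ in range(sweeps):
        for s in (1, 2):
            U[s] = q(s, policy[s], U)
    return U

def policy_iteration(policy):
    while True:
        U = evaluate(policy)
        new = {s: max('ab', key=lambda a: q(s, a, U)) for s in (1, 2)}
        if new == policy:
            return policy, U
        policy = new

pi, U = policy_iteration({1: 'b', 2: 'b'})
```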

Consider the $4\times 3$ world shown in
Figure sequential-decision-world-figure

.
1. Implement an environment simulator for this environment, such that
the specific geography of the environment is easily altered. Some
code for doing this is already in the online code repository.

2. Create an agent that uses policy iteration, and measure its
performance in the environment simulator from various
starting states. Perform several experiments from each starting
state, and compare the average total reward received per run with
the utility of the state, as determined by your algorithm.

3. Experiment with increasing the size of the environment. How does the
run time for policy iteration vary with the size of the environment?

How can the value determination algorithm be used to calculate the expected loss experienced by an agent using a given set of utility estimates ${U}$ and an estimated model ${P}$, compared with an agent using correct values?

Let the initial belief state $b_0$ for the $4\times 3$ POMDP on page 4x3-pomdp-page be the uniform distribution over the nonterminal states, i.e., $< \frac{1}{9},\frac{1}{9},\frac{1}{9},\frac{1}{9},\frac{1}{9},\frac{1}{9},\frac{1}{9},\frac{1}{9},\frac{1}{9},0,0 >$. Calculate the exact belief state $b_1$ after the agent moves and its sensor reports 1 adjacent wall. Also calculate $b_2$ assuming that the same thing happens again.

What is the time complexity of $d$ steps of POMDP value iteration for a sensorless environment?

Consider a version of the two-state POMDP on page 2state-pomdp-page in which the sensor is 90% reliable in state 0 but provides no information in state 1 (that is, it reports 0 or 1 with equal probability). Analyze, either qualitatively or quantitatively, the utility function and the optimal policy for this problem.

Show that a dominant strategy equilibrium is a Nash equilibrium, but not vice versa.

In the children’s game of rock–paper–scissors each player reveals at the same time a choice of rock, paper, or scissors. Paper wraps rock, rock blunts scissors, and scissors cut paper. In the extended version rock–paper–scissors–fire–water, fire beats rock, paper, and scissors; rock, paper, and scissors beat water; and water beats fire. Write out the payoff matrix and find a mixed-strategy solution to this game.

Solve the game of *three*-finger Morra.

In the *Prisoner’s Dilemma*, consider the case where after
each round, Alice and Bob have probability $X$ of meeting again. Suppose
both players choose the perpetual punishment strategy (where each will
choose ${refuse}$ unless the other player has ever played
${testify}$). Assume neither player has played ${testify}$ thus far.
What is the expected future total payoff for choosing to ${testify}$
versus ${refuse}$ when $X = .2$? How about when $X = .05$? For what
value of $X$ is the expected future total payoff the same whether one
chooses to ${testify}$ or ${refuse}$ in the current round?
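The expected payoffs can be tabulated mechanically. The sketch below assumes the chapter's payoff matrix (each player gets $-5$ when both testify, $-1$ when both refuse, and $0$ for a unilateral testifier); substitute the book's actual numbers if they differ:

```python
def expected_totals(X, unilateral_testify=0.0, mutual_testify=-5.0,
                    mutual_refuse=-1.0):
    """Expected total payoff from this round onward under perpetual punishment.
    Future rounds occur with probabilities X, X**2, ..., so their expected
    count is X / (1 - X)."""
    future_rounds = X / (1 - X)
    testify = unilateral_testify + mutual_testify * future_rounds  # punished forever
    refuse = mutual_refuse * (1 + future_rounds)                   # cooperate forever
    return testify, refuse
```

Evaluating at $X = 0.2$ and $X = 0.05$, and solving for where the two totals coincide, answers the three questions.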

The following payoff matrix, from Blinder:1983 by way of Bernstein:1996, shows a game between
politicians and the Federal Reserve.

$$
\begin{array}
{|r|r|}\hline & Fed: contract & Fed: do nothing & Fed: expand \\
\hline
Pol: contract & F=7, P=1 & F=9, P=4 & F=6, P=6 \\
Pol: do nothing & F=8, P=2 & F=5, P=5 & F=4, P=9 \\
Pol: expand & F=3, P=3 & F=2, P=7 & F=1, P=8\\
\hline
\end{array}
$$

Politicians can expand or contract fiscal policy, while the Fed can
expand or contract monetary policy. (And of course either side can
choose to do nothing.) Each side also has preferences for who should do
what—neither side wants to look like the bad guys. The payoffs shown are
simply the rank orderings: 9 for first choice through 1 for last choice.
Find the Nash equilibrium of the game in pure strategies. Is this a
Pareto-optimal solution? You might wish to analyze the policies of
recent administrations in this light.
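Pure-strategy Nash equilibria can be found by exhaustive enumeration: a cell is an equilibrium when neither player can improve by deviating unilaterally. A sketch over the matrix above:

```python
pol_moves = fed_moves = ['contract', 'nothing', 'expand']
# payoffs[pol][fed] = (F, P) rank-order payoffs from the matrix above
payoffs = {
    'contract': {'contract': (7, 1), 'nothing': (9, 4), 'expand': (6, 6)},
    'nothing':  {'contract': (8, 2), 'nothing': (5, 5), 'expand': (4, 9)},
    'expand':   {'contract': (3, 3), 'nothing': (2, 7), 'expand': (1, 8)},
}

def pure_nash():
    eq = []
    for pol in pol_moves:
        for fed in fed_moves:
            f, p = payoffs[pol][fed]
            fed_best = all(payoffs[pol][fed2][0] <= f for fed2 in fed_moves)
            pol_best = all(payoffs[pol2][fed][1] <= p for pol2 in pol_moves)
            if fed_best and pol_best:
                eq.append((pol, fed))
    return eq
```

Comparing the equilibrium cell's payoffs with the other cells then answers the Pareto-optimality question.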

A Dutch auction is similar to an English auction, but rather than starting the bidding at a low price and increasing it, the seller starts at a high price and gradually lowers the price until some buyer is willing to accept that price. (If multiple bidders accept the price, one is arbitrarily chosen as the winner.) More formally, the seller begins with a price $p$ and gradually lowers $p$ by increments of $d$ until at least one buyer accepts the price. Assuming all bidders act rationally, is it true that for arbitrarily small $d$, a Dutch auction will always result in the bidder with the highest value for the item obtaining the item? If so, show mathematically why. If not, explain how it may be possible for the bidder with highest value for the item not to obtain it.

Imagine an auction mechanism that is just like an ascending-bid auction, except that at the end, the winning bidder, the one who bid $b_{max}$, pays only $b_{max}/2$ rather than $b_{max}$. Assuming all agents are rational, what is the expected revenue to the auctioneer for this mechanism, compared with a standard ascending-bid auction?

Teams in the National Hockey League historically received 2 points for
winning a game and 0 for losing. If the game is tied, an overtime period
is played; if nobody wins in overtime, the game is a tie and each team
gets 1 point. But league officials felt that teams were playing too
conservatively in overtime (to avoid a loss), and it would be more
exciting if overtime produced a winner. So in 1999 the officials
experimented in mechanism design: the rules were changed, giving a team
that loses in overtime 1 point, not 0. It is still 2 points for a win
and 1 for a tie.

1. Was hockey a zero-sum game before the rule change? After?

2. Suppose that at a certain time $t$ in a game, the home team has
probability $p$ of winning in regulation time, probability $0.78-p$
of losing, and probability 0.22 of going into overtime, where they
have probability $q$ of winning, $.9-q$ of losing, and .1 of tying.
Give equations for the expected value for the home and
visiting teams.

3. Imagine that it were legal and ethical for the two teams to enter
into a pact where they agree that they will skate to a tie in
regulation time, and then both try in earnest to win in overtime.
Under what conditions, in terms of $p$ and $q$, would it be rational
for both teams to agree to this pact?

4. Longley+Sankaran:2005 report that since the rule change, the percentage of games with a
winner in overtime went up 18.2%, as desired, but the percentage of
overtime games also went up 3.6%. What does that suggest about
possible collusion or conservative play after the rule change?

Consider the problem faced by an infant learning to speak and understand a language. Explain how this process fits into the general learning model. Describe the percepts and actions of the infant, and the types of learning the infant must do. Describe the subfunctions the infant is trying to learn in terms of inputs and outputs, and available example data.

Repeat Exercise infant-language-exercise for the case of learning to play tennis (or some other sport with which you are familiar). Is this supervised learning or reinforcement learning?

Draw a decision tree for the problem of deciding whether to move forward at a road intersection, given that the light has just turned green.

We never test the same attribute twice along one path in a decision tree. Why not?

Suppose we generate a training set from a decision tree and then apply decision-tree learning to that training set. Is it the case that the learning algorithm will eventually return the correct tree as the training-set size goes to infinity? Why or why not?

In the recursive construction of
decision trees, it sometimes happens that a mixed set of positive and
negative examples remains at a leaf node, even after all the attributes
have been used. Suppose that we have $p$ positive examples and $n$
negative examples.

1. Show that the solution used by DECISION-TREE-LEARNING, which picks the majority
classification, minimizes the absolute error over the set of
examples at the leaf.

2. Show that the **class probability** $p/(p+n)$ minimizes the sum of squared errors.

Suppose that an attribute splits the set of examples $E$ into subsets $E_k$ and that each subset has $p_k$ positive examples and $n_k$ negative examples. Show that the attribute has strictly positive information gain unless the ratio $p_k/(p_k+n_k)$ is the same for all $k$.

Consider the following data set, consisting of three binary input
attributes ($A_1, A_2$, and $A_3$) and one binary output:

$$
\begin{array}
{|r|r|}\hline \textbf{Example} & A_1 & A_2 & A_3 & Output\space y \\
\hline \textbf{x}_1 & 1 & 0 & 0 & 0 \\
\textbf{x}_2 & 1 & 0 & 1 & 0 \\
\textbf{x}_3 & 0 & 1 & 0 & 0 \\
\textbf{x}_4 & 1 & 1 & 1 & 1 \\
\textbf{x}_5 & 1 & 1 & 0 & 1 \\
\hline
\end{array}
$$
Use the algorithm in Figure DTL-algorithm
(page DTL-algorithm) to learn a decision tree for these data. Show the
computations made to determine the attribute to split at each node.
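The split computations reduce to entropy and information gain. A sketch (the Boolean-entropy function `B` and the gain formula follow the standard definitions; the algorithm picks the attribute with the highest gain):

```python
from math import log2

examples = [  # (A1, A2, A3, y)
    (1, 0, 0, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (1, 1, 0, 1),
]

def B(q):
    """Entropy of a Boolean variable that is true with probability q."""
    return 0.0 if q in (0.0, 1.0) else -(q * log2(q) + (1 - q) * log2(1 - q))

def gain(attr):
    n = len(examples)
    pos = sum(e[-1] for e in examples)
    remainder = 0.0
    for v in (0, 1):
        subset = [e for e in examples if e[attr] == v]
        if subset:
            pk = sum(e[-1] for e in subset)
            remainder += len(subset) / n * B(pk / len(subset))
    return B(pos / n) - remainder

gains = {f'A{i + 1}': gain(i) for i in range(3)}
```

The attribute with the largest gain is the root split; the same computation is repeated recursively on each branch.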

Construct a data set (set of examples with attributes and classifications) that would cause the decision-tree learning algorithm to find a non-minimal-sized tree. Show the tree constructed by the algorithm and the minimal-sized tree that you can generate by hand.

A decision *graph* is a generalization of a decision tree
that allows nodes (i.e., attributes used for splits) to have multiple
parents, rather than just a single parent. The resulting graph must
still be acyclic. Now, consider the XOR function of *three*
binary input attributes, which produces the value 1 if and only if an
odd number of the three input attributes has value 1.

1. Draw a minimal-sized decision *tree* for the
three-input XOR function.

2. Draw a minimal-sized decision *graph* for the
three-input XOR function.

This exercise considers $\chi^2$ pruning of
decision trees (Section chi-squared-section).

1. Create a data set with two input attributes, such that the
information gain at the root of the tree for both attributes is
zero, but there is a decision tree of depth 2 that is consistent
with all the data. What would $\chi^2$ pruning do on this data set
if applied bottom up? If applied top down?

2. Modify DECISION-TREE-LEARNING to include $\chi^2$-pruning. You might wish to consult
Quinlan [Quinlan:1986] or [Kearns+Mansour:1998] for details.

The standard DECISION-TREE-LEARNING algorithm described in the
chapter does not handle cases in which some examples have missing
attribute values.

1. First, we need to find a way to classify such examples, given a
decision tree that includes tests on the attributes for which values
can be missing. Suppose that an example $\textbf{x}$ has a missing value for
attribute $A$ and that the decision tree tests for $A$ at a node
that $\textbf{x}$ reaches. One way to handle this case is to pretend that
the example has *all* possible values for the
attribute, but to weight each value according to its frequency among
all of the examples that reach that node in the decision tree. The
classification algorithm should follow all branches at any node for
which a value is missing and should multiply the weights along each
path. Write a modified classification algorithm for decision trees
that has this behavior.

2. Now modify the information-gain calculation so that in any given
collection of examples $C$ at a given node in the tree during the
construction process, the examples with missing values for any of
the remaining attributes are given “as-if” values according to the
frequencies of those values in the set $C$.

In
Section broadening-decision-tree-section, we noted that
attributes with many different possible values can cause problems with
the gain measure. Such attributes tend to split the examples into
numerous small classes or even singleton classes, thereby appearing to
be highly relevant according to the gain measure. The
**gain-ratio** criterion selects attributes
according to the ratio between their gain and their intrinsic
information content—that is, the amount of information contained in the
answer to the question, “What is the value of this attribute?” The
gain-ratio criterion therefore tries to measure how efficiently an
attribute provides information on the correct classification of an
example. Write a mathematical expression for the information content of
an attribute, and implement the gain-ratio criterion in DECISION-TREE-LEARNING.

Suppose you are running a learning experiment on a new algorithm for Boolean classification. You have a data set consisting of 100 positive and 100 negative examples. You plan to use leave-one-out cross-validation and compare your algorithm to a baseline function, a simple majority classifier. (A majority classifier is given a set of training data and then always outputs the class that is in the majority in the training set, regardless of the input.) You expect the majority classifier to score about 50% on leave-one-out cross-validation, but to your surprise, it scores zero every time. Can you explain why?
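The phenomenon can be reproduced in a few lines (a sketch; note that with 199 training examples left after each hold-out, no tie in the majority vote can ever arise):

```python
def loocv_majority_accuracy(labels):
    """Leave-one-out accuracy of a classifier that always predicts the
    majority class of its training set."""
    correct = 0
    for i, y in enumerate(labels):
        rest = labels[:i] + labels[i + 1:]          # hold out example i
        prediction = 1 if sum(rest) * 2 > len(rest) else 0
        correct += (prediction == y)
    return correct / len(labels)
```

Running it on 100 positive and 100 negative labels exhibits the surprising score.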

Suppose that a learning algorithm is trying to find a consistent hypothesis when the classifications of examples are actually random. There are $n$ Boolean attributes, and examples are drawn uniformly from the set of $2^n$ possible examples. Calculate the number of examples required before the probability of finding a contradiction in the data reaches 0.5.

Construct a *decision list* to classify the data below.
Select tests to be as small as possible (in terms of attributes),
breaking ties among tests with the same number of attributes by
selecting the one that classifies the greatest number of examples
correctly. If multiple tests have the same number of attributes and
classify the same number of examples, then break the tie using
attributes with lower index numbers (e.g., select $A_1$ over $A_2$).

$$
\begin{array}
{|r|r|}\hline \textbf{Example} & A_1 & A_2 & A_3 & A_4 & y \\
\hline \textbf{x}_1 & 1 & 0 & 0 & 0 & 1 \\
\textbf{x}_2 & 1 & 0 & 1 & 1 & 1 \\
\textbf{x}_3 & 0 & 1 & 0 & 0 & 1 \\
\textbf{x}_4 & 0 & 1 & 1 & 0 & 0 \\
\textbf{x}_5 & 1 & 1 & 0 & 1 & 1 \\
\textbf{x}_6 & 0 & 1 & 0 & 1 & 0 \\
\textbf{x}_7 & 0 & 0 & 1 & 1 & 1 \\
\textbf{x}_8 & 0 & 0 & 1 & 0 & 0 \\
\hline
\end{array}
$$

Prove that a decision list can represent the same function as a decision tree while using at most as many rules as there are leaves in the decision tree for that function. Give an example of a function represented by a decision list using strictly fewer rules than the number of leaves in a minimal-sized decision tree for that same function.

This exercise concerns the expressiveness of
decision lists (Section learning-theory-section).

1. Show that decision lists can represent any Boolean function, if the
size of the tests is not limited.

2. Show that if the tests can contain at most $k$ literals each, then
decision lists can represent any function that can be represented by
a decision tree of depth $k$.

Suppose a $7$-nearest-neighbors regression search returns $ \{7, 6, 8, 4, 7, 11, 100\} $ as the 7 nearest $y$ values for a given $x$ value. What is the value of $\hat{y}$ that minimizes the $L_1$ loss function on this data? There is a common name in statistics for this value as a function of the $y$ values; what is it? Answer the same two questions for the $L_2$ loss function.

Suppose a $7$-nearest-neighbors regression search returns $ \{4, 2, 8, 4, 9, 11, 100\} $ as the 7 nearest $y$ values for a given $x$ value. What is the value of $\hat{y}$ that minimizes the $L_1$ loss function on this data? There is a common name in statistics for this value as a function of the $y$ values; what is it? Answer the same two questions for the $L_2$ loss function.
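For either data set, the minimizers can be checked by brute force over a grid of candidate predictions (a sketch; the grid resolution limits the precision of the $L_2$ answer):

```python
from statistics import mean, median

def l1(yhat, ys): return sum(abs(y - yhat) for y in ys)
def l2(yhat, ys): return sum((y - yhat) ** 2 for y in ys)

ys = [7, 6, 8, 4, 7, 11, 100]
grid = [k / 100 for k in range(0, 10001)]        # candidates 0.00 .. 100.00
best_l1 = min(grid, key=lambda v: l1(v, ys))
best_l2 = min(grid, key=lambda v: l2(v, ys))
```

Comparing `best_l1` and `best_l2` against `median(ys)` and `mean(ys)` suggests the statistical names the exercises ask for.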

Figure kernel-machine-figure
showed how a circle at the origin can be linearly separated by mapping
from the features $(x_1, x_2)$ to the two dimensions $(x_1^2, x_2^2)$.
But what if the circle is not located at the origin? What if it is an
ellipse, not a circle? The general equation for a circle (and hence the
decision boundary) is $(x_1-a)^2 +
(x_2-b)^2 - r^2 = 0$, and the general equation for an ellipse is
$c(x_1-a)^2 + d(x_2-b)^2 - 1 = 0$.

1. Expand out the equation for the circle and show what the weights
$w_i$ would be for the decision boundary in the four-dimensional
feature space $(x_1, x_2, x_1^2, x_2^2)$. Explain why this means
that any circle is linearly separable in this space.

2. Do the same for ellipses in the five-dimensional feature space
$(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$.

Construct a support vector machine that computes the xor function. Use values of +1 and –1 (instead of 1 and 0) for both inputs and outputs, so that an example looks like $([-1, 1], 1)$ or $([-1, -1], -1)$. Map the input $[x_1,x_2]$ into a space consisting of $x_1$ and $x_1\,x_2$. Draw the four input points in this space, and the maximal margin separator. What is the margin? Now draw the separating line back in the original Euclidean input space.

Consider an ensemble learning algorithm that
uses simple majority voting among $K$ learned hypotheses.
Suppose that each hypothesis has error $\epsilon$ and that the errors
made by each hypothesis are independent of the others’. Calculate a
formula for the error of the ensemble algorithm in terms of $K$
and $\epsilon$, and evaluate it for the cases where
$K=5$, 10, and 20 and $\epsilon={0.1}$, 0.2,
and 0.4. If the independence assumption is removed, is it possible for
the ensemble error to be *worse* than $\epsilon$?
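Under the independence assumption, the ensemble errs exactly when a strict majority of the $K$ hypotheses err, which is a binomial tail probability. A sketch of the formula the exercise asks for:

```python
from math import comb

def ensemble_error(K, eps):
    """P(strict majority of K independent hypotheses err), each with error eps."""
    return sum(comb(K, i) * eps ** i * (1 - eps) ** (K - i)
               for i in range(K // 2 + 1, K + 1))
```

Evaluating it at the requested values of $K$ and $\epsilon$ fills in the table; note that at $\epsilon = 0.5$ the ensemble gains nothing.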

Construct by hand a neural network that computes the xor function of two inputs. Make sure to specify what sort of units you are using.

A simple perceptron cannot represent xor (or, generally, the parity function of its inputs). Describe what happens to the weights of a four-input, hard-threshold perceptron, beginning with all weights set to 0.1, as examples of the parity function arrive.

Recall from Chapter concept-learning-chapter that there are $2^{2^n}$ distinct Boolean functions of $n$ inputs. How many of these are representable by a threshold perceptron?

Consider the following set of examples, each with six inputs and one
target output:

$$
\begin{array}
{|r|r|}\hline \textbf{Example} & A_1 & A_2 & A_3 & A_4 & A_5 & A_6 & A_7 & A_8 & A_9 & A_{10} & A_{11} & A_{12} & A_{13} & A_{14} \\
\hline
\textbf{x}_1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\textbf{x}_2 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 1 \\
\textbf{x}_3 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
\textbf{x}_4 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 1 \\
\textbf{x}_5 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
\textbf{x}_6 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\
\textbf{T} & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
\hline
\end{array}
$$
1. Run the perceptron learning rule on these data and show the
final weights.

2. Run the decision tree learning rule, and show the resulting
decision tree.

3. Comment on your results.

Section logistic-regression-section
(page logistic-regression-section) noted that the output of the logistic function
could be interpreted as a *probability* $p$ assigned by the
model to the proposition that $f(\textbf{x}) = 1$; the probability that
$f(\textbf{x}) = 0$ is therefore $1-p$. Write down the probability $p$
as a function of $\textbf{x}$ and calculate the derivative of $\log p$ with
respect to each weight $w_i$. Repeat the process for $\log (1-p)$. These
calculations give a learning rule for minimizing the
negative-log-likelihood loss function for a probabilistic hypothesis.
Comment on any resemblance to other learning rules in the chapter.

Suppose you had a neural network with linear
activation functions. That is, for each unit the output is some constant
$c$ times the weighted sum of the inputs.

1. Assume that the network has one hidden layer. For a given assignment
to the weights $\textbf{w}$, write down equations for the value of the
units in the output layer as a function of $\textbf{w}$ and the input layer
$\textbf{x}$, without any explicit mention of the output of the
hidden layer. Show that there is a network with no hidden units that
computes the same function.

2. Repeat the calculation in part (a), but this time do it for a
network with any number of hidden layers.

3. Suppose a network with one hidden layer and linear activation
functions has $n$ input and output nodes and $h$ hidden nodes. What
effect does the transformation in part (a) to a network with no
hidden layers have on the total number of weights? Discuss in
particular the case $h \ll n$.

Implement a data structure for layered, feed-forward neural networks, remembering to provide the information needed for both forward evaluation and backward propagation. Using this data structure, write a function NEURAL-NETWORK-OUTPUT that takes an example and a network and computes the appropriate output values.

Suppose that a training set contains only a single example, repeated 100
times. In 80 of the 100 cases, the single output value is 1; in the
other 20, it is 0. What will a back-propagation network predict for this
example, assuming that it has been trained and reaches a global optimum?
(*Hint:* to find the global optimum, differentiate the
error function and set it to zero.)
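Following the hint, and assuming a squared-error loss, the quantity being minimized is $E(\hat{y}) = 80(1-\hat{y})^2 + 20\,\hat{y}^2$; a brute-force check of its minimum:

```python
def error(y_hat):
    """Squared error over 100 copies of one input: 80 targets of 1, 20 of 0."""
    return 80 * (1 - y_hat) ** 2 + 20 * (0 - y_hat) ** 2

grid = [i / 1000 for i in range(1001)]   # candidate outputs 0.000 .. 1.000
best = min(grid, key=error)
```

Setting the derivative of `error` to zero by hand should agree with the grid search.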

The neural network whose learning performance is measured in Figure restaurant-back-prop-figure has four hidden nodes. This number was chosen somewhat arbitrarily. Use a cross-validation method to find the best number of hidden nodes.

Consider the problem of separating
$N$ data points into positive and negative examples using a linear
separator. Clearly, this can always be done for $N = 2$ points
on a line of dimension $d = 1$, regardless of how the points are
labeled or where they are located (unless the points are in the same
place).

1. Show that it can always be done for $N = 3$ points on a
plane of dimension $d = 2$, unless they are collinear.

2. Show that it cannot always be done for $N = 4$ points on a
plane of dimension $d = 2$.

3. Show that it can always be done for $N = 4$ points in a
space of dimension $d = 3$, unless they are coplanar.

4. Show that it cannot always be done for $N = 5$ points in a
space of dimension $d = 3$.

5. The ambitious student may wish to prove that $N$ points in general
position (but not $N+1$) are linearly separable in a space of
dimension $N-1$.

Show, by translating into conjunctive normal form and applying resolution, that the conclusion drawn on page dbsig-page concerning Brazilians is sound.

For each of the following determinations, write down the logical
representation and explain why the determination is true (if it is):

1. Design and denomination determine the mass of a coin.

2. For a given program, input determines output.

3. Climate, food intake, exercise, and metabolism determine weight gain
and loss.

4. Baldness is determined by the baldness (or lack thereof) of one’s
maternal grandfather.

For each of the following determinations, write down the logical
representation and explain why the determination is true (if it is):

1. Zip code determines the state (U.S.).

2. Design and denomination determine the mass of a coin.

3. Climate, food intake, exercise, and metabolism determine weight gain
and loss.

4. Baldness is determined by the baldness (or lack thereof) of one’s
maternal grandfather.

Would a probabilistic version of determinations be useful? Suggest a definition.

Fill in the missing values for the clauses $C_1$ or
$C_2$ (or both) in the following sets of clauses, given that $C$ is the
resolvent of $C_1$ and $C_2$:

1. $C = {True} \Rightarrow P(A,B)$,
$C_1 = P(x,y) \Rightarrow Q(x,y)$, $C_2
= ??$.

2. $C = {True} \Rightarrow P(A,B)$, $C_1 = ??$,
$C_2 = ??$.

3. $C = P(x,y) \Rightarrow P(x,f(y))$, $C_1 = ??$,
$C_2 = ??$.

If there is more than one possible solution, provide one example of each
different kind.

Suppose one writes a logic program that carries out a resolution inference step. That is, let ${Resolve}(c_1,c_2,c)$ succeed if $c$ is the result of resolving $c_1$ and $c_2$. Normally, ${Resolve}$ would be used as part of a theorem prover by calling it with $c_1$ and $c_2$ instantiated to particular clauses, thereby generating the resolvent $c$. Now suppose instead that we call it with $c$ instantiated and $c_1$ and $c_2$ uninstantiated. Will this succeed in generating the appropriate results of an inverse resolution step? Would you need any special modifications to the logic programming system for this to work?

Suppose that FOIL is considering adding a literal
to a clause using a binary predicate $P$ and that previous literals
(including the head of the clause) contain five different variables.

1. How many functionally different literals can be generated? Two
literals are functionally identical if they differ only in the names
of the *new* variables that they contain.

2. Can you find a general formula for the number of different literals
with a predicate of arity $r$ when there are $n$ variables
previously used?

3. Why does FOIL not allow literals that contain no previously used
variables?
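Part 1 can be checked by brute force. The sketch below enumerates argument patterns and treats two literals as identical when they differ only in the names of new variables; it counts unnegated literals only (whether negated literals double the count is left to your reading of the exercise).

```python
from itertools import product

OLD = ['a', 'b', 'c', 'd', 'e']  # five previously used variables
NEW = ['n1', 'n2']               # at most arity-many fresh variables are needed

def canonical(args):
    """Rename new variables in order of first appearance."""
    mapping, out = {}, []
    for v in args:
        if v in OLD:
            out.append(v)
        else:
            mapping.setdefault(v, 'new%d' % len(mapping))
            out.append(mapping[v])
    return tuple(out)

literals = {canonical(args) for args in product(OLD + NEW, repeat=2)}
with_old = [l for l in literals if any(v in OLD for v in l)]
print(len(literals), len(with_old))  # 37 patterns; 35 contain an old variable
```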

Using the data from the family tree in Figure family2-figure, or a subset thereof, apply the algorithm to learn a definition for the ${Ancestor}$ predicate.

The data used for Figure bayes-candy-figure on page bayes-candy-figure can be viewed as being generated by $h_5$. For each of the other four hypotheses, generate a data set of length 100 and plot the corresponding graphs for $P(h_i|d_1,\ldots,d_N)$ and $P(D_{N+1}=lime|d_1,\ldots,d_N)$. Comment on your results.
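A minimal sketch of the Bayesian updating loop, assuming the chapter's prior $\langle 0.1, 0.2, 0.4, 0.2, 0.1\rangle$ and lime fractions $\langle 0, 0.25, 0.5, 0.75, 1.0\rangle$ for $h_1,\ldots,h_5$ (substitute the actual values from your text if they differ):

```python
# Posterior over the five candy hypotheses after a sequence of observations.
PRIOR = [0.1, 0.2, 0.4, 0.2, 0.1]
LIME = [0.0, 0.25, 0.5, 0.75, 1.0]   # fraction of lime candies under each h_i

def posterior(observations):
    """observations: list of 'lime' or 'cherry'."""
    post = list(PRIOR)
    for obs in observations:
        post = [p * (l if obs == 'lime' else 1 - l) for p, l in zip(post, LIME)]
    total = sum(post)
    return [p / total for p in post]

def p_next_lime(observations):
    """Predictive probability that the next candy is lime."""
    return sum(p * l for p, l in zip(posterior(observations), LIME))

post = posterior(['lime'] * 20)
print(post[4], p_next_lime(['lime'] * 20))  # h5 dominates after 20 limes
```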

Repeat Exercise bayes-candy-exercise, this time plotting the values of $P(D_{N+1}=lime|h_{MAP})$ and $P(D_{N+1}=lime|h_{ML})$.

Suppose that Ann’s utilities for cherry and lime candies are $c_A$ and $\ell_A$, whereas Bob’s utilities are $c_B$ and $\ell_B$. (But once Ann has unwrapped a piece of candy, Bob won’t buy it.) Presumably, if Bob likes lime candies much more than Ann, it would be wise for Ann to sell her bag of candies once she is sufficiently sure of its lime content. On the other hand, if Ann unwraps too many candies in the process, the bag will be worth less. Discuss the problem of determining the optimal point at which to sell the bag. Determine the expected utility of the optimal procedure, given the prior distribution from Section statistical-learning-section.

Two statisticians go to the doctor and are both given the same
prognosis: A 40% chance that the problem is the deadly disease $A$, and
a 60% chance of the fatal disease $B$. Fortunately, there are anti-$A$
and anti-$B$ drugs that are inexpensive, 100% effective, and free of
side-effects. The statisticians have the choice of taking one drug,
both, or neither. What will the first statistician (an avid Bayesian)
do? How about the second statistician, who always uses the maximum
likelihood hypothesis?

The doctor does some research and discovers that disease $B$ actually
comes in two versions, dextro-$B$ and levo-$B$, which are equally likely
and equally treatable by the anti-$B$ drug. Now that there are three
hypotheses, what will the two statisticians do?

Explain how to apply the boosting method of Chapter concept-learning-chapter to naive Bayes learning. Test the performance of the resulting algorithm on the restaurant learning problem.

Consider $N$ data points $(x_j,y_j)$, where the $y_j$s are generated from the $x_j$s according to the linear Gaussian model in Equation (linear-gaussian-likelihood-equation). Find the values of $\theta_1$, $\theta_2$, and $\sigma$ that maximize the conditional log likelihood of the data.
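For reference, writing the model as $y = \theta_1 x + \theta_2 + \epsilon$ with $\epsilon \sim \mathcal{N}(0,\sigma^2)$ (my reading of the linear Gaussian model), the conditional log likelihood to be maximized is

```latex
L(\theta_1,\theta_2,\sigma)
  = -N \log\bigl(\sigma\sqrt{2\pi}\bigr)
    - \frac{1}{2\sigma^2} \sum_{j=1}^{N} \bigl(y_j - \theta_1 x_j - \theta_2\bigr)^2
```

Setting $\partial L/\partial \theta_1 = \partial L/\partial \theta_2 = \partial L/\partial \sigma = 0$ then yields the estimates, which you should derive explicitly.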

Consider the noisy-OR model for fever described
in Section canonical-distribution-section. Explain how
to apply maximum-likelihood learning to fit the parameters of such a
model to a set of complete data. (*Hint*: use the chain
rule for partial derivatives.)

This exercise investigates properties of
the Beta distribution defined in
Equation (beta-equation).

1. By integrating over the range $[0,1]$, show that the normalization
constant for the distribution ${Beta}[a,b]$ is given by
$\alpha = \Gamma(a+b)/\Gamma(a)\Gamma(b)$ where $\Gamma(x)$ is the **Gamma function**,
defined by $\Gamma(x+1) = x\cdot\Gamma(x)$ and
$\Gamma(1) = 1$. (For integer $x$,
$\Gamma(x+1) = x!$.)

2. Show that the mean is $a/(a+b)$.

3. Find the mode(s) (the most likely value(s) of $\theta$).

4. Describe the distribution ${Beta}[\epsilon,\epsilon]$ for very
small $\epsilon$. What happens as such a distribution is updated?
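A numerical sanity check for parts 1 and 2 (a sketch using a crude midpoint rule; the parameters $a=2$, $b=3$ are an arbitrary test case):

```python
from math import gamma

def beta_pdf(theta, a, b):
    """Beta[a, b] density with the normalization constant from part 1."""
    alpha = gamma(a + b) / (gamma(a) * gamma(b))
    return alpha * theta ** (a - 1) * (1 - theta) ** (b - 1)

# Midpoint-rule check that the density integrates to 1 and has mean a/(a+b).
a, b, n = 2.0, 3.0, 100000
thetas = [(i + 0.5) / n for i in range(n)]
total = sum(beta_pdf(t, a, b) for t in thetas) / n
mean = sum(t * beta_pdf(t, a, b) for t in thetas) / n
print(round(total, 4), round(mean, 4))  # approximately 1.0 and 0.4
```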

Consider an arbitrary Bayesian network, a complete data set for that network, and the likelihood for the data set according to the network. Give a simple proof that the likelihood of the data cannot decrease if we add a new link to the network and recompute the maximum-likelihood parameter values.

Consider a single Boolean random variable $Y$ (the “classification”).
Let the prior probability $P(Y=true)$ be $\pi$. Let’s try to
find $\pi$, given a training set $D=(y_1,\ldots,y_N)$ with $N$
independent samples of $Y$. Furthermore, suppose $p$ of the $N$ are
positive and $n$ of the $N$ are negative.

1. Write down an expression for the likelihood of $D$ (i.e., the
probability of seeing this particular sequence of examples, given a
fixed value of $\pi$) in terms of $\pi$, $p$, and $n$.

2. By differentiating the log likelihood $L$, find the value of $\pi$
that maximizes the likelihood.

3. Now suppose we add in $k$ Boolean random variables
$X_1, X_2,\ldots,X_k$ (the “attributes”) that describe each sample,
and suppose we assume that the attributes are conditionally
independent of each other given the goal $Y$. Draw the Bayes net
corresponding to this assumption.

4. Write down the likelihood for the data including the attributes,
using the following additional notation:

- $\alpha_i$ is $P(X_i = true \mid Y = true)$.

- $\beta_i$ is $P(X_i = true \mid Y = false)$.

- $p_i^+$ is the count of samples for which $X_i=true$
and $Y=true$.

- $n_i^+$ is the count of samples for which $X_i=false$
and $Y=true$.

- $p_i^-$ is the count of samples for which $X_i=true$
and $Y=false$.

- $n_i^-$ is the count of samples for which $X_i=false$
and $Y=false$.

\[*Hint*: consider first the probability of seeing a
single example with specified values for $X_1, X_2,\ldots,X_k$ and
$Y$.\]

5. By differentiating the log likelihood $L$, find the values of
$\alpha_i$ and $\beta_i$ (in terms of the various counts) that
maximize the likelihood and say in words what these
values represent.

6. Let $k = 2$, and consider a data set containing all four possible
examples of the XOR function. Compute the maximum
likelihood estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$,
and $\beta_2$.

7. Given these estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$,
and $\beta_2$, what are the posterior probabilities
$P(Y=true | x_1,x_2)$ for each example?
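One way to check your hand calculations for parts 6 and 7 is the short sketch below, with the four XOR examples hardcoded (the counting follows the maximum-likelihood estimates you derive in part 5):

```python
# The four examples of y = xor(x1, x2), as (x1, x2, y).
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

pos = [d for d in data if d[2] == 1]
neg = [d for d in data if d[2] == 0]
pi = len(pos) / len(data)

alpha = [sum(d[i] for d in pos) / len(pos) for i in range(2)]  # P(X_i=1 | Y=1)
beta = [sum(d[i] for d in neg) / len(neg) for i in range(2)]   # P(X_i=1 | Y=0)

def post_true(x1, x2):
    """Naive Bayes posterior P(Y=true | x1, x2)."""
    def lik(theta, x):
        return theta if x else 1 - theta
    p1 = pi * lik(alpha[0], x1) * lik(alpha[1], x2)
    p0 = (1 - pi) * lik(beta[0], x1) * lik(beta[1], x2)
    return p1 / (p1 + p0)

print(pi, alpha, beta)
print([post_true(x1, x2) for x1, x2, _ in data])
```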

Consider the application of EM to learn the parameters for the network
in Figure mixture-networks-figure(a), given the true
parameters in Equation (candy-true-equation).
1. Explain why the EM algorithm would not work if there were just two
attributes in the model rather than three.
2. Show the calculations for the first iteration of EM starting from
Equation (candy-64-equation).
3. What happens if we start with all the parameters set to the same
value $p$? (*Hint*: you may find it helpful to
investigate this empirically before deriving the general result.)
4. Write out an expression for the log likelihood of the tabulated
candy data on page candy-counts-page in terms of the parameters,
calculate the partial derivatives with respect to each parameter,
and investigate the nature of the fixed point reached in part (c).

Implement a passive learning agent in a simple environment, such as the $4\times 3$ world. For the case of an initially unknown environment model, compare the learning performance of the direct utility estimation, TD, and ADP algorithms. Do the comparison for the optimal policy and for several random policies. For which do the utility estimates converge faster? What happens when the size of the environment is increased? (Try environments with and without obstacles.)
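As a starting point, here is the core TD update on a toy three-state chain (a hypothetical environment of my own, not the $4\times 3$ world, with reward received on entering the terminal state):

```python
# Passive TD(0) on the deterministic chain s0 -> s1 -> s2 (terminal, reward +1).
# With gamma = 1, both nonterminal utilities should converge to 1.

ALPHA, GAMMA = 0.1, 1.0
U = {0: 0.0, 1: 0.0, 2: 0.0}   # the terminal state's utility stays 0

def reward(s):
    return 1.0 if s == 2 else 0.0

for _ in range(500):           # 500 trials of the (only) fixed policy
    s = 0
    while s != 2:
        s_next = s + 1
        U[s] += ALPHA * (reward(s_next) + GAMMA * U[s_next] - U[s])
        s = s_next

print(round(U[0], 3), round(U[1], 3))  # both converge to 1.0
```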

Chapter complex-decisions-chapter defined a
**proper policy** for an MDP as one that is
guaranteed to reach a terminal state. Show that it is possible for a
passive ADP agent to learn a transition model for which its policy $\pi$
is improper even if $\pi$ is proper for the true MDP; with such models,
the POLICY-EVALUATION step may fail if $\gamma1$. Show that this problem cannot
arise if POLICY-EVALUATION is applied to the learned model only at the end of a trial.

Starting with the passive ADP agent,
modify it to use an approximate ADP algorithm as discussed in the text.
Do this in two steps:

1. Implement a priority queue for adjustments to the utility estimates.
Whenever a state is adjusted, all of its predecessors also become
candidates for adjustment and should be added to the queue. The
queue is initialized with the state from which the most recent
transition took place. Allow only a fixed number of adjustments.

2. Experiment with various heuristics for ordering the priority queue,
examining their effect on learning rates and computation time.

The direct utility estimation method in Section passive-rl-section uses distinguished terminal states to indicate the end of a trial. How could it be modified for environments with discounted rewards and no terminal states?

Write out the parameter update equations for TD learning with $$\hat{U}(x,y) = \theta_0 + \theta_1 x + \theta_2 y + \theta_3\,\sqrt{(x-x_g)^2 + (y-y_g)^2}\ .$$
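Recall that for any parameterized $\hat{U}_\theta$, the TD update for each parameter takes the generic form

```latex
\theta_i \leftarrow \theta_i +
  \alpha \left[ R(s) + \gamma\, \hat{U}_\theta(s') - \hat{U}_\theta(s) \right]
  \frac{\partial \hat{U}_\theta(s)}{\partial \theta_i}
```

so the exercise reduces to computing the partial derivative of $\hat{U}$ with respect to each of $\theta_0,\ldots,\theta_3$.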

Adapt the vacuum world (Chapter agents-chapter) for reinforcement learning by including rewards for squares being clean. Make the world observable by providing suitable percepts. Now experiment with different reinforcement learning agents. Is function approximation necessary for success? What sort of approximator works for this application?

Implement an exploring reinforcement learning
agent that uses direct utility estimation. Make two versions—one with a
tabular representation and one using the function approximator in
Equation (4x3-linear-approx-equation). Compare their
performance in three environments:

1. The $4\times 3$ world described in the chapter.

2. A ${10}\times {10}$ world with no obstacles and a +1 reward
at (10,10).

3. A ${10}\times {10}$ world with no obstacles and a +1 reward
at (5,5).

Devise suitable features for reinforcement learning in stochastic grid worlds (generalizations of the $4\times 3$ world) that contain multiple obstacles and multiple terminal states with rewards of $+1$ or $-1$.

Extend the standard game-playing environment (Chapter game-playing-chapter) to incorporate a reward signal. Put two reinforcement learning agents into the environment (they may, of course, share the agent program) and have them play against each other. Apply the generalized TD update rule (Equation (generalized-td-equation)) to update the evaluation function. You might wish to start with a simple linear weighted evaluation function and a simple game, such as tic-tac-toe.

Compute the true utility function and the best linear
approximation in $x$ and $y$ (as in
Equation (4x3-linear-approx-equation)) for the
following environments:

1. A ${10}\times {10}$ world with a single $+1$ terminal state
at (10,10).

2. As in (a), but add a $-1$ terminal state at (10,1).

3. As in (b), but add obstacles in 10 randomly selected squares.

4. As in (b), but place a wall stretching from (5,2) to (5,9).

5. As in (a), but with the terminal state at (5,5).

The actions are deterministic moves in the four directions. In each
case, compare the results using three-dimensional plots. For each
environment, propose additional features (besides $x$ and $y$) that
would improve the approximation and show the results.

Implement the REINFORCE and PEGASUS algorithms and apply them to the $4\times 3$ world, using a policy family of your own choosing. Comment on the results.

Investigate the application of reinforcement learning ideas to the modeling of human and animal behavior.

Is reinforcement learning an appropriate abstract model for evolution? What connection exists, if any, between hardwired reward signals and evolutionary fitness?

This exercise explores the quality of the $n$-gram model of language. Find or create a monolingual corpus of 100,000 words or more. Segment it into words, and compute the frequency of each word. How many distinct words are there? Also count frequencies of bigrams (two consecutive words) and trigrams (three consecutive words). Now use those frequencies to generate language: from the unigram, bigram, and trigram models, in turn, generate a 100-word text by making random choices according to the frequency counts. Compare the three generated texts with actual language. Finally, calculate the perplexity of each model.
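A sketch of the counting-and-generation step (the corpus, tokenization, and seed here are placeholder choices):

```python
import random
from collections import Counter, defaultdict

def ngram_counts(words, n):
    """Frequency of each n-word sequence in the corpus."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def generate_bigram(words, length, seed=0):
    """Generate text by sampling each word in proportion to bigram counts."""
    rng = random.Random(seed)
    successors = defaultdict(Counter)
    for (w1, w2), c in ngram_counts(words, 2).items():
        successors[w1][w2] += c
    out = [rng.choice(words)]
    for _ in range(length - 1):
        nxt = successors.get(out[-1])
        if not nxt:                       # dead end: restart anywhere
            out.append(rng.choice(words))
        else:
            choices, weights = zip(*nxt.items())
            out.append(rng.choices(choices, weights=weights)[0])
    return out

corpus = "the cat sat on the mat and the cat slept".split()
print(ngram_counts(corpus, 1)[('the',)])        # 3
print(ngram_counts(corpus, 2)[('the', 'cat')])  # 2
print(generate_bigram(corpus, 10))
```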

Write a program to do **segmentation** of
words without spaces. Given a string, such as the URL
“thelongestlistofthelongeststuffatthelongestdomainnameatlonglast.com,”
return a list of component words: [“the,” “longest,” “list,”
$\ldots$]. This task is useful for parsing URLs, for spelling
correction when words runtogether, and for languages such as Chinese
that do not have spaces between words. It can be solved with a unigram
or bigram word model and a dynamic programming algorithm similar to the
Viterbi algorithm.
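A sketch of the dynamic-programming idea with a toy unigram model (the word probabilities below are made-up placeholders; a real model would be estimated from a corpus):

```python
from math import log

# Toy unigram model; real frequencies would come from a corpus.
P = {'the': 0.06, 'longest': 0.001, 'list': 0.005, 'of': 0.03}

def segment(text):
    """Best segmentation, maximizing the product of unigram probabilities."""
    n = len(text)
    best = [0.0] + [float('-inf')] * n   # best[i]: log-prob of text[:i]
    back = [0] * (n + 1)                 # back[i]: start of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - 20), i):   # cap word length at 20
            w = text[j:i]
            if w in P and best[j] + log(P[w]) > best[i]:
                best[i] = best[j] + log(P[w])
                back[i] = j
    words, i = [], n
    while i > 0:                         # recover the segmentation
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

print(segment('thelongestlistofthelist'))
```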

*Zipf’s law* of word distribution states the following:
Take a large corpus of text, count the frequency of every word in the
corpus, and then rank these frequencies in decreasing order. Let $f_{I}$
be the $I$th largest frequency in this list; that is, $f_{1}$ is the
frequency of the most common word (usually “the”), $f_{2}$ is the
frequency of the second most common word, and so on. Zipf’s law states
that $f_{I}$ is approximately equal to $\alpha / I$ for some constant
$\alpha$. The law tends to be highly accurate except for very small and
very large values of $I$.

Choose a corpus of at least 20,000 words of online text, and verify Zipf’s law experimentally. Define an error measure and find the value of $\alpha$ where Zipf’s law best matches your experimental data. Create a log–log graph plotting $f_{I}$ vs. $I$ and $\alpha/I$ vs. $I$. (On a log–log graph, the function $\alpha/I$ is a straight line.) In carrying out the experiment, be sure to eliminate any formatting tokens (e.g., HTML tags) and normalize upper and lower case.

(Adapted from Jurafsky+Martin:2000.) In this exercise you will develop a classifier for
authorship: given a text, the classifier predicts which of two candidate
authors wrote the text. Obtain samples of text from two different
authors. Separate them into training and test sets. Now train a language
model on the training set. You can choose what features to use;
$n$-grams of words or letters are the easiest, but you can add
additional features that you think may help. Then compute the
probability of the text under each language model and choose the most
probable model. Assess the accuracy of this technique. How does accuracy
change as you alter the set of features? This subfield of linguistics is
called **stylometry**; its successes include the identification of the author of the
disputed *Federalist Papers* Mosteller+Wallace:1964 and
some disputed works of Shakespeare Hope:1994. Khmelev+Tweedie:2001 produce good results with
a simple letter bigram model.

This exercise concerns the classification of spam email. Create a corpus of spam email and one of non-spam mail. Examine each corpus and decide what features appear to be useful for classification: unigram words? bigrams? message length, sender, time of arrival? Then train a classification algorithm (decision tree, naive Bayes, SVM, logistic regression, or some other algorithm of your choosing) on a training set and report its accuracy on a test set.

Create a test set of ten queries, and pose them to three major Web search engines. Evaluate each one for precision at 1, 3, and 10 documents. Can you explain the differences between engines?

Try to ascertain which of the search engines from the previous exercise are using case folding, stemming, synonyms, and spelling correction.

Estimate how much storage space is necessary for the index to a 100 billion-page corpus of Web pages. Show the assumptions you made.
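One back-of-envelope sketch, in which every figure except the corpus size is an assumption you should replace with your own:

```python
pages = 100e9          # corpus size given in the exercise
words_per_page = 1000  # assumed average page length
bytes_per_posting = 8  # assumed: document id plus position, uncompressed

postings = pages * words_per_page
index_bytes = postings * bytes_per_posting
print(index_bytes / 1e12, 'TB')  # 800.0 TB under these assumptions
```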

Write a regular expression or a short program to extract company names. Test it on a corpus of business news articles. Report your recall and precision.

Consider the problem of trying to evaluate the quality of an IR system
that returns a ranked list of answers (like most Web search engines).
The appropriate measure of quality depends on the presumed model of what
the searcher is trying to achieve, and what strategy she employs. For
each of the following models, propose a corresponding numeric measure.

1. The searcher will look at the first twenty answers returned, with
the objective of getting as much relevant information as possible.

2. The searcher needs only one relevant document, and will go down the
list until she finds the first one.

3. The searcher has a fairly narrow query and is able to examine all
the answers retrieved. She wants to be sure that she has seen
everything in the document collection that is relevant to her query.
(E.g., a lawyer wants to be sure that she has found
*all* relevant precedents, and is willing to spend
considerable resources on that.)

4. The searcher needs just one document relevant to the query, and can
afford to pay a research assistant for an hour’s work looking
through the results. The assistant can look through 100 retrieved
documents in an hour. The assistant will charge the searcher for the
full hour regardless of whether he finds it immediately or at the
end of the hour.

5. The searcher will look through all the answers. Examining a document
has cost \$ A; finding a relevant document has value \$ B; failing
to find a relevant document has cost \$ C for each relevant
document not found.

6. The searcher wants to collect as many relevant documents as
possible, but needs steady encouragement. She looks through the
documents in order. If the documents she has looked at so far are
mostly good, she will continue; otherwise, she will stop.

Read the following text once for
understanding, and remember as much of it as you can. There will be a
test later.

> The procedure is actually quite simple. First you arrange things into
different groups. Of course, one pile may be sufficient depending on how
much there is to do. If you have to go somewhere else due to lack of
facilities that is the next step, otherwise you are pretty well set. It
is important not to overdo things. That is, it is better to do too few
things at once than too many. In the short run this may not seem
important but complications can easily arise. A mistake is expensive as
well. At first the whole procedure will seem complicated. Soon, however,
it will become just another facet of life. It is difficult to foresee
any end to the necessity for this task in the immediate future, but then
one can never tell. After the procedure is completed one arranges the
material into different groups again. Then they can be put into their
appropriate places. Eventually they will be used once more and the whole
cycle will have to be repeated. However, this is part of life.

An *HMM grammar* is essentially a standard HMM whose state
variable is $N$ (nonterminal, with values such as $Det$, $Adjective$,
$Noun$ and so on) and whose evidence variable is $W$ (word, with values
such as $is$, $duck$, and so on). The HMM model includes a prior
${\textbf{P}}(N_0)$, a transition model
${\textbf{P}}(N_{t+1}|N_t)$, and a sensor model
${\textbf{P}}(W_t|N_t)$. Show that every HMM grammar can be
written as a PCFG. [Hint: start by thinking about how the HMM prior can
be represented by PCFG rules for the sentence symbol. You may find it
helpful to illustrate for the particular HMM with values $A$, $B$ for
$N$ and values $x$, $y$ for $W$.]

Consider the following PCFG for simple verb phrases:

> 0.1: VP $\rightarrow$ Verb

> 0.2: VP $\rightarrow$ Copula Adjective

> 0.5: VP $\rightarrow$ Verb the Noun

> 0.2: VP $\rightarrow$ VP Adverb

> 0.5: Verb $\rightarrow$ is

> 0.5: Verb $\rightarrow$ shoots

> 0.8: Copula $\rightarrow$ is

> 0.2: Copula $\rightarrow$ seems

> 0.5: Adjective $\rightarrow$ **unwell**

> 0.5: Adjective $\rightarrow$ **well**

> 0.5: Adverb $\rightarrow$ **well**

> 0.5: Adverb $\rightarrow$ **badly**

> 0.6: Noun $\rightarrow$ **duck**

> 0.4: Noun $\rightarrow$ **well**

1. Which of the following have a nonzero probability as a VP?
(i) *shoots the duck well well well* (ii) *seems the well well*
(iii) *shoots the unwell well badly*

2. What is the probability of generating “is well well”?

3. What types of ambiguity are exhibited by the phrase in (b)?

4. Given any PCFG, is it possible to calculate the probability that the
PCFG generates a string of exactly 10 words?

Consider the following simple PCFG for noun phrases:

> 0.6: NP $\rightarrow$ Det\ AdjString\ Noun

> 0.4: NP $\rightarrow$ Det\ NounNounCompound

> 0.5: AdjString $\rightarrow$ Adj\ AdjString

> 0.5: AdjString $\rightarrow$ $\Lambda$

> 1.0: NounNounCompound $\rightarrow$ Noun\ Noun

> 0.8: Det $\rightarrow$ **the**

> 0.2: Det $\rightarrow$ **a**

> 0.5: Adj $\rightarrow$ **small**

> 0.5: Adj $\rightarrow$ **green**

> 0.6: Noun $\rightarrow$ **village**

> 0.4: Noun $\rightarrow$ **green**

where $\Lambda$ denotes the empty string.

1. What is the longest NP that can be generated by this grammar?
(i) three words (ii) four words (iii) infinitely many words

2. Which of the following have a nonzero probability of being generated
as complete NPs? (i) *a small green village* (ii) *a green green green*
(iii) *a small village green*

3. What is the probability of generating “the green green”?

4. What types of ambiguity are exhibited by the phrase in (c)?

5. Given any PCFG and any finite word sequence, is it possible to
calculate the probability that the sequence was generated by the
PCFG?

Outline the major differences between Java (or any other computer language with which you are familiar) and English, commenting on the “understanding” problem in each case. Think about such things as grammar, syntax, semantics, pragmatics, compositionality, context-dependence, lexical ambiguity, syntactic ambiguity, reference finding (including pronouns), background knowledge, and what it means to “understand” in the first place.

This exercise concerns grammars for very simple languages.

1. Write a context-free grammar for the language $a^n b^n$.

2. Write a context-free grammar for the palindrome language: the set of
all strings whose second half is the reverse of the first half.

3. Write a context-sensitive grammar for the duplicate language: the
set of all strings whose second half is the same as the first half.
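For part 1, the classic grammar is $S \rightarrow a\,S\,b \mid \epsilon$; a recognizer sketch that mirrors the two rules directly:

```python
def anbn(s):
    """Recognize the language a^n b^n, mirroring S -> a S b | epsilon."""
    if s == '':
        return True                        # S -> epsilon
    return (len(s) >= 2 and s[0] == 'a' and s[-1] == 'b'
            and anbn(s[1:-1]))             # S -> a S b

print([anbn(w) for w in ['', 'ab', 'aabb', 'aab', 'ba', 'abab']])
```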

Consider the sentence “Someone walked slowly to the supermarket” and a
lexicon consisting of the following words:

$Pronoun \rightarrow \textbf{someone} \quad Verb \rightarrow \textbf{walked}$

$Adv \rightarrow \textbf{slowly} \quad Prep \rightarrow \textbf{to}$

$Article \rightarrow \textbf{the} \quad Noun \rightarrow \textbf{supermarket}$

Which of the following three grammars, combined with the lexicon,
generates the given sentence? Show the corresponding parse tree(s).

$$
\begin{array}{lll}
(A): & (B): & (C):\\
S \rightarrow NP\ VP & S\rightarrow NP\ VP & S\rightarrow NP\ VP\\
NP\rightarrow Pronoun & NP\rightarrow Pronoun & NP\rightarrow Pronoun\\
NP\rightarrow Article\ Noun & NP\rightarrow Noun & NP\rightarrow Article\ NP\\
VP\rightarrow VP\ PP & NP\rightarrow Article\ NP & VP\rightarrow Verb\ Adv\\
VP\rightarrow VP\ Adv\ Adv & VP\rightarrow Verb\ Vmod & Adv\rightarrow Adv\ Adv\\
VP\rightarrow Verb & Vmod\rightarrow Adv\ Vmod & Adv\rightarrow PP\\
PP\rightarrow Prep\ NP & Vmod\rightarrow Adv & PP\rightarrow Prep\ NP\\
NP\rightarrow Noun & Adv\rightarrow PP & NP\rightarrow Noun\\
 & PP\rightarrow Prep\ NP &
\end{array}
$$
For each of the preceding three grammars, write down three sentences of
English and three sentences of non-English generated by the grammar.
Each sentence should be significantly different, should be at least six
words long, and should include some new lexical entries (which you
should define). Suggest ways to improve each grammar to avoid generating
the non-English sentences.

Collect some examples of time expressions, such as “two o’clock,” “midnight,” and “12:46.” Also think up some examples that are ungrammatical, such as “thirteen o’clock” or “half past two fifteen.” Write a grammar for the time language.

Some linguists have argued as follows:

Children learning a language hear only *positive
examples* of the language and no *negative
examples*. Therefore, the hypothesis that “every possible
sentence is in the language” is consistent with all the observed
examples. Moreover, this is the simplest consistent hypothesis.
Furthermore, all grammars for languages that are supersets of the true
language are also consistent with the observed data. Yet children do
induce (more or less) the right grammar. It follows that they begin
with very strong innate grammatical constraints that rule out all of
these more general hypotheses *a priori*.

Comment on the weak point(s) in this argument from a statistical
learning viewpoint.

In this exercise you will transform $\large \varepsilon_0$ into
Chomsky Normal Form (CNF). There are five steps: (a) add a new start
symbol, (b) eliminate $\epsilon$ rules, (c) eliminate multiple words on
right-hand sides, (d) eliminate rules of the form
(${\it X} \rightarrow {\it Y}$),
(e) convert long right-hand sides into binary rules.

1. The start symbol, $S$, can occur only on the left-hand side in CNF.
Replace ${\it S}$ everywhere by a new symbol ${\it S'}$ and add a rule
of the form (${\it S} \rightarrow {\it S'}$).

2. The empty string, $\epsilon$, cannot appear on the right-hand side
in CNF. $\large \varepsilon_0$ does not have any rules with $\epsilon$, so this is not
an issue.

3. A word can appear on the right-hand side in a rule only of the form
(${\it X} \rightarrow$ *word*).
Replace each rule of the form
(${\it X} \rightarrow \ldots$ *word* $\ldots$)
with (${\it X} \rightarrow \ldots {\it W'} \ldots$)
and (${\it W'} \rightarrow$ *word*),
using a new symbol ${\it W'}$.

4. A rule (${\it X} \rightarrow {\it Y}$)
is not allowed in CNF; it must be
(${\it X} \rightarrow {\it Y}\ {\it Z}$) or
(${\it X} \rightarrow$ *word*).
Replace each rule of the form
(${\it X} \rightarrow {\it Y}$)
with a set of rules of the form
(${\it X} \rightarrow \ldots$), one
for each rule (${\it Y} \rightarrow \ldots$),
where ($\ldots$) indicates one or more symbols.

5. Replace each rule of the form
(${\it X} \rightarrow {\it Y}\ {\it Z} \ldots$) with two rules,
(${\it X} \rightarrow {\it Y}\ {\it Z'}$) and
(${\it Z'} \rightarrow {\it Z} \ldots$),
where ${\it Z'}$ is a new symbol.

Show each step of the process and the final set of rules.
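Step 5 (binarization) is mechanical enough to sketch in code; the rule format and the naming of new symbols below are my own conventions, not part of the exercise:

```python
def binarize(rules):
    """Split rules with right-hand sides longer than 2 into binary rules.

    Rules are (lhs, rhs) pairs, with rhs a list of symbols.
    """
    out, fresh = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new_sym = '%s_%d' % (lhs, fresh)   # invented fresh-symbol scheme
            out.append((lhs, [rhs[0], new_sym]))
            lhs, rhs = new_sym, rhs[1:]
        out.append((lhs, rhs))
    return out

# e.g. VP -> Verb NP PP becomes VP -> Verb VP_1 and VP_1 -> NP PP
print(binarize([('VP', ['Verb', 'NP', 'PP'])]))
```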

Consider the following toy grammar:

> $S \rightarrow NP\space VP$

> $NP \rightarrow Noun$

> $NP \rightarrow NP\space and\space NP$

> $NP \rightarrow NP\space PP$

> $VP \rightarrow Verb$

> $VP \rightarrow VP\space and \space VP$

> $VP \rightarrow VP\space PP$

> $PP \rightarrow Prep\space NP$

> $Noun \rightarrow Sally\space; pools\space; streams\space; swims$

> $Prep \rightarrow in$

> $Verb \rightarrow pools\space; streams\space; swims$

1. Show all the parse trees in this grammar for the sentence “Sally
swims in streams and pools.”

2. Show all the table entries that would be made by
a (non-probabilistic) CYK parser on this sentence.

Using DCG notation, write a grammar for a language that is just like $\large \varepsilon_1$, except that it enforces agreement between the subject and verb of a sentence and thus does not generate ungrammatical sentences such as “I smells the wumpus.”

Consider the following PCFG:

> $S \rightarrow NP \space VP[1.0] $

> $NP \rightarrow \textit{Noun}[0.6] \space|\space \textit{Pronoun}[0.4] $

> $VP \rightarrow \textit{Verb} \space NP[0.8] \space|\space \textit{Modal}\space \textit{Verb}[0.2]$

> $\textit{Noun} \rightarrow \textbf{can}[0.1] \space|\space \textbf{fish}[0.3] \space|\space ...$

> $\textit{Pronoun} \rightarrow \textbf{I}[0.4] \space|\space ...$

> $\textit{Verb} \rightarrow \textbf{can}[0.01] \space|\space \textbf{fish}[0.1] \space|\space ...$

> $\textit{Modal} \rightarrow \textbf{can}[0.3] \space|\space ...$

The sentence “I can fish” has two parse trees with this grammar. Show
the two trees, their prior probabilities, and their conditional
probabilities, given the sentence.

An augmented context-free grammar can represent languages that a regular
context-free grammar cannot. Show an augmented context-free grammar for
the language $a^nb^nc^n$. The allowable values for augmentation
variables are 1 and $SUCCESSOR(n)$, where $n$ is a value. The rule for a sentence
in this language is

$$S(n) \rightarrow A(n) B(n) C(n) \ .$$
Show the rule(s) for each of ${\it A}$,
${\it B}$, and ${\it C}$.

Augment the $\large \varepsilon_1$ grammar so that it handles article–noun agreement. That is, make sure that “agents” and “an agent” are ${\it NP}$s, but “agent” and “an agents” are not.

Consider the following sentence (from *The New York Times,*
July 28, 2008):

> Banks struggling to recover from multibillion-dollar loans on real
> estate are curtailing loans to American businesses, depriving even
> healthy companies of money for expansion and hiring.
1. Which of the words in this sentence are lexically ambiguous?

2. Find two cases of syntactic ambiguity in this sentence (there are
more than two).

3. Give an instance of metaphor in this sentence.

4. Can you find semantic ambiguity?

Without looking back at
Exercise washing-clothes-exercise, answer the following
questions:

1. What are the four steps that are mentioned?

2. What step is left out?

3. What is “the material” that is mentioned in the text?

4. What kind of mistake would be expensive?

5. Is it better to do too few things or too many? Why?

Select five sentences and submit them to an online translation service. Translate them from English to another language and back to English. Rate the resulting sentences for grammaticality and preservation of meaning. Repeat the process; does the second round of iteration give worse results or the same results? Does the choice of intermediate language make a difference to the quality of the results? If you know a foreign language, look at the translation of one paragraph into that language. Count and describe the errors made, and conjecture why these errors were made.

The $D_i$ values for the sentence in Figure mt-alignment-figure sum to 0. Will that be true of every translation pair? Prove it or give a counterexample.

(Adapted from [Knight:1999].) Our translation model assumes that, after the phrase
translation model selects phrases and the distortion model permutes
them, the language model can unscramble the permutation. This exercise
investigates how sensible that assumption is. Try to unscramble these
proposed lists of phrases into the correct order:

1. have, programming, a, seen, never, I, language, better

2. loves, john, mary

3. is the, communication, exchange of, intentional, information
brought, by, about, the production, perception of, and signs, from,
drawn, a, of, system, signs, conventional, shared

4. created, that, we hold these, to be, all men, truths, are, equal,
self-evident

Which ones could you do? What type of knowledge did you draw upon? Train
a bigram model from a training corpus, and use it to find the
highest-probability permutation of some sentences from a test corpus.
Report on the accuracy of this model.
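A minimal sketch of the bigram approach the last part asks for, with add-one smoothing and whitespace tokenization; these modeling choices are illustrative assumptions, not prescribed by the exercise:

```python
import itertools
import math
from collections import Counter

def train_bigrams(corpus):
    """Count bigrams over whitespace-tokenized sentences, with sentence markers."""
    bigrams, unigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    return bigrams, unigrams

def log_prob(words, bigrams, unigrams, vocab_size, smooth=1.0):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    toks = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + smooth) /
                        (unigrams[a] + smooth * vocab_size))
               for a, b in zip(toks, toks[1:]))

def best_permutation(words, bigrams, unigrams, vocab_size):
    """Exhaustive search over permutations -- feasible only for short lists."""
    return max(itertools.permutations(words),
               key=lambda p: log_prob(p, bigrams, unigrams, vocab_size))
```

For longer phrase lists, exhaustive search is infeasible and a beam search over partial permutations would be the natural refinement.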

Calculate the most probable path through the HMM in Figure sr-hmm-figure for the output sequence $[C_1,C_2,C_3,C_4,C_4,C_6,C_7]$. Also give its probability.
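The most probable path can be computed with the standard Viterbi algorithm. A generic sketch in log space; the dictionary-based HMM encoding is an assumption of this sketch, not the book's notation, and all probabilities are assumed nonzero:

```python
import math

def viterbi(states, start_p, trans_p, emit_p, observations):
    """Most probable hidden-state path for an observation sequence (log space).

    start_p[s], trans_p[s][s2], and emit_p[s][o] are (nonzero) probabilities.
    Returns (log probability, path)."""
    V = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
          for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev, (lp, path) = max(
                ((p, V[-1][p]) for p in states),
                key=lambda kv: kv[1][0] + math.log(trans_p[kv[0]][s]))
            layer[s] = (lp + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs]),
                        path + [s])
        V.append(layer)
    return max(V[-1].values(), key=lambda v: v[0])
```

Exponentiating the returned log probability gives the path probability the exercise asks for.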

We forgot to mention that the text in Exercise washing-clothes-exercise is entitled “Washing Clothes.” Reread the text and answer the questions in Exercise washing-clothes2-exercise. Did you do better this time? Bransford and Johnson [Bransford+Johnson:1973] used this text in a controlled experiment and found that the title helped significantly. What does this tell you about how language and memory work?

In the shadow of a tree with a dense, leafy canopy, one sees a number of light spots. Surprisingly, they all appear to be circular. Why? After all, the gaps between the leaves through which the sun shines are not likely to be circular.

Consider a picture of a white sphere floating in front of a black backdrop. The image curve separating white pixels from black pixels is sometimes called the “outline” of the sphere. Show that the outline of a sphere, viewed in a perspective camera, can be an ellipse. Why do spheres not look like ellipses to you?

Consider an infinitely long cylinder of radius $r$ oriented with its axis along the $y$-axis. The cylinder has a Lambertian surface and is viewed by a camera along the positive $z$-axis. What would you expect to see in the image if the cylinder is illuminated by a point source at infinity located on the positive $x$-axis? Draw the contours of constant brightness in the projected image. Are the contours of equal brightness uniformly spaced?

Edges in an image can correspond to a variety of events in a scene. Consider Figure illuminationfigure (page illuminationfigure), and assume that it is a picture of a real three-dimensional scene. Identify ten different brightness edges in the image, and for each, state whether it corresponds to a discontinuity in (a) depth, (b) surface orientation, (c) reflectance, or (d) illumination.

A stereoscopic system is being contemplated for terrain mapping. It will
consist of two CCD cameras, each having ${512}\times {512}$ pixels on a
10 cm $\times$ 10 cm square sensor. The lenses to be used have a focal
length of 16 cm, with the focus fixed at infinity. For corresponding
points ($u_1,v_1$) in the left image and ($u_2,v_2$) in the right image,
$v_1=v_2$ because the $x$-axes in the two image planes are parallel to
the epipolar lines—the lines from the object to the camera. The optical
axes of the two cameras are parallel. The baseline between the cameras
is 1 meter.

1. If the nearest distance to be measured is 16 meters, what is the
largest disparity that will occur (in pixels)?

2. What is the distance resolution at 16 meters, due to the pixel
spacing?

3. What distance corresponds to a disparity of one pixel?
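To sanity-check hand calculations for this setup, the standard pinhole-stereo relation $d = fb/Z$ can be coded directly. All lengths are in meters; the pixel pitch comes from the stated 10 cm, 512-pixel sensor:

```python
# Parameters from the exercise (meters).
f = 0.16             # focal length (focus at infinity, so image distance = f)
b = 1.0              # baseline between the cameras
pixel = 0.10 / 512   # pixel pitch on the 10 cm / 512-pixel sensor

def disparity_pixels(Z):
    """Disparity d = f*b/Z for an object at depth Z, converted to pixels."""
    return (f * b / Z) / pixel

def depth_from_disparity(d_px):
    """Invert the relation: Z = f*b / (d_px * pixel)."""
    return f * b / (d_px * pixel)
```

The same two functions answer the disparity and depth-resolution questions once the geometry is set up; differencing `depth_from_disparity` at adjacent pixel disparities gives the resolution in part 2.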

Which of the following are true, and which are false?

1. Finding corresponding points in stereo images is the easiest phase
of the stereo depth-finding process.

2. Shape-from-texture can be done by projecting a grid of light-stripes
onto the scene.

3. Lines with equal lengths in the scene always project to equal
lengths in the image.

4. Straight lines in the image necessarily correspond to straight lines
in the scene.

Which of the following are true, and which are false?

1. Finding corresponding points in stereo images is the easiest phase
of the stereo depth-finding process.

2. In stereo views of the same scene, greater accuracy is obtained in
the depth calculations if the two camera positions are
farther apart.

3. Lines with equal lengths in the scene always project to equal
lengths in the image.

4. Straight lines in the image necessarily correspond to straight lines
in the scene.

(Courtesy of Pietro Perona.) Figure bottle-figure shows two cameras at X and Y observing a scene. Draw the image seen at each camera, assuming that all named points are in the same horizontal plane. What can be concluded from these two images about the relative distances of points A, B, C, D, and E from the camera baseline, and on what basis?

Monte Carlo localization is
*biased* for any finite sample size—i.e., the expected
value of the location computed by the algorithm differs from the true
expected value—because of the way particle filtering works. In this
question, you are asked to quantify this bias.

To simplify, consider a world with four possible robot locations:
$X=\{x_1,x_2,x_3,x_4\}$. Initially, we
draw $N\geq 1$ samples uniformly from among those locations. As
usual, it is perfectly acceptable if more than one sample is generated
for any of the locations $X$. Let $Z$ be a Boolean sensor variable
characterized by the following conditional probabilities:

$$\begin{aligned}
P(z | x_1) = 0.8 \qquad\qquad P(\lnot z | x_1) = 0.2 \\
P(z | x_2) = 0.4 \qquad\qquad P(\lnot z | x_2) = 0.6 \\
P(z | x_3) = 0.1 \qquad\qquad P(\lnot z | x_3) = 0.9 \\
P(z | x_4) = 0.1 \qquad\qquad P(\lnot z | x_4) = 0.9
\end{aligned}$$

MCL uses these probabilities to generate particle weights, which are
subsequently normalized and used in the resampling process. For
simplicity, let us assume we generate only one new sample in the
resampling process, regardless of $N$. This sample might correspond to
any of the four locations in $X$. Thus, the sampling process defines a
probability distribution over $X$.

1. What is the resulting probability distribution over $X$ for this new
sample? Answer this question separately for
$N=1,\ldots,10$, and for $N=\infty$.

2. The difference between two probability distributions $P$ and $Q$ can
be measured by the KL divergence, which is defined as
$${KL}(P,Q) = \sum_i P(x_i)\log\frac{P(x_i)}{Q(x_i)}\ .$$ What are
the KL divergences between the distributions in (a) and the true
posterior?

3. What modification of the problem formulation (not the algorithm!)
would guarantee that the specific estimator above is unbiased even
for finite values of $N$? Provide at least two such modifications
(each of which should be sufficient).
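The KL divergence in part 2 is a one-liner for discrete distributions. This sketch follows the usual convention that terms with $P(x_i)=0$ contribute zero:

```python
import math

def kl(P, Q):
    """KL(P, Q) = sum_i P(i) log(P(i)/Q(i)) for discrete distributions
    given as parallel sequences; terms with P(i) = 0 contribute 0."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)
```

Note that KL divergence is asymmetric, so the order of arguments (estimated distribution vs. true posterior) matters.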

Implement Monte Carlo localization for a simulated robot with range sensors. A grid map and range data are available from the code repository at aima.cs.berkeley.edu. You should demonstrate successful global localization of the robot.
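Independent of the repository's map and range-data formats, the core MCL update is move, weight, resample. A one-dimensional sketch; the Gaussian motion noise and low-variance (systematic) resampler are common choices, not requirements:

```python
import random

def mcl_step(particles, motion, z, likelihood, motion_noise=0.1):
    """One Monte Carlo localization update on 1-D poses.

    particles: list of scalar poses; motion: commanded displacement;
    z: sensor reading; likelihood(z, pose): sensor model P(z | pose)."""
    # Motion update with additive Gaussian noise.
    moved = [p + motion + random.gauss(0.0, motion_noise) for p in particles]
    # Importance weights from the sensor model, normalized.
    w = [likelihood(z, p) for p in moved]
    total = sum(w)
    w = [wi / total for wi in w]
    # Low-variance (systematic) resampling.
    n = len(moved)
    r = random.random() / n
    resampled, cum, i = [], w[0], 0
    for m in range(n):
        while i + 1 < n and r + m / n > cum:
            i += 1
            cum += w[i]
        resampled.append(moved[i])
    return resampled
```

For the actual exercise, the scalar pose becomes $(x, y, \theta)$ and the likelihood compares simulated range scans against the measured ones.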

Consider a robot with two simple manipulators, as shown in figure figRobot2. Manipulator A is a square block of side 2 which can slide back and forth along a rod that runs along the x-axis from x=$-$10 to x=10. Manipulator B is a square block of side 2 which can slide back and forth along a rod that runs along the y-axis from y=$-$10 to y=10. The rods lie outside the plane of manipulation, so they do not interfere with the movement of the blocks. A configuration is then a pair ${\langle}x,y{\rangle}$, where $x$ is the x-coordinate of the center of manipulator A and $y$ is the y-coordinate of the center of manipulator B. Draw the configuration space for this robot, indicating the permitted and excluded zones.

Suppose that you are working with the robot in
Exercise AB-manipulator-ex and you are given the
problem of finding a path from the starting configuration of
figure figRobot2 to the ending configuration. Consider a potential
function $$D(A, {Goal})^2 + D(B, {Goal})^2 + \frac{1}{D(A, B)^2}$$
where $D(A,B)$ is the distance between the closest points of A and B.

1. Show that hill climbing in this potential field will get stuck in a
local minimum.

2. Describe a potential field where hill climbing will solve this
particular problem. You need not work out the exact numerical
coefficients needed, just the general form of the solution. (Hint:
Add a term that “rewards” the hill climber for moving A out of B’s
way, even in a case like this where this does not reduce the
distance from A to B in the above sense.)

Consider the robot arm shown in Figure FigArm1. Assume that the robot’s base element is 60cm long and that its upper arm and forearm are each 40cm long. As argued on page inverse-kinematics-not-unique, the inverse kinematics of a robot is often not unique. State an explicit closed-form solution of the inverse kinematics for this arm. Under what exact conditions is the solution unique?

Consider the robot arm shown in Figure FigArm1. Assume that the robot’s base element is 70cm long and that its upper arm and forearm are each 50cm long. As argued on page inverse-kinematics-not-unique, the inverse kinematics of a robot is often not unique. State an explicit closed-form solution of the inverse kinematics for this arm. Under what exact conditions is the solution unique?
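For either version of this exercise, the closed form is the standard two-link planar inverse kinematics, taking coordinates relative to the shoulder joint (the base element only offsets the target vertically). A sketch with both elbow branches; the `elbow_up` flag and function names are our own:

```python
import math

def two_link_ik(x, y, l1, l2, elbow_up=True):
    """Closed-form inverse kinematics for a planar two-link arm.

    Returns joint angles (theta1, theta2) reaching target (x, y) relative
    to the shoulder, or None if the target is out of reach."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1:
        return None                      # target unreachable
    s2 = math.sqrt(1 - c2 * c2) * (1 if elbow_up else -1)
    theta2 = math.atan2(s2, c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2

def fk(theta1, theta2, l1, l2):
    """Forward kinematics, useful for checking an IK solution."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

Observe how the two elbow branches coincide only when $\sin\theta_2 = 0$, which is where the uniqueness question becomes interesting.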

Implement an algorithm for calculating the Voronoi diagram of an arbitrary 2D environment, described by an $n\times n$ Boolean array. Illustrate your algorithm by plotting the Voronoi diagram for 10 interesting maps. What is the complexity of your algorithm?
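One simple discrete approximation: label every free cell with its nearest obstacle by multi-source BFS, and take free cells adjacent to a differently labeled cell as the (approximate) Voronoi diagram. This 4-connected construction, with ties broken by BFS order, is a crude sketch rather than the exact geometric diagram:

```python
from collections import deque

def grid_voronoi(grid):
    """Approximate Voronoi diagram of a Boolean obstacle grid.

    grid[i][j] is True for obstacle cells. A multi-source BFS labels every
    free cell with its nearest obstacle; free cells with a differently
    labeled right or lower neighbor form the ridge. O(n^2) for n x n."""
    n, m = len(grid), len(grid[0])
    label = [[None] * m for _ in range(n)]
    q = deque()
    for i in range(n):
        for j in range(m):
            if grid[i][j]:
                label[i][j] = (i, j)
                q.append((i, j))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < m and label[a][b] is None:
                label[a][b] = label[i][j]
                q.append((a, b))
    return {(i, j)
            for i in range(n) for j in range(m) if not grid[i][j]
            for di, dj in ((1, 0), (0, 1))
            if 0 <= i + di < n and 0 <= j + dj < m
            and label[i + di][j + dj] != label[i][j]}
```

Since the BFS visits each cell once, the overall complexity is $O(n^2)$ for an $n\times n$ map; an exact diagram would instead require true Euclidean distances.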

This exercise explores the relationship between
workspace and configuration space using the examples shown in
Figure FigEx2.
1. Consider the robot configurations shown in
Figure FigEx2(a) through (c), ignoring the obstacle
shown in each of the diagrams. Draw the corresponding arm
configurations in configuration space. (*Hint:* Each
arm configuration maps to a single point in configuration space, as
illustrated in Figure FigArm1(b).)

2. Draw the configuration space for each of the workspace diagrams in
Figure FigEx2(a)–(c). (*Hint:* The
configuration spaces share with the one shown in
Figure FigEx2(a) the region that corresponds to
self-collision, but differences arise from the lack of enclosing
obstacles and the different locations of the obstacles in these
individual figures.)

3. For each of the black dots in Figure FigEx2(e)–(f),
draw the corresponding configurations of the robot arm in workspace.
Please ignore the shaded regions in this exercise.

4. The configuration spaces shown in
Figure FigEx2(e)–(f) have all been generated by a
single workspace obstacle (dark shading), plus the constraints
arising from the self-collision constraint (light shading). Draw,
for each diagram, the workspace obstacle that corresponds to the
darkly shaded area.

5. Figure FigEx2(d) illustrates that a single planar
obstacle can decompose the workspace into two disconnected regions.
What is the maximum number of disconnected regions that can be
created by inserting a planar obstacle into an obstacle-free,
connected workspace, for a 2DOF robot? Give an example, and argue
why no larger number of disconnected regions can be created. How
about a non-planar obstacle?

Consider a mobile robot moving on a horizontal surface. Suppose that the
robot can execute two kinds of motions:

- Rolling forward a specified distance.

- Rotating in place through a specified angle.

The state of such a robot can be characterized in terms of three
parameters ${\langle}x,y,\phi{\rangle}$: the x-coordinate and y-coordinate of the
robot (more precisely, of its center of rotation) and the robot’s
orientation, expressed as the angle from the positive x direction. The
action “$Roll(D)$” has the effect of changing state ${\langle}x,y,\phi{\rangle}$
to ${\langle}x+D \cos(\phi), y+D \sin(\phi), \phi {\rangle}$, and the
action $Rotate(\theta)$ has the effect of changing state
${\langle}x,y,\phi {\rangle}$ to ${\langle}x,y, \phi + \theta {\rangle}$.
1. Suppose that the robot is initially at ${\langle}0,0,0 {\rangle}$
and then executes the actions $Rotate(60^{\circ})$, $Roll(1)$,
$Rotate(25^{\circ})$, $Roll(2)$. What is the final state of the
robot?

2. Now suppose that the robot has imperfect control of its own
rotation, and that, if it attempts to rotate by $\theta$, it may
actually rotate by any angle between $\theta-10^{\circ}$ and
$\theta+10^{\circ}$. In that case, if the robot attempts to carry
out the sequence of actions in (a), there is a range of possible
ending states. What are the minimal and maximal values of the
x-coordinate, the y-coordinate and the orientation in the final
state?

3. Let us modify the model in (b) to a probabilistic model in which,
when the robot attempts to rotate by $\theta$, its actual angle of
rotation follows a Gaussian distribution with mean $\theta$ and
standard deviation $10^{\circ}$. Suppose that the robot executes the
actions $Rotate(90^{\circ})$, $Roll(1)$. Give a simple argument
that (a) the expected value of the location at the end is not equal
to the result of rotating exactly $90^{\circ}$ and then rolling
forward 1 unit, and (b) that the distribution of locations at the
end does not follow a Gaussian. (Do not attempt to calculate the
true mean or the true distribution.)

The point of this exercise is that rotational uncertainty quickly
gives rise to large positional uncertainty, and that dealing with
rotational uncertainty is painful whether it is treated in terms of
hard intervals or probabilistically, because the relation between
orientation and position is both non-linear and non-monotonic.
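The noise-free kinematics in part 1 is easy to check mechanically. A sketch applying the state-update rules above, using degrees for angles:

```python
import math

def simulate(pose, actions):
    """Apply a sequence of ("Rotate", degrees) / ("Roll", distance)
    actions to a pose (x, y, phi_degrees) using the rules above."""
    x, y, phi = pose
    for kind, arg in actions:
        if kind == "Rotate":
            phi += arg
        else:  # Roll
            x += arg * math.cos(math.radians(phi))
            y += arg * math.sin(math.radians(phi))
    return x, y, phi
```

Sampling the rotation angles from the stated Gaussian instead of applying them exactly turns this into a Monte Carlo check of the claims in part 3.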

Consider the simplified robot shown in
Figure FigEx3. Suppose the robot’s Cartesian
coordinates are known at all times, as are those of its goal location.
However, the locations of the obstacles are unknown. The robot can sense
obstacles in its immediate proximity, as illustrated in this figure. For
simplicity, let us assume the robot’s motion is noise-free, and the
state space is discrete. Figure FigEx3 is only one
example; in this exercise you are required to address all possible grid
worlds with a valid path from the start to the goal location.

1. Design a deliberate controller that guarantees that the robot always
reaches its goal location if at all possible. The deliberate
controller can memorize measurements in the form of a map that is
being acquired as the robot moves. Between individual moves, it may
spend arbitrary time deliberating.

2. Now design a *reactive* controller for the same task.
This controller may not memorize past sensor measurements. (It may
not build a map!) Instead, it has to make all decisions based on the
current measurement, which includes knowledge of its own location
and that of the goal. The time to make a decision must be
independent of the environment size or the number of past
time steps. What is the maximum number of steps that it may take for
your robot to arrive at the goal?

3. How will your controllers from (a) and (b) perform if any of the
following six conditions apply: continuous state space, noise in
perception, noise in motion, noise in both perception and motion,
unknown location of the goal (the goal can be detected only when
within sensor range), or moving obstacles. For each condition and
each controller, give an example of a situation where the robot
fails (or explain why it cannot fail).

In Figure Fig5(b) on page Fig5, we encountered an augmented finite state machine for the control of a single leg of a hexapod robot. In this exercise, the aim is to design an AFSM that, when combined with six copies of the individual leg controllers, results in efficient, stable locomotion. For this purpose, you have to augment the individual leg controller to pass messages to your new AFSM and to wait until other messages arrive. Argue why your controller is efficient, in that it does not unnecessarily waste energy (e.g., by sliding legs), and in that it propels the robot at reasonably high speeds. Prove that your controller satisfies the dynamic stability condition given on page polygon-stability-condition-page.

(This exercise was first devised by Michael
Genesereth and Nils Nilsson. It works for first graders through graduate
students.) Humans are so adept at basic household tasks that they often
forget how complex these tasks are. In this exercise you will discover
the complexity and recapitulate the last 30 years of developments in
robotics. Consider the task of building an arch out of three blocks.
Simulate a robot with four humans as follows:

**Brain.** The Brain directs the Hands in the execution of a
plan to achieve the goal. The Brain receives input from the Eyes, but
*cannot see the scene directly*. The Brain is the only one
who knows what the goal is.

**Eyes.** The Eyes report a brief description of the scene
to the Brain: “There is a red box standing on top of a green box, which
is on its side.” The Eyes can also answer questions from the Brain such as,
“Is there a gap between the Left Hand and the red box?” If you have a
video camera, point it at the scene and allow the eyes to look at the
viewfinder of the video camera, but not directly at the scene.

**Left hand** and **right hand.** One person
plays each Hand. The two Hands stand next to each other, each wearing an
oven mitt on one hand. The Hands execute only simple commands from the
Brain—for example, “Left Hand, move two inches forward.” They cannot
execute commands other than motions; for example, they cannot be
commanded to “Pick up the box.” The Hands must be
*blindfolded*. The only sensory capability they have is the
ability to tell when their path is blocked by an immovable obstacle such
as a table or the other Hand. In such cases, they can beep to inform the
Brain of the difficulty.

Go through Turing’s list of alleged “disabilities” of machines, identifying which have been achieved, which are achievable in principle by a program, and which are still problematic because they require conscious mental states.

Find and analyze an account in the popular media of one or more of the arguments to the effect that AI is impossible.

Attempt to write definitions of the terms “intelligence,” “thinking,” and “consciousness.” Suggest some possible objections to your definitions.

Does a refutation of the Chinese room argument necessarily prove that appropriately programmed computers have mental states? Does an acceptance of the argument necessarily mean that computers cannot have mental states?

In the brain replacement argument, it is important to be able to restore the subject’s brain to normal, such that its external behavior is as it would have been if the operation had not taken place. Can the skeptic reasonably object that this would require updating those neurophysiological properties of the neurons relating to conscious experience, as distinct from those involved in the functional behavior of the neurons?

Suppose that a Prolog program containing many clauses about the rules of British citizenship is compiled and run on an ordinary computer. Analyze the “brain states” of the computer under wide and narrow content.

Alan Perlis [Perlis:1982] wrote, “A year spent in artificial intelligence is enough to make one believe in God”. He also wrote, in a letter to Philip Davis, that one of the central dreams of computer science is that “through the performance of computers and their programs we will remove all doubt that there is only a chemical distinction between the living and nonliving world.” To what extent does the progress made so far in artificial intelligence shed light on these issues? Suppose that at some future date, the AI endeavor has been completely successful; that is, we have built intelligent agents capable of carrying out any human cognitive task at human levels of ability. To what extent would that shed light on these issues?

Compare the social impact of artificial intelligence in the last fifty years with the social impact of the introduction of electric appliances and the internal combustion engine in the fifty years between 1890 and 1940.

I. J. Good claims that intelligence is the most important quality, and that building ultraintelligent machines will change everything. A sentient cheetah counters that “Actually speed is more important; if we could build ultrafast machines, that would change everything,” and a sentient elephant claims “You’re both wrong; what we need is ultrastrong machines.” What do you think of these arguments?

Analyze the potential threats from AI technology to society. What threats are most serious, and how might they be combated? How do they compare to the potential benefits?

How do the potential threats from AI technology compare with those from other computer science technologies, and with those from bio-, nano-, and nuclear technologies?

Some critics object that AI is impossible, while others object that it
is *too* possible and that ultraintelligent machines pose a
threat. Which of these objections do you think is more likely? Would it
be a contradiction for someone to hold both positions?