5. Adversarial Search
Suppose you have an oracle, $OM(s)$, that correctly predicts the opponent’s move in any state. Using this, formulate the definition of a game as a (single-agent) search problem. Describe an algorithm for finding the optimal move.
Consider the problem of solving two 8-puzzles.
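One way to see the reduction: with the oracle, each of our moves followed by the predicted reply collapses into a single deterministic transition, so ordinary single-agent search applies. A minimal sketch (all function names and the toy game below are hypothetical illustrations, not part of the exercise):

```python
def best_move(state, moves, result, om, utility, terminal):
    """With an oracle om(s) predicting the opponent's move in any state s,
    each of our moves collapses into one deterministic transition, so the
    two-player game becomes a single-agent search problem; plain
    depth-first search then finds the optimal move."""
    def step(s, a):                 # our move a, then the predicted reply
        s2 = result(s, a)
        return s2 if terminal(s2) else result(s2, om(s2))

    def value(s):                   # best achievable utility from s
        if terminal(s):
            return utility(s)
        return max(value(step(s, a)) for a in moves(s))

    return max(moves(state), key=lambda a: value(step(state, a)))

# Toy game for illustration: Nim with 1-or-2 stone removals, where
# taking the last stone wins.  State = (stones, whose_turn); the
# oracle says this particular opponent always takes a single stone.
def moves(s):     return [k for k in (1, 2) if k <= s[0]]
def result(s, k): return (s[0] - k, 'them' if s[1] == 'us' else 'us')
def terminal(s):  return s[0] == 0
def utility(s):   return +1 if s[1] == 'them' else -1  # last mover wins
def om(s):        return 1
```

From a pile of 3 stones on our turn, taking one stone wins against this (suboptimal) opponent, and the search finds exactly that move.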

Give a complete problem formulation in the style of Chapter searchchapter.

How large is the reachable state space? Give an exact numerical expression.

Suppose we make the problem adversarial as follows: the two players take turns moving; a coin is flipped to determine the puzzle on which to make a move in that turn; and the winner is the first to solve one puzzle. Which algorithm can be used to choose a move in this setting?

Does the game eventually end, given optimal play? Explain.
Imagine that, in Exercise [twofriendsexercise], one of the friends wants to avoid the other. The problem then becomes a two-player game. We assume now that the players take turns moving. The game ends only when the players are on the same node; the terminal payoff to the pursuer is minus the total time taken. (The evader “wins” by never losing.) An example is shown in Figure pursuitevasiongamefigure.

Copy the game tree and mark the values of the terminal nodes.

Next to each internal node, write the strongest fact you can infer about its value (a number, one or more inequalities such as “$\geq 14$”, or a “?”).

Beneath each question mark, write the name of the node reached by that branch.

Explain how a bound on the value of the nodes in (c) can be derived from consideration of shortest-path lengths on the map, and derive such bounds for these nodes. Remember the cost to get to each leaf as well as the cost to solve it.

Now suppose that the tree as given, with the leaf bounds from (d), is evaluated from left to right. Circle those “?” nodes that would not need to be expanded further, given the bounds from part (d), and cross out those that need not be considered at all.

Can you prove anything in general about who wins the game on a map that is a tree?
Exercise 5.4 [gameplayingchanceexercise]
Describe and implement state descriptions, move generators, terminal tests, utility functions, and evaluation functions for one or more of the following stochastic games: Monopoly, Scrabble, bridge play with a given contract, or Texas hold’em poker.
Describe and implement a real-time, multiplayer game-playing environment, where time is part of the environment state and players are given fixed time allocations.
Discuss how well the standard approach to game playing would apply to games such as tennis, pool, and croquet, which take place in a continuous physical state space.
Exercise 5.7 [minimaxoptimalityexercise]
Prove the following assertion: For every game tree, the utility obtained by max using minimax decisions against a suboptimal min will never be lower than the utility obtained playing against an optimal min. Can you come up with a game tree in which max can do still better using a suboptimal strategy against a suboptimal min?
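To ground the terms in the assertion, a minimal minimax sketch over an explicit tree (a hypothetical nested-list encoding: a leaf is a number, an internal node is the list of its children, and levels alternate between max and min):

```python
def minimax(node, to_move_max=True):
    """Minimax value of a game tree given as nested lists: a leaf is
    a numeric utility (from max's point of view), an internal node is
    the list of its children, and levels alternate max/min."""
    if not isinstance(node, list):          # leaf
        return node
    values = [minimax(child, not to_move_max) for child in node]
    return max(values) if to_move_max else min(values)
```

For example, `minimax([[3, 12, 8], [2, 4, 6], [14, 5, 2]])` gives 3: min applies at the second level, giving 3, 2, and 2, and max picks the first branch.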
Consider the two-player game described in Figure linegame4figure. Player $A$ moves first. The two players take turns moving, and each player must move his token to an open adjacent space in either direction. If the opponent occupies an adjacent space, then a player may jump over the opponent to the next open space, if any. (For example, if $A$ is on 3 and $B$ is on 2, then $A$ may move back to 1.) The game ends when one player reaches the opposite end of the board. If player $A$ reaches space 4 first, then the value of the game to $A$ is $+1$; if player $B$ reaches space 1 first, then the value of the game to $A$ is $-1$.

Draw the complete game tree, using the following conventions:

Write each state as $(s_A,s_B)$, where $s_A$ and $s_B$ denote the token locations.

Put each terminal state in a square box and write its game value in a circle.

Put loop states (states that already appear on the path to the root) in double square boxes. Since their value is unclear, annotate each with a “?” in a circle.


Now mark each node with its backedup minimax value (also in a circle). Explain how you handled the “?” values and why.

Explain why the standard minimax algorithm would fail on this game tree and briefly sketch how you might fix it, drawing on your answer to (b). Does your modified algorithm give optimal decisions for all games with loops?

This 4square game can be generalized to $n$ squares for any $n > 2$. Prove that $A$ wins if $n$ is even and loses if $n$ is odd.
This problem exercises the basic concepts of game playing, using tic-tac-toe (noughts and crosses) as an example. We define $X_n$ as the number of rows, columns, or diagonals with exactly $n$ $X$’s and no $O$’s. Similarly, $O_n$ is the number of rows, columns, or diagonals with just $n$ $O$’s. The utility function assigns $+1$ to any position with $X_3=1$ and $-1$ to any position with $O_3 = 1$. All other terminal positions have utility 0. For nonterminal positions, we use a linear evaluation function defined as ${Eval}(s) = 3X_2(s) + X_1(s) - (3O_2(s) + O_1(s))$.
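The features and the evaluation function are straightforward to compute; a sketch, assuming a hypothetical 9-character board encoding (squares 0–8 row by row, each `'X'`, `'O'`, or `'.'`):

```python
LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],   # rows
         [0, 3, 6], [1, 4, 7], [2, 5, 8],   # columns
         [0, 4, 8], [2, 4, 6]]              # diagonals

def count_lines(board, player, n):
    """X_n (or O_n): lines holding exactly n of player's marks and
    none of the opponent's."""
    other = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if [board[i] for i in line].count(player) == n
               and all(board[i] != other for i in line))

def eval_fn(board):
    """Eval(s) = 3 X_2(s) + X_1(s) - (3 O_2(s) + O_1(s))."""
    return (3 * count_lines(board, 'X', 2) + count_lines(board, 'X', 1)
            - 3 * count_lines(board, 'O', 2) - count_lines(board, 'O', 1))
```

For instance, a lone $X$ in the center scores 4 (it lies on four lines untouched by $O$), and an $X$ in the center against an $O$ in a corner scores $3 - 2 = 1$.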

Approximately how many possible games of tic-tac-toe are there?

Show the whole game tree starting from an empty board down to depth 2 (i.e., one $X$ and one $O$ on the board), taking symmetry into account.

Mark on your tree the evaluations of all the positions at depth 2.

Using the minimax algorithm, mark on your tree the backed-up values for the positions at depths 1 and 0, and use those values to choose the best starting move.

Circle the nodes at depth 2 that would not be evaluated if alpha–beta pruning were applied, assuming the nodes are generated in the optimal order for alpha–beta pruning.
Consider the family of generalized tic-tac-toe games, defined as follows. Each particular game is specified by a set $\mathcal S$ of squares and a collection $\mathcal W$ of winning positions. Each winning position is a subset of $\mathcal S$. For example, in standard tic-tac-toe, $\mathcal S$ is a set of 9 squares and $\mathcal W$ is a collection of 8 subsets of $\mathcal S$: the three rows, the three columns, and the two diagonals. In other respects, the game is identical to standard tic-tac-toe. Starting from an empty board, players alternate placing their marks on an empty square. A player who marks every square in a winning position wins the game. It is a tie if all squares are marked and neither player has won.

Let $N = |{\mathcal S}|$, the number of squares. Give an upper bound on the number of nodes in the complete game tree for generalized tic-tac-toe as a function of $N$.

Give a lower bound on the size of the game tree for the worst case, where ${\mathcal W} = \{\,\}$.

Propose a plausible evaluation function that can be used for any instance of generalized tic-tac-toe. The function may depend on $\mathcal S$ and $\mathcal W$.

Assume that it is possible to generate a new board and check whether it is a winning position in $100N$ machine instructions, and assume a 2-gigahertz processor. Ignore memory limitations. Using your estimate in (a), roughly how large a game tree can be completely solved by alpha–beta in a second of CPU time? a minute? an hour?
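A back-of-the-envelope calculation helps here. A minimal sketch, assuming the part-(a) bound works out to roughly $N!$ nodes and ignoring alpha–beta's additional savings (which would push the answer somewhat higher):

```python
import math

def largest_solvable(seconds, hz=2e9, instr_per_node=100):
    """Largest N such that examining ~N! board states, at 100*N
    machine instructions each on a 2 GHz processor, fits within
    the given CPU-time budget.  A rough estimate only: it assumes
    the complete tree has about N! nodes."""
    budget = seconds * hz
    n = 1
    while math.factorial(n + 1) * instr_per_node * (n + 1) <= budget:
        n += 1
    return n
```

Under these assumptions, a second suffices for about $N = 9$, a minute for $N = 11$, and an hour for $N = 12$: factorial growth swamps a 3600-fold increase in time.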
Develop a general game-playing program, capable of playing a variety of games.

Implement move generators and evaluation functions for one or more of the following games: Kalah, Othello, checkers, and chess.

Construct a general alpha–beta gameplaying agent.

Compare the effect of increasing search depth, improving move ordering, and improving the evaluation function. How close does your effective branching factor come to the ideal case of perfect move ordering?

Implement a selective search algorithm, such as B* @Berliner:1979, conspiracy number search @McAllester:1988, or MGSS* @Russell+Wefald:1989 and compare its performance to that of your alpha–beta agent.
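As a starting point for the alpha–beta agent, a minimal sketch over an explicit tree (a hypothetical nested-list encoding: a leaf is a number, an internal node is the list of its children); a real agent would generate children lazily from the move generator rather than build the tree up front:

```python
import math

def alphabeta(node, to_move_max=True, alpha=-math.inf, beta=math.inf):
    """Alpha-beta over nested lists: returns the minimax value while
    pruning branches that cannot affect the decision at the root."""
    if not isinstance(node, list):       # leaf
        return node
    if to_move_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if v >= beta:                # beta cutoff
                break
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta))
        beta = min(beta, v)
        if v <= alpha:                   # alpha cutoff
            break
    return v
```

On the classic three-branch example tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` it returns 3, pruning the later successors of the second branch once its value is known to be at most 2.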
Describe how the minimax and alpha–beta algorithms change for two-player, non-zero-sum games in which each player has a distinct utility function and both utility functions are known to both players. If there are no constraints on the two terminal utilities, is it possible for any node to be pruned by alpha–beta? What if the players’ utility functions on any state differ by at most a constant $k$, making the game almost cooperative?
Describe how the minimax and alpha–beta algorithms change for two-player, non-zero-sum games in which each player has a distinct utility function and both utility functions are known to both players. If there are no constraints on the two terminal utilities, is it possible for any node to be pruned by alpha–beta? What if the players’ utility functions on any state sum to a number between constants $-k$ and $k$, making the game almost zero-sum?
Develop a formal proof of correctness for alpha–beta pruning. To do this, consider the situation shown in Figure alphabetaprooffigure. The question is whether to prune node $n_j$, which is a max node and a descendant of node $n_1$. The basic idea is to prune it if and only if the minimax value of $n_1$ can be shown to be independent of the value of $n_j$.

Node $n_1$ takes on the minimum value among its children: $n_1 = \min(n_2, n_{21}, \ldots, n_{2b_2})$. Find a similar expression for $n_2$ and hence an expression for $n_1$ in terms of $n_j$.

Let $l_i$ be the minimum (or maximum) value of the nodes to the left of node $n_i$ at depth $i$, whose minimax value is already known. Similarly, let $r_i$ be the minimum (or maximum) value of the unexplored nodes to the right of $n_i$ at depth $i$. Rewrite your expression for $n_1$ in terms of the $l_i$ and $r_i$ values.

Now reformulate the expression to show that in order to affect $n_1$, $n_j$ must not exceed a certain bound derived from the $l_i$ values.

Repeat the process for the case where $n_j$ is a minnode.
Prove that the alpha–beta algorithm takes time $O(b^{m/2})$ with optimal move ordering, where $m$ is the maximum depth of the game tree.
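One possible starting point (a hint toward the structure of the proof, not the proof itself): with perfect ordering, full expansion is needed only at alternating levels, which yields the classical best-case leaf count of Knuth and Moore (1975):

```latex
% Best case with perfect move ordering (Knuth and Moore, 1975):
% number of leaf nodes examined in a tree of branching factor b
% and depth m
T(m) \;=\; b^{\lceil m/2 \rceil} + b^{\lfloor m/2 \rfloor} - 1
     \;=\; O\!\left(b^{m/2}\right).
```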
Suppose you have a chess program that can evaluate 5 million nodes per second. Decide on a compact representation of a game state for storage in a transposition table. About how many entries can you fit in a 1-gigabyte in-memory table? Will that be enough for the three minutes of search allocated for one move? How many table lookups can you do in the time it would take to do one evaluation? Now suppose the transposition table is stored on disk. About how many evaluations could you do in the time it takes to do one disk seek with standard disk hardware?
Suppose you have a chess program that can evaluate 10 million nodes per second. Decide on a compact representation of a game state for storage in a transposition table. About how many entries can you fit in a 2-gigabyte in-memory table? Will that be enough for the three minutes of search allocated for one move? How many table lookups can you do in the time it would take to do one evaluation? Now suppose the transposition table is stored on disk. About how many evaluations could you do in the time it takes to do one disk seek with standard disk hardware?
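The arithmetic for either variant can be sketched as below; the per-entry size, lookup rate, and seek rate are assumed figures for illustration (roughly 16 bytes for a hashed position plus value, in-memory lookups at about the evaluation rate, and on the order of 100 disk seeks per second):

```python
def table_stats(nodes_per_sec, table_bytes, entry_bytes=16,
                seconds=180, lookups_per_sec=10_000_000,
                seeks_per_sec=100):
    """Back-of-the-envelope transposition-table numbers under the
    assumed figures above."""
    entries = table_bytes // entry_bytes
    nodes_searched = nodes_per_sec * seconds
    return {
        'entries': entries,
        'nodes_per_move': nodes_searched,
        'table_big_enough': entries >= nodes_searched,
        'lookups_per_eval': lookups_per_sec // nodes_per_sec,
        'evals_per_seek': nodes_per_sec // seeks_per_sec,
    }
```

With 10 million nodes per second and a 2 GB table, about 134 million entries fit, while a three-minute search touches some 1.8 billion nodes, so the table cannot hold every position searched; and a single disk seek costs as much time as roughly 100,000 evaluations, which is why disk-resident tables are rarely worthwhile.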
This question considers pruning in games with chance nodes. Figure trivialchancegamefigure shows the complete game tree for a trivial game. Assume that the leaf nodes are to be evaluated in left-to-right order, and that before a leaf node is evaluated, we know nothing about its value—the range of possible values is $-\infty$ to $+\infty$.

Copy the figure, mark the value of all the internal nodes, and indicate the best move at the root with an arrow.

Given the values of the first six leaves, do we need to evaluate the seventh and eighth leaves? Given the values of the first seven leaves, do we need to evaluate the eighth leaf? Explain your answers.

Suppose the leaf node values are known to lie between $-2$ and $2$ inclusive. After the first two leaves are evaluated, what is the value range for the left-hand chance node?

Circle all the leaves that need not be evaluated under the assumption in (c).
Implement the expectiminimax algorithm and the *-alpha–beta algorithm, which is described by @Ballard:1983, for pruning game trees with chance nodes. Try them on a game such as backgammon and measure the pruning effectiveness of *-alpha–beta.
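For reference, an expectiminimax sketch over an explicit tree (a hypothetical tagged-tuple encoding: a leaf is a number; otherwise `(kind, children)` with `kind` in `'max'`, `'min'`, `'chance'`, where chance children are `(probability, subtree)` pairs):

```python
def expectiminimax(node):
    """Expectiminimax value: max/min over deterministic nodes,
    probability-weighted average over chance nodes."""
    if not isinstance(node, tuple):          # leaf
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance
```

For example, a root max node choosing between a coin flip over 2 and 4 (expectation 3) and a coin flip over 0 and 10 (expectation 5) takes the second branch, value 5.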
Exercise 5.20 [gamelineartransform]
Prove that with a positive linear transformation of leaf values (i.e., transforming a value $x$ to $ax + b$ where $a > 0$), the choice of move remains unchanged in a game tree, even when there are chance nodes.
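The claim is easy to check numerically before proving it: expectation and max both commute with $x \mapsto ax + b$ for $a > 0$, so the argmax at the root is unchanged. A sketch (the tree encoding here is a hypothetical one: a leaf is a number, `('max', [...])` or `('chance', [(p, child), ...])` otherwise):

```python
def expectimax(node):
    """Value of a max/chance tree; a leaf is a number."""
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    return sum(p * expectimax(c) for p, c in children)   # chance

def transform(node, a, b):
    """Apply x -> a*x + b (a > 0) to every leaf."""
    if not isinstance(node, tuple):
        return a * node + b
    kind, children = node
    if kind == 'max':
        return (kind, [transform(c, a, b) for c in children])
    return (kind, [(p, transform(c, a, b)) for p, c in children])

def best_root_move(children):
    """Index of the root move with the highest backed-up value."""
    values = [expectimax(c) for c in children]
    return values.index(max(values))
```

With moves of expected value 1.9 and 2.5, the second move is chosen both before and after transforming every leaf by $3x + 7$; a negative $a$, by contrast, would flip the preference.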
Exercise 5.21 [gameplayingmontecarloexercise]
Consider the following procedure for choosing moves in games with chance nodes:

Generate some dice-roll sequences (say, 50) down to a suitable depth (say, 8).

With known dice rolls, the game tree becomes deterministic. For each dice-roll sequence, solve the resulting deterministic game tree using alpha–beta.

Use the results to estimate the value of each move and to choose the best.
Will this procedure work well? Why (or why not)?
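The procedure as stated can be sketched in a few lines (the interface below is hypothetical: `sample_rolls()` draws one dice-roll sequence, and `solved_value(move, rolls)` is the exact value of a move once the rolls are fixed, e.g. from alpha–beta on the determinized tree):

```python
def montecarlo_move(root_moves, sample_rolls, solved_value, n=50):
    """Sample n dice-roll sequences; for each, the tree is
    deterministic, so solve it exactly and average the resulting
    values per move; return the move with the best average."""
    samples = [sample_rolls() for _ in range(n)]
    averages = {m: sum(solved_value(m, rolls) for rolls in samples) / n
                for m in root_moves}
    return max(averages, key=averages.get)
```

Note for part of the answer: solving each determinized tree lets both players act as if they already knew all future rolls, so the averaged values need not equal the expectiminimax values.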
In the following, a “max” tree consists only of max nodes, whereas an “expectimax” tree consists of a max node at the root with alternating layers of chance and max nodes. At chance nodes, all outcome probabilities are nonzero. The goal is to find the value of the root with a bounded-depth search. For each of (a)–(f), either give an example or explain why this is impossible.

Assuming that leaf values are finite but unbounded, is pruning (as in alpha–beta) ever possible in a max tree?

Is pruning ever possible in an expectimax tree under the same conditions?

If leaf values are all nonnegative, is pruning ever possible in a max tree? Give an example, or explain why not.

If leaf values are all nonnegative, is pruning ever possible in an expectimax tree? Give an example, or explain why not.

If leaf values are all in the range $[0,1]$, is pruning ever possible in a max tree? Give an example, or explain why not.

If leaf values are all in the range $[0,1]$, is pruning ever possible in an expectimax tree?

Consider the outcomes of a chance node in an expectimax tree. Which of the following evaluation orders is most likely to yield pruning opportunities?

Lowest probability first

Highest probability first

Doesn’t make any difference

In the following, a “max” tree consists only of max nodes, whereas an “expectimax” tree consists of a max node at the root with alternating layers of chance and max nodes. At chance nodes, all outcome probabilities are nonzero. The goal is to find the value of the root with a bounded-depth search.

Assuming that leaf values are finite but unbounded, is pruning (as in alpha–beta) ever possible in a max tree? Give an example, or explain why not.

Is pruning ever possible in an expectimax tree under the same conditions? Give an example, or explain why not.

If leaf values are constrained to be in the range $[0,1]$, is pruning ever possible in a max tree? Give an example, or explain why not.

If leaf values are constrained to be in the range $[0,1]$, is pruning ever possible in an expectimax tree? Give an example (qualitatively different from your example in (c), if any), or explain why not.

If leaf values are constrained to be nonnegative, is pruning ever possible in a max tree? Give an example, or explain why not.

If leaf values are constrained to be nonnegative, is pruning ever possible in an expectimax tree? Give an example, or explain why not.

Consider the outcomes of a chance node in an expectimax tree. Which of the following evaluation orders is most likely to yield pruning opportunities: (i) Lowest probability first; (ii) Highest probability first; (iii) Doesn’t make any difference?
Which of the following are true and which are false? Give brief explanations.

In a fully observable, turn-taking, zero-sum game between two perfectly rational players, it does not help the first player to know what strategy the second player is using—that is, what move the second player will make, given the first player’s move.

In a partially observable, turn-taking, zero-sum game between two perfectly rational players, it does not help the first player to know what move the second player will make, given the first player’s move.

A perfectly rational backgammon agent never loses.
Consider carefully the interplay of chance events and partial information in each of the games in Exercise [gameplayingchanceexercise].

For which is the standard expectiminimax model appropriate? Implement the algorithm and run it in your game-playing agent, with appropriate modifications to the game-playing environment.

For which would the scheme described in Exercise [gameplayingmontecarloexercise] be appropriate?

Discuss how you might deal with the fact that in some of the games, the players do not have the same knowledge of the current state.