Probability, MDPs & Decisions

Probability models (Chapter 13-15)

probability.DTAgentProgram(belief_state)[source]: [Figure 13.1] A decision-theoretic agent.

class probability.ProbDist(var_name='?', freq=None)[source]

Bases: object

A discrete probability distribution. You name the random variable in the constructor, then assign and query probability of values. >>> P = ProbDist(‘Flip’); P[‘H’], P[‘T’] = 0.25, 0.75; P[‘H’] 0.25 >>> P = ProbDist(‘X’, {‘lo’: 125, ‘med’: 375, ‘hi’: 500}) >>> P[‘lo’], P[‘med’], P[‘hi’] (0.125, 0.375, 0.5)

normalize()[source]: Make sure the probabilities of all values sum to 1. Returns the normalized distribution. Raises a ZeroDivisionError if the sum of the values is 0.

show_approx(numfmt='{:.3g}')[source]: Show the probabilities rounded and sorted by key, for the sake of portable doctests.

class probability.JointProbDist(variables)[source]

Bases: ProbDist

A discrete probability distribute over a set of variables. >>> P = JointProbDist([‘X’, ‘Y’]); P[1, 1] = 0.25 >>> P[1, 1] 0.25 >>> P[dict(X=0, Y=1)] = 0.5 >>> P[dict(X=0, Y=1)] 0.5

values(var)[source]: Return the set of possible values for a variable.

probability.event_values(event, variables)[source]: Return a tuple of the values of variables in event. >>> event_values ({‘A’: 10, ‘B’: 9, ‘C’: 8}, [‘C’, ‘A’]) (8, 10) >>> event_values ((1, 2), [‘C’, ‘A’]) (1, 2)

probability.enumerate_joint_ask(X, e, P)[source]: [Section 13.3] Return a probability distribution over the values of the variable X, given the {var:val} observations e, in the JointProbDist P. >>> P = JointProbDist([‘X’, ‘Y’]) >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[2,1] = 0.125 >>> enumerate_joint_ask(‘X’, dict(Y=1), P).show_approx() ‘0: 0.667, 1: 0.167, 2: 0.167’

probability.enumerate_joint(variables, e, P)[source]: Return the sum of those entries in P consistent with e, provided variables is P’s remaining variables (the ones not in e).

class probability.BayesNet(node_specs=None)[source]

Bases: object

Bayesian network containing only boolean-variable nodes.

add(node_spec)[source]: Add a node to the net. Its parents must already be in the net, and its variable must not.

variable_node(var)[source]: Return the node for the variable named var. >>> burglary.variable_node(‘Burglary’).variable ‘Burglary’

variable_values(var)[source]: Return the domain of var.

class probability.DecisionNetwork(action, infer)[source]

Bases: BayesNet

An abstract class for a decision network as a wrapper for a BayesNet. Represents an agent’s current state, its possible actions, reachable states and utilities of those states.

best_action()[source]: Return the best action in the network

get_utility(action, state)[source]: Return the utility for a particular action and state in the network

get_expected_utility(action, evidence)[source]: Compute the expected utility given an action and evidence

class probability.InformationGatheringAgent(decnet, infer, initial_evidence=None)[source]

Bases: Agent

[Figure 16.9] A simple information gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.

integrate_percept(percept)[source]: Integrate the given percept into the decision network

execute(percept)[source]: Execute the information gathering algorithm

request(variable)[source]: Return the value of the given random variable as the next percept

cost(var)[source]: Return the cost of obtaining evidence through tests, consultants or questions

vpi_cost_ratio(variables)[source]: Return the VPI to cost ratio for the given variables

vpi(variable)[source]: Return VPI for a given variable

class probability.BayesNode(X, parents, cpt)[source]

Bases: object

A conditional probability distribution for a boolean variable, P(X | parents). Part of a BayesNet.

p(value, event)[source]: Return the conditional probability P(X=value | parents=parent_values), where parent_values are the values of parents in event. (event must assign each parent a value.) >>> bn = BayesNode(‘X’, ‘Burglary’, {T: 0.2, F: 0.625}) >>> bn.p(False, {‘Burglary’: False, ‘Earthquake’: True}) 0.375

sample(event)[source]: Sample from the distribution for this variable conditioned on event’s values for parent_variables. That is, return True/False at random according with the conditional probability given the parents.

probability.enumeration_ask(X, e, bn)[source]: [Figure 14.9] Return the conditional probability distribution of variable X given evidence e, from BayesNet bn. >>> enumeration_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’

probability.enumerate_all(variables, e, bn)[source]: Return the sum of those entries in P(variables | e{others}) consistent with e, where P is the joint distribution represented by bn, and e{others} means e restricted to bn’s other variables (the ones other than variables). Parents must precede children in variables.

probability.elimination_ask(X, e, bn)[source]: [Figure 14.11] Compute bn’s P(X|e) by variable elimination. >>> elimination_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’

probability.is_hidden(var, X, e)[source]: Is var a hidden variable when querying P(X|e)?

probability.make_factor(var, e, bn)[source]: Return the factor for var in bn’s joint distribution given e. That is, bn’s full joint distribution, projected to accord with e, is the pointwise product of these factors for bn’s variables.

probability.pointwise_product(factors, bn)[source]: Multiply a sequence of factors together into a single factor over the union of their variables, using the Bayes net bn to enumerate variable values.

probability.sum_out(var, factors, bn)[source]: Eliminate var from all factors by summing over its values.

class probability.Factor(variables, cpt)[source]

Bases: object

A factor in a joint distribution.

pointwise_product(other, bn)[source]: Multiply two factors, combining their variables.

sum_out(var, bn)[source]: Make a factor eliminating var by summing over its values.

normalize()[source]: Return my probabilities; must be down to one variable.

p(e)[source]: Look up my value tabulated for e.

probability.all_events(variables, bn, e)[source]: Yield every way of extending e with values for all variables.

probability.prior_sample(bn)[source]: [Figure 14.13] Randomly sample from bn’s full joint distribution. The result is a {variable: value} dict.

probability.rejection_sampling(X, e, bn, N=10000)[source]: [Figure 14.14] Estimate the probability distribution of variable X given evidence e in BayesNet bn, using N samples. Raises a ZeroDivisionError if all the N samples are rejected, i.e., inconsistent with e. >>> random.seed(47) >>> rejection_sampling(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.7, True: 0.3’

probability.consistent_with(event, evidence)[source]: Is event consistent with the given evidence?

probability.likelihood_weighting(X, e, bn, N=10000)[source]: [Figure 14.15] Estimate the probability distribution of variable X given evidence e in BayesNet bn. >>> random.seed(1017) >>> likelihood_weighting(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.702, True: 0.298’

probability.weighted_sample(bn, e)[source]: Sample an event from bn that’s consistent with the evidence e; return the event and its weight, the likelihood that the event accords to the evidence.

probability.gibbs_ask(X, e, bn, N=1000)[source]: [Figure 14.16]

probability.markov_blanket_sample(X, e, bn)[source]: Return a sample from P(X | mb) where mb denotes that the variables in the Markov blanket of X take their values from event e (which must assign a value to each). The Markov blanket of X is X’s parents, children, and children’s parents.

class probability.HiddenMarkovModel(transition_model, sensor_model, prior=None)[source]

Bases: object

A Hidden markov model which takes Transition model and Sensor model as inputs

sensor_dist(ev)[source]: Return the sensor (observation) distribution corresponding to the evidence ev: the first row of the sensor model when ev is True, otherwise the second.

probability.forward(HMM, fv, ev)[source]: Perform one forward (filtering) step of an HMM: project the forward message fv through the transition model, weight it by the sensor distribution for evidence ev, and return the normalized next forward message. [Figure 15.4]

probability.backward(HMM, b, ev)[source]: Perform one backward step of an HMM: weight the backward message b by the sensor distribution for evidence ev and propagate it through the transition model, returning the normalized previous backward message. [Figure 15.4]

probability.forward_backward(HMM, ev)[source]: [Figure 15.4] Forward-Backward algorithm for smoothing. Computes posterior probabilities of a sequence of states given a sequence of observations.

probability.viterbi(HMM, ev)[source]: [Equation 15.11] Viterbi algorithm to find the most likely sequence. Computes the best path and the corresponding probabilities, given an HMM model and a sequence of observations.

probability.baum_welch(HMM, observations, iterations=100)[source]

[Section 20.3] Baum-Welch algorithm: the instance of EM that learns the parameters of a Hidden Markov Model (transition model, sensor model and prior) from a single sequence of boolean ‘observations’, starting from the initial guess in ‘HMM’. Each iteration runs a (scaled) forward-backward pass to compute the smoothed state marginals gamma_t(i) = P(X_t=i | e_1:T) and transition marginals xi_t(i,j) = P(X_t=i, X_t+1=j | e_1:T) (E-step), then re-estimates every parameter as the corresponding normalized expected count (M-step):

prior_i     = gamma_0(i)
A_ij        = sum_t xi_t(i, j) / sum_t gamma_t(i)
sensor_oi   = sum_{t: e_t = o} gamma_t(i) / sum_t gamma_t(i)

Returns a new HiddenMarkovModel with the learned parameters.

probability.fixed_lag_smoothing(e_t, HMM, d, ev, t)[source]: [Figure 15.6] Smoothing algorithm with a fixed time lag of ‘d’ steps. Computes the smoothed estimate P(X_{t-d} | e_{1:t}) for the slice that lies ‘d’ steps in the past, given the evidence sequence ev = [e_1, …, e_t]. Returns None when there is not yet enough evidence (t <= d).

probability.particle_filtering(e, N, HMM)[source]: Particle filtering considering two states variables.

class probability.KalmanFilter(transition_model, sensor_model, transition_noise, sensor_noise)[source]

Bases: object

[Section 15.4] Kalman filter for a linear-Gaussian dynamical system. The hidden state evolves and is observed according to the linear-Gaussian model

x_{t+1} = F x_t + noise, noise ~ N(0, Sigma_x) (transition model) z_t = H x_t + noise, noise ~ N(0, Sigma_z) (sensor model)

where F is the transition matrix, H the sensor matrix, Sigma_x the transition (process) noise covariance and Sigma_z the sensor (measurement) noise covariance. Because the family of Gaussians is closed under the Bayesian filtering update, the forward message stays Gaussian and is fully described by a mean vector and a covariance matrix at every step.

predict(mean, cov)[source]: Time update: project the Gaussian estimate one step forward through F.

update(mean, cov, z)[source]: Measurement update: condition the predicted Gaussian on observation z.

filter(mean, cov, z)[source]: One predict-then-update cycle for a single new observation z.

probability.kalman_filter(KF, mean0, cov0, observations)[source]: [Section 15.4] Run the Kalman filter ‘KF’ over a sequence of ‘observations’, starting from the Gaussian prior N(mean0, cov0). Returns, for each time step, the filtered Gaussian estimate as a (mean, covariance) pair.

class probability.DynamicBayesNet(prior, transition, sensors)[source]

Bases: object

[Section 15.5] A dynamic Bayesian network for a stationary first-order Markov process. It is specified by a prior network over the state variables at slice 0 and a single transition + sensor network describing, for one time step, the distribution of each state variable (given the previous slice) and of each evidence variable (given the current slice). The DBN can be ‘unrolled’ into an ordinary BayesNet spanning any number of slices and then queried with the exact inference algorithms; in particular filtering is the query for the last state variable given the whole evidence sequence.

Each spec is a (variable, parents, cpt) triple as for a BayesNode. In a transition spec, a parent named ‘<var>_prev’ refers to state variable <var> at the previous slice; every other parent refers to the current slice.

unroll(steps)[source]: Unroll the DBN into a BayesNet over slices 0..steps (evidence at 1..steps).

filter(evidence, query, infer=<function elimination_ask>)[source]: Filtering: the posterior over ‘query’ at the last slice given the whole observation sequence. ‘evidence’ is a list of dicts, one per time step t = 1, 2, …, each mapping evidence variables to their observed values.

class probability.MCLmap(m)[source]

Bases: object

Map which provides probability distributions and sensor readings. Consists of discrete cells which are either an obstacle or empty

sample()[source]: Returns a random kinematic state possible in the map

ray_cast(sensor_num, kin_state)[source]: Returns distance to nearest obstacle or map boundary in the direction of sensor

probability.monte_carlo_localization(a, z, N, P_motion_sample, P_sensor, m, S=None)[source]: [Figure 25.9] Monte Carlo localization algorithm

Probability models (Chapter 12-13)

probability4e.DTAgentProgram(belief_state)[source]: A decision-theoretic agent. [Figure 12.1]

class probability4e.ProbDist(varname='?', freqs=None)[source]

Bases: object

A discrete probability distribution. You name the random variable in the constructor, then assign and query probability of values. >>> P = ProbDist(‘Flip’); P[‘H’], P[‘T’] = 0.25, 0.75; P[‘H’] 0.25 >>> P = ProbDist(‘X’, {‘lo’: 125, ‘med’: 375, ‘hi’: 500}) >>> P[‘lo’], P[‘med’], P[‘hi’] (0.125, 0.375, 0.5)

normalize()[source]: Make sure the probabilities of all values sum to 1. Returns the normalized distribution. Raises a ZeroDivisionError if the sum of the values is 0.

show_approx(numfmt='{:.3g}')[source]: Show the probabilities rounded and sorted by key, for the sake of portable doctests.

class probability4e.JointProbDist(variables)[source]

Bases: ProbDist

A discrete probability distribute over a set of variables. >>> P = JointProbDist([‘X’, ‘Y’]); P[1, 1] = 0.25 >>> P[1, 1] 0.25 >>> P[dict(X=0, Y=1)] = 0.5 >>> P[dict(X=0, Y=1)] 0.5

values(var)[source]: Return the set of possible values for a variable.

probability4e.event_values(event, variables)[source]: Return a tuple of the values of variables in event. >>> event_values ({‘A’: 10, ‘B’: 9, ‘C’: 8}, [‘C’, ‘A’]) (8, 10) >>> event_values ((1, 2), [‘C’, ‘A’]) (1, 2)

probability4e.enumerate_joint_ask(X, e, P)[source]: Return a probability distribution over the values of the variable X, given the {var:val} observations e, in the JointProbDist P. [Section 12.3] >>> P = JointProbDist([‘X’, ‘Y’]) >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[2,1] = 0.125 >>> enumerate_joint_ask(‘X’, dict(Y=1), P).show_approx() ‘0: 0.667, 1: 0.167, 2: 0.167’

probability4e.enumerate_joint(variables, e, P)[source]: Return the sum of those entries in P consistent with e, provided variables is P’s remaining variables (the ones not in e).

probability4e.is_independent(variables, P)[source]: Return whether a list of variables are independent given their distribution P P is an instance of JoinProbDist >>> P = JointProbDist([‘X’, ‘Y’]) >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[1,0] = 0.125 >>> is_independent([‘X’, ‘Y’], P) False

probability4e.gen_possible_events(vars, P)[source]: Generate all possible events of a collection of vars according to distribution of P

class probability4e.BayesNet(node_specs=None)[source]

Bases: object

Bayesian network containing only boolean-variable nodes.

add(node_spec)[source]: Add a node to the net. Its parents must already be in the net, and its variable must not. Initialize Bayes nodes by detecting the length of input node specs

variable_node(var)[source]: Return the node for the variable named var. >>> burglary.variable_node(‘Burglary’).variable ‘Burglary’

variable_values(var)[source]: Return the domain of var.

class probability4e.BayesNode(X, parents, cpt)[source]

Bases: object

A conditional probability distribution for a boolean variable, P(X | parents). Part of a BayesNet.

p(value, event)[source]: Return the conditional probability P(X=value | parents=parent_values), where parent_values are the values of parents in event. (event must assign each parent a value.) >>> bn = BayesNode(‘X’, ‘Burglary’, {T: 0.2, F: 0.625}) >>> bn.p(False, {‘Burglary’: False, ‘Earthquake’: True}) 0.375

sample(event)[source]: Sample from the distribution for this variable conditioned on event’s values for parent_variables. That is, return True/False at random according with the conditional probability given the parents.

probability4e.gaussian_probability(param, event, value)[source]

Gaussian probability of a continuous Bayesian network node on condition of certain event and the parameters determined by the event

Parameters:

param – parameters determined by discrete parent events of current node
event – a dict, continuous event of current node, the values are used as parameters in calculating distribution
value – float, the value of current continuous node

Returns:

float, the calculated probability

>>> param = {'sigma':0.5, 'b':1, 'a':{'h1':0.5, 'h2': 1.5}}
>>> event = {'h1':0.6, 'h2': 0.3}
>>> gaussian_probability(param, event, 1)
0.2590351913317835

probability4e.logistic_probability(param, event, value)[source]: Logistic probability of a discrete node in Bayesian network with continuous parents, :param param: a dict, parameters determined by discrete parents of current node :param event: a dict, names and values of continuous parent variables of current node :param value: boolean, True or False :return: int, probability

class probability4e.ContinuousBayesNode(name, d_parents, c_parents, parameters, type)[source]

Bases: object

A Bayesian network node with continuous distribution or with continuous distributed parents

continuous_p(value, c_event, d_event)[source]: Probability given the value of current node and its parents :param c_event: event of continuous nodes :param d_event: event of discrete nodes

probability4e.enumeration_ask(X, e, bn)[source]: Return the conditional probability distribution of variable X given evidence e, from BayesNet bn. [Figure 13.10] >>> enumeration_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’

probability4e.enumerate_all(variables, e, bn)[source]: Return the sum of those entries in P(variables | e{others}) consistent with e, where P is the joint distribution represented by bn, and e{others} means e restricted to bn’s other variables (the ones other than variables). Parents must precede children in variables.

probability4e.elimination_ask(X, e, bn)[source]: Compute bn’s P(X|e) by variable elimination. [Figure 13.12] >>> elimination_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’

probability4e.is_hidden(var, X, e)[source]: Is var a hidden variable when querying P(X|e)?

probability4e.make_factor(var, e, bn)[source]: Return the factor for var in bn’s joint distribution given e. That is, bn’s full joint distribution, projected to accord with e, is the pointwise product of these factors for bn’s variables.

probability4e.pointwise_product(factors, bn)[source]: Multiply a sequence of factors together into a single factor over the union of their variables, using the Bayes net bn to enumerate variable values.

probability4e.sum_out(var, factors, bn)[source]: Eliminate var from all factors by summing over its values.

class probability4e.Factor(variables, cpt)[source]

Bases: object

A factor in a joint distribution.

pointwise_product(other, bn)[source]: Multiply two factors, combining their variables.

sum_out(var, bn)[source]: Make a factor eliminating var by summing over its values.

normalize()[source]: Return my probabilities; must be down to one variable.

p(e)[source]: Look up my value tabulated for e.

probability4e.all_events(variables, bn, e)[source]: Yield every way of extending e with values for all variables.

probability4e.prior_sample(bn)[source]: Randomly sample from bn’s full joint distribution. The result is a {variable: value} dict. [Figure 13.15]

probability4e.rejection_sampling(X, e, bn, N=10000)[source]: [Figure 13.16] Estimate the probability distribution of variable X given evidence e in BayesNet bn, using N samples. Raises a ZeroDivisionError if all the N samples are rejected, i.e., inconsistent with e. >>> random.seed(47) >>> rejection_sampling(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.7, True: 0.3’

probability4e.consistent_with(event, evidence)[source]: Is event consistent with the given evidence?

probability4e.likelihood_weighting(X, e, bn, N=10000)[source]: [Figure 13.17] Estimate the probability distribution of variable X given evidence e in BayesNet bn. >>> random.seed(1017) >>> likelihood_weighting(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.702, True: 0.298’

probability4e.weighted_sample(bn, e)[source]: Sample an event from bn that’s consistent with the evidence e; return the event and its weight, the likelihood that the event accords to the evidence.

probability4e.gibbs_ask(X, e, bn, N=1000)[source]: [Figure 13.19]

probability4e.markov_blanket_sample(X, e, bn)[source]: Return a sample from P(X | mb) where mb denotes that the variables in the Markov blanket of X take their values from event e (which must assign a value to each). The Markov blanket of X is X’s parents, children, and children’s parents.

class probability4e.complied_burglary[source]

Bases: object

compiled version of burglary network

Burglary(sample)[source]: Return P(Burglary=True) conditioned on the Alarm and Earthquake values of the given sample, using precomputed (compiled) probabilities for its Markov blanket.

Earthquake(sample)[source]: Return P(Earthquake=True) conditioned on the Alarm and Burglary values of the given sample, using precomputed (compiled) probabilities for its Markov blanket.

MaryCalls(sample)[source]: Return P(MaryCalls=True) given the Alarm value of the sample.

JongCalls(sample)[source]: Return P(JongCalls=True) given the Alarm value of the sample.

Alarm(sample)[source]: Return P(Alarm=True) for the sample (not implemented in this compiled network).

Markov Decision Processes (Chapter 17)

First we define an MDP, and the special case of a GridMDP, in which states are laid out in a 2-dimensional grid. We also represent a policy as a dictionary of {state: action} pairs, and a Utility function as a dictionary of {state: number} pairs. We then define the value_iteration and policy_iteration algorithms.

>>> pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .01))

>>> sequential_decision_environment.to_arrows(pi)
[['>', '>', '>', '.'], ['^', None, '^', '.'], ['^', '>', '^', '<']]

>>> from utils import print_table

>>> print_table(sequential_decision_environment.to_arrows(pi))
>   >      >   .
^   None   ^   .
^   >      ^   <

>>> print_table(sequential_decision_environment.to_arrows(policy_iteration(sequential_decision_environment)))
>   >      >   .
^   None   ^   .
^   >      ^   <

class mdp.MDP(init, actlist, terminals, transitions=None, reward=None, states=None, gamma=0.9)[source]

Bases: object

A Markov Decision Process, defined by an initial state, transition model, and reward function. We also keep track of a gamma value, for use by algorithms. The transition model is represented somewhat differently from the text. Instead of P(s’ | s, a) being a probability number for each state/state/action triplet, we instead have T(s, a) return a list of (p, s’) pairs. We also keep track of the possible states, terminal states, and actions for each state. [Page 646]

R(state)[source]: Return a numeric reward for this state.

T(state, action)[source]: Transition model. From a state and an action, return a list of (probability, result-state) pairs.

actions(state)[source]: Return a list of actions that can be performed in this state. By default, a fixed list of actions, except for terminal states. Override this method if you need to specialize by state.

get_states_from_transitions(transitions)[source]: Return the set of all states mentioned in the transition model, both as source states and as reachable next states (or None if transitions is not a dict).

check_consistency()[source]: Assert that the MDP is well formed: the states match those in the transitions, the initial and terminal states are valid, rewards are defined for every state, and each action’s outcome probabilities sum to 1.

class mdp.MDP2(init, actlist, terminals, transitions, reward=None, gamma=0.9)[source]

Bases: MDP

Inherits from MDP. Handles terminal states, and transitions to and from terminal states better.

T(state, action)[source]: Return a list of (probability, next-state) pairs for taking the action in the given state; a None action stays in place with probability 0.0.

class mdp.GridMDP(grid, terminals, init=(0, 0), gamma=0.9)[source]

Bases: MDP

A two-dimensional grid MDP, as in [Figure 17.1]. All you have to do is specify the grid as a list of lists of rewards; use None for an obstacle (unreachable state). Also, you should specify the terminal states. An action is an (x, y) unit vector; e.g. (1, 0) means move east.

calculate_T(state, action)[source]: Return the (probability, next-state) outcomes for the intended action: 0.8 of moving as intended and 0.1 each of veering to the right or left.

T(state, action)[source]: Return the list of (probability, next-state) pairs for the action in the given state; a falsy action stays in place with probability 0.0.

go(state, direction)[source]: Return the state that results from going in this direction.

to_grid(mapping)[source]: Convert a mapping from (x, y) to v into a [[…, v, …]] grid.

to_arrows(policy)[source]: Render a policy as a grid of arrow characters (>, ^, <, v, with . for a None action) laid out to match the grid.

mdp.gen_grid(n_rows=3, n_cols=4, terminals=((3, 2), (3, 1)), main_reward=-0.04, terminal_rewards=(1, -1), block_coords=((1, 1),))[source]: Generate a grid (list of lists of rewards) of arbitrary size in the format accepted by GridMDP, e.g. GridMDP(gen_grid(…), terminals=[…]). n_rows, n_cols: grid dimensions. terminals: (x, y) coordinates of the terminal cells. main_reward: reward for every non-terminal, non-blocked cell. terminal_rewards: reward for each cell in terminals (paired by position). block_coords: (x, y) coordinates of obstacles (set to None / unreachable).

mdp.value_iteration(mdp, epsilon=0.001)[source]: Solving an MDP by value iteration. [Figure 17.4]

mdp.best_policy(mdp, U)[source]: Given an MDP and a utility function U, determine the best policy, as a mapping from state to action. [Equation 17.4]

mdp.expected_utility(a, s, U, mdp)[source]: The expected utility of doing a in state s, according to the MDP and U.

mdp.policy_iteration(mdp)[source]: Solve an MDP by policy iteration [Figure 17.7]

mdp.policy_evaluation(pi, U, mdp, k=20)[source]: Return an updated utility mapping U from each state in the MDP to its utility, using an approximation (modified policy iteration).

class mdp.POMDP(actions, transitions=None, evidences=None, rewards=None, states=None, gamma=0.95)[source]

Bases: MDP

A Partially Observable Markov Decision Process, defined by a transition model P(s’|s,a), actions A(s), a reward function R(s), and a sensor model P(e|s). We also keep track of a gamma value, for use by algorithms. The transition and the sensor models are defined as matrices. We also keep track of the possible states and actions for each state. [Page 659].

remove_dominated_plans(input_values)[source]: Remove dominated plans. This method finds all the lines contributing to the upper surface and removes those which don’t.

remove_dominated_plans_fast(input_values)[source]: Remove dominated plans using approximations. Resamples the upper boundary at intervals of 100 and finds the maximum values at these points.

generate_mapping(best, input_values)[source]: Generate mappings after removing dominated plans

max_difference(U1, U2)[source]: Find maximum difference between two utility mappings

class mdp.Matrix[source]

Bases: object

Matrix operations class

static add(A, B)[source]: Add two matrices A and B

static scalar_multiply(a, B)[source]: Multiply scalar a to matrix B

static multiply(A, B)[source]: Multiply two matrices A and B element-wise

static matmul(A, B)[source]: Inner-product of two matrices

static transpose(A)[source]: Transpose a matrix

mdp.pomdp_value_iteration(pomdp, epsilon=0.1)[source]: Solving a POMDP by value iteration.

Markov Decision Processes (Chapter 16)

First we define an MDP, and the special case of a GridMDP, in which states are laid out in a 2-dimensional grid. We also represent a policy as a dictionary of {state: action} pairs, and a Utility function as a dictionary of {state: number} pairs. We then define the value_iteration and policy_iteration algorithms.

>>> pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .01))

>>> sequential_decision_environment.to_arrows(pi)
[['>', '>', '>', '.'], ['^', None, '^', '.'], ['^', '>', '^', '<']]

>>> from utils import print_table

>>> print_table(sequential_decision_environment.to_arrows(pi))
>   >      >   .
^   None   ^   .
^   >      ^   <

>>> print_table(sequential_decision_environment.to_arrows(policy_iteration(sequential_decision_environment)))
>   >      >   .
^   None   ^   .
^   >      ^   <

class mdp4e.MDP(init, actlist, terminals, transitions=None, reward=None, states=None, gamma=0.9)[source]

Bases: object

A Markov Decision Process, defined by an initial state, transition model, and reward function. We also keep track of a gamma value, for use by algorithms. The transition model is represented somewhat differently from the text. Instead of P(s’ | s, a) being a probability number for each state/state/action triplet, we instead have T(s, a) return a list of (p, s’) pairs. We also keep track of the possible states, terminal states, and actions for each state. [Page 646]

R(state)[source]: Return a numeric reward for this state.

T(state, action)[source]: Transition model. From a state and an action, return a list of (probability, result-state) pairs.

actions(state)[source]: Return a list of actions that can be performed in this state. By default, a fixed list of actions, except for terminal states. Override this method if you need to specialize by state.

get_states_from_transitions(transitions)[source]: Return the set of all states mentioned in the transition model, both as source states and as reachable next states (or None if transitions is not a dict).

check_consistency()[source]: Assert that the MDP is well formed: the states match those in the transitions, the initial and terminal states are valid, rewards are defined for every state, and each action’s outcome probabilities sum to 1.

class mdp4e.MDP2(init, actlist, terminals, transitions, reward=None, gamma=0.9)[source]

Bases: MDP

Inherits from MDP. Handles terminal states, and transitions to and from terminal states better.

T(state, action)[source]: Return a list of (probability, next-state) pairs for taking the action in the given state; a None action stays in place with probability 0.0.

class mdp4e.GridMDP(grid, terminals, init=(0, 0), gamma=0.9)[source]

Bases: MDP

A two-dimensional grid MDP, as in [Figure 16.1]. All you have to do is specify the grid as a list of lists of rewards; use None for an obstacle (unreachable state). Also, you should specify the terminal states. An action is an (x, y) unit vector; e.g. (1, 0) means move east.

calculate_T(state, action)[source]: Return the (probability, next-state) outcomes for the intended action: 0.8 of moving as intended and 0.1 each of veering to the right or left.

T(state, action)[source]: Return the list of (probability, next-state) pairs for the action in the given state; a falsy action stays in place with probability 0.0.

go(state, direction)[source]: Return the state that results from going in this direction.

to_grid(mapping)[source]: Convert a mapping from (x, y) to v into a [[…, v, …]] grid.

to_arrows(policy)[source]: Render a policy as a grid of arrow characters (>, ^, <, v, with . for a None action) laid out to match the grid.

mdp4e.q_value(mdp, s, a, U)[source]: Return the Q-value of taking action a in state s given the utility estimates U, i.e. the expected sum of the immediate reward and the discounted utility of the successor states. With no action it falls back to the reward of s.

mdp4e.value_iteration(mdp, epsilon=0.001)[source]: Solving an MDP by value iteration. [Figure 16.6]

mdp4e.best_policy(mdp, U)[source]: Given an MDP and a utility function U, determine the best policy, as a mapping from state to action.

mdp4e.expected_utility(a, s, U, mdp)[source]: The expected utility of doing a in state s, according to the MDP and U.

mdp4e.policy_iteration(mdp)[source]: Solve an MDP by policy iteration [Figure 17.7]

mdp4e.policy_evaluation(pi, U, mdp, k=20)[source]: Return an updated utility mapping U from each state in the MDP to its utility, using an approximation (modified policy iteration).

class mdp4e.POMDP(actions, transitions=None, evidences=None, rewards=None, states=None, gamma=0.95)[source]

Bases: MDP

A Partially Observable Markov Decision Process, defined by a transition model P(s’|s,a), actions A(s), a reward function R(s), and a sensor model P(e|s). We also keep track of a gamma value, for use by algorithms. The transition and the sensor models are defined as matrices. We also keep track of the possible states and actions for each state. [Page 659].

remove_dominated_plans(input_values)[source]: Remove dominated plans. This method finds all the lines contributing to the upper surface and removes those which don’t.

remove_dominated_plans_fast(input_values)[source]: Remove dominated plans using approximations. Resamples the upper boundary at intervals of 100 and finds the maximum values at these points.

generate_mapping(best, input_values)[source]: Generate mappings after removing dominated plans

max_difference(U1, U2)[source]: Find maximum difference between two utility mappings

class mdp4e.Matrix[source]

Bases: object

Matrix operations class

static add(A, B)[source]: Add two matrices A and B

static scalar_multiply(a, B)[source]: Multiply scalar a to matrix B

static multiply(A, B)[source]: Multiply two matrices A and B element-wise

static matmul(A, B)[source]: Inner-product of two matrices

static transpose(A)[source]: Transpose a matrix

mdp4e.pomdp_value_iteration(pomdp, epsilon=0.1)[source]: Solving a POMDP by value iteration.

mdp4e.update_belief(pomdp, belief, action, observation)[source]

[Equation 17.17] POMDP filtering: update the belief state (a probability distribution over the states) after executing ‘action’ and perceiving ‘observation’. The prediction step propagates the belief through the transition model and the update step weights it by the sensor model, then renormalizes:

b'(s') = alpha * P(o | s') * sum_s P(s' | s, a) b(s)

mdp4e.pomdp_lookahead(pomdp, belief, depth)[source]: [Section 17.5 - Dynamic Decision Networks] Online decision making for a POMDP modeled as a dynamic decision network (DDN): the network is projected ‘depth’ steps into the future and solved by belief-state expectimax search. Decision nodes range over the belief states and chance nodes branch over the possible observations, with the belief updated by POMDP filtering ([Equation 17.17]) along each branch. Returns the action that maximizes the expected discounted utility from ‘belief’.

Making Simple Decisions (Chapter 15)

class making_simple_decisions4e.DecisionNetwork(action, infer)[source]

Bases: BayesNet

An abstract class for a decision network as a wrapper for a BayesNet. Represents an agent’s current state, its possible actions, reachable states and utilities of those states.

best_action()[source]: Return the best action in the network

get_utility(action, state)[source]: Return the utility for a particular action and state in the network

get_expected_utility(action, evidence)[source]: Compute the expected utility given an action and evidence

class making_simple_decisions4e.InformationGatheringAgent(decnet, infer, initial_evidence=None)[source]

Bases: Agent

A simple information gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit. [Figure 16.9]

integrate_percept(percept)[source]: Integrate the given percept into the decision network

execute(percept)[source]: Execute the information gathering algorithm

request(variable)[source]: Return the value of the given random variable as the next percept

cost(var)[source]: Return the cost of obtaining evidence through tests, consultants or questions

vpi_cost_ratio(variables)[source]: Return the VPI to cost ratio for the given variables

vpi(variable)[source]: Return VPI for a given variable

class making_simple_decisions4e.MCLmap(m)[source]

Bases: object

Map which provides probability distributions and sensor readings. Consists of discrete cells which are either an obstacle or empty

sample()[source]: Returns a random kinematic state possible in the map

ray_cast(sensor_num, kin_state)[source]: Returns distance to nearest obstacle or map boundary in the direction of sensor

making_simple_decisions4e.monte_carlo_localization(a, z, N, P_motion_sample, P_sensor, m, S=None)[source]: Monte Carlo localization algorithm from Fig 25.9