Probability, MDPs & Decisions
Probability models (Chapter 13-15)
- class probability.ProbDist(var_name='?', freq=None)[source]
Bases:
objectA discrete probability distribution. You name the random variable in the constructor, then assign and query probability of values. >>> P = ProbDist(‘Flip’); P[‘H’], P[‘T’] = 0.25, 0.75; P[‘H’] 0.25 >>> P = ProbDist(‘X’, {‘lo’: 125, ‘med’: 375, ‘hi’: 500}) >>> P[‘lo’], P[‘med’], P[‘hi’] (0.125, 0.375, 0.5)
- class probability.JointProbDist(variables)[source]
Bases:
ProbDistA discrete probability distribute over a set of variables. >>> P = JointProbDist([‘X’, ‘Y’]); P[1, 1] = 0.25 >>> P[1, 1] 0.25 >>> P[dict(X=0, Y=1)] = 0.5 >>> P[dict(X=0, Y=1)] 0.5
- probability.event_values(event, variables)[source]
Return a tuple of the values of variables in event. >>> event_values ({‘A’: 10, ‘B’: 9, ‘C’: 8}, [‘C’, ‘A’]) (8, 10) >>> event_values ((1, 2), [‘C’, ‘A’]) (1, 2)
- probability.enumerate_joint_ask(X, e, P)[source]
[Section 13.3] Return a probability distribution over the values of the variable X, given the {var:val} observations e, in the JointProbDist P. >>> P = JointProbDist([‘X’, ‘Y’]) >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[2,1] = 0.125 >>> enumerate_joint_ask(‘X’, dict(Y=1), P).show_approx() ‘0: 0.667, 1: 0.167, 2: 0.167’
- probability.enumerate_joint(variables, e, P)[source]
Return the sum of those entries in P consistent with e, provided variables is P’s remaining variables (the ones not in e).
- class probability.BayesNet(node_specs=None)[source]
Bases:
objectBayesian network containing only boolean-variable nodes.
- add(node_spec)[source]
Add a node to the net. Its parents must already be in the net, and its variable must not.
- class probability.DecisionNetwork(action, infer)[source]
Bases:
BayesNetAn abstract class for a decision network as a wrapper for a BayesNet. Represents an agent’s current state, its possible actions, reachable states and utilities of those states.
- class probability.InformationGatheringAgent(decnet, infer, initial_evidence=None)[source]
Bases:
Agent[Figure 16.9] A simple information gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.
- class probability.BayesNode(X, parents, cpt)[source]
Bases:
objectA conditional probability distribution for a boolean variable, P(X | parents). Part of a BayesNet.
- p(value, event)[source]
Return the conditional probability P(X=value | parents=parent_values), where parent_values are the values of parents in event. (event must assign each parent a value.) >>> bn = BayesNode(‘X’, ‘Burglary’, {T: 0.2, F: 0.625}) >>> bn.p(False, {‘Burglary’: False, ‘Earthquake’: True}) 0.375
- probability.enumeration_ask(X, e, bn)[source]
[Figure 14.9] Return the conditional probability distribution of variable X given evidence e, from BayesNet bn. >>> enumeration_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’
- probability.enumerate_all(variables, e, bn)[source]
Return the sum of those entries in P(variables | e{others}) consistent with e, where P is the joint distribution represented by bn, and e{others} means e restricted to bn’s other variables (the ones other than variables). Parents must precede children in variables.
- probability.elimination_ask(X, e, bn)[source]
[Figure 14.11] Compute bn’s P(X|e) by variable elimination. >>> elimination_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’
Is var a hidden variable when querying P(X|e)?
- probability.make_factor(var, e, bn)[source]
Return the factor for var in bn’s joint distribution given e. That is, bn’s full joint distribution, projected to accord with e, is the pointwise product of these factors for bn’s variables.
- probability.pointwise_product(factors, bn)[source]
Multiply a sequence of factors together into a single factor over the union of their variables, using the Bayes net
bnto enumerate variable values.
- probability.sum_out(var, factors, bn)[source]
Eliminate var from all factors by summing over its values.
- probability.all_events(variables, bn, e)[source]
Yield every way of extending e with values for all variables.
- probability.prior_sample(bn)[source]
[Figure 14.13] Randomly sample from bn’s full joint distribution. The result is a {variable: value} dict.
- probability.rejection_sampling(X, e, bn, N=10000)[source]
[Figure 14.14] Estimate the probability distribution of variable X given evidence e in BayesNet bn, using N samples. Raises a ZeroDivisionError if all the N samples are rejected, i.e., inconsistent with e. >>> random.seed(47) >>> rejection_sampling(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.7, True: 0.3’
- probability.likelihood_weighting(X, e, bn, N=10000)[source]
[Figure 14.15] Estimate the probability distribution of variable X given evidence e in BayesNet bn. >>> random.seed(1017) >>> likelihood_weighting(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.702, True: 0.298’
- probability.weighted_sample(bn, e)[source]
Sample an event from bn that’s consistent with the evidence e; return the event and its weight, the likelihood that the event accords to the evidence.
- probability.markov_blanket_sample(X, e, bn)[source]
Return a sample from P(X | mb) where mb denotes that the variables in the Markov blanket of X take their values from event e (which must assign a value to each). The Markov blanket of X is X’s parents, children, and children’s parents.
- class probability.HiddenMarkovModel(transition_model, sensor_model, prior=None)[source]
Bases:
objectA Hidden markov model which takes Transition model and Sensor model as inputs
- probability.forward(HMM, fv, ev)[source]
Perform one forward (filtering) step of an HMM: project the forward message
fvthrough the transition model, weight it by the sensor distribution for evidenceev, and return the normalized next forward message. [Figure 15.4]
- probability.backward(HMM, b, ev)[source]
Perform one backward step of an HMM: weight the backward message
bby the sensor distribution for evidenceevand propagate it through the transition model, returning the normalized previous backward message. [Figure 15.4]
- probability.forward_backward(HMM, ev)[source]
[Figure 15.4] Forward-Backward algorithm for smoothing. Computes posterior probabilities of a sequence of states given a sequence of observations.
- probability.viterbi(HMM, ev)[source]
[Equation 15.11] Viterbi algorithm to find the most likely sequence. Computes the best path and the corresponding probabilities, given an HMM model and a sequence of observations.
- probability.baum_welch(HMM, observations, iterations=100)[source]
[Section 20.3] Baum-Welch algorithm: the instance of EM that learns the parameters of a Hidden Markov Model (transition model, sensor model and prior) from a single sequence of boolean ‘observations’, starting from the initial guess in ‘HMM’. Each iteration runs a (scaled) forward-backward pass to compute the smoothed state marginals gamma_t(i) = P(X_t=i | e_1:T) and transition marginals xi_t(i,j) = P(X_t=i, X_t+1=j | e_1:T) (E-step), then re-estimates every parameter as the corresponding normalized expected count (M-step):
prior_i = gamma_0(i) A_ij = sum_t xi_t(i, j) / sum_t gamma_t(i) sensor_oi = sum_{t: e_t = o} gamma_t(i) / sum_t gamma_t(i)
Returns a new HiddenMarkovModel with the learned parameters.
- probability.fixed_lag_smoothing(e_t, HMM, d, ev, t)[source]
[Figure 15.6] Smoothing algorithm with a fixed time lag of ‘d’ steps. Computes the smoothed estimate P(X_{t-d} | e_{1:t}) for the slice that lies ‘d’ steps in the past, given the evidence sequence ev = [e_1, …, e_t]. Returns None when there is not yet enough evidence (t <= d).
- probability.particle_filtering(e, N, HMM)[source]
Particle filtering considering two states variables.
- class probability.KalmanFilter(transition_model, sensor_model, transition_noise, sensor_noise)[source]
Bases:
object[Section 15.4] Kalman filter for a linear-Gaussian dynamical system. The hidden state evolves and is observed according to the linear-Gaussian model
x_{t+1} = F x_t + noise, noise ~ N(0, Sigma_x) (transition model) z_t = H x_t + noise, noise ~ N(0, Sigma_z) (sensor model)
where F is the transition matrix, H the sensor matrix, Sigma_x the transition (process) noise covariance and Sigma_z the sensor (measurement) noise covariance. Because the family of Gaussians is closed under the Bayesian filtering update, the forward message stays Gaussian and is fully described by a mean vector and a covariance matrix at every step.
- probability.kalman_filter(KF, mean0, cov0, observations)[source]
[Section 15.4] Run the Kalman filter ‘KF’ over a sequence of ‘observations’, starting from the Gaussian prior N(mean0, cov0). Returns, for each time step, the filtered Gaussian estimate as a (mean, covariance) pair.
- class probability.DynamicBayesNet(prior, transition, sensors)[source]
Bases:
object[Section 15.5] A dynamic Bayesian network for a stationary first-order Markov process. It is specified by a prior network over the state variables at slice 0 and a single transition + sensor network describing, for one time step, the distribution of each state variable (given the previous slice) and of each evidence variable (given the current slice). The DBN can be ‘unrolled’ into an ordinary BayesNet spanning any number of slices and then queried with the exact inference algorithms; in particular filtering is the query for the last state variable given the whole evidence sequence.
Each spec is a (variable, parents, cpt) triple as for a BayesNode. In a transition spec, a parent named ‘<var>_prev’ refers to state variable <var> at the previous slice; every other parent refers to the current slice.
- class probability.MCLmap(m)[source]
Bases:
objectMap which provides probability distributions and sensor readings. Consists of discrete cells which are either an obstacle or empty
- probability.monte_carlo_localization(a, z, N, P_motion_sample, P_sensor, m, S=None)[source]
[Figure 25.9] Monte Carlo localization algorithm
Probability models (Chapter 12-13)
- class probability4e.ProbDist(varname='?', freqs=None)[source]
Bases:
objectA discrete probability distribution. You name the random variable in the constructor, then assign and query probability of values. >>> P = ProbDist(‘Flip’); P[‘H’], P[‘T’] = 0.25, 0.75; P[‘H’] 0.25 >>> P = ProbDist(‘X’, {‘lo’: 125, ‘med’: 375, ‘hi’: 500}) >>> P[‘lo’], P[‘med’], P[‘hi’] (0.125, 0.375, 0.5)
- class probability4e.JointProbDist(variables)[source]
Bases:
ProbDistA discrete probability distribute over a set of variables. >>> P = JointProbDist([‘X’, ‘Y’]); P[1, 1] = 0.25 >>> P[1, 1] 0.25 >>> P[dict(X=0, Y=1)] = 0.5 >>> P[dict(X=0, Y=1)] 0.5
- probability4e.event_values(event, variables)[source]
Return a tuple of the values of variables in event. >>> event_values ({‘A’: 10, ‘B’: 9, ‘C’: 8}, [‘C’, ‘A’]) (8, 10) >>> event_values ((1, 2), [‘C’, ‘A’]) (1, 2)
- probability4e.enumerate_joint_ask(X, e, P)[source]
Return a probability distribution over the values of the variable X, given the {var:val} observations e, in the JointProbDist P. [Section 12.3] >>> P = JointProbDist([‘X’, ‘Y’]) >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[2,1] = 0.125 >>> enumerate_joint_ask(‘X’, dict(Y=1), P).show_approx() ‘0: 0.667, 1: 0.167, 2: 0.167’
- probability4e.enumerate_joint(variables, e, P)[source]
Return the sum of those entries in P consistent with e, provided variables is P’s remaining variables (the ones not in e).
- probability4e.is_independent(variables, P)[source]
Return whether a list of variables are independent given their distribution P P is an instance of JoinProbDist >>> P = JointProbDist([‘X’, ‘Y’]) >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[1,0] = 0.125 >>> is_independent([‘X’, ‘Y’], P) False
- probability4e.gen_possible_events(vars, P)[source]
Generate all possible events of a collection of vars according to distribution of P
- class probability4e.BayesNet(node_specs=None)[source]
Bases:
objectBayesian network containing only boolean-variable nodes.
- add(node_spec)[source]
Add a node to the net. Its parents must already be in the net, and its variable must not. Initialize Bayes nodes by detecting the length of input node specs
- class probability4e.BayesNode(X, parents, cpt)[source]
Bases:
objectA conditional probability distribution for a boolean variable, P(X | parents). Part of a BayesNet.
- p(value, event)[source]
Return the conditional probability P(X=value | parents=parent_values), where parent_values are the values of parents in event. (event must assign each parent a value.) >>> bn = BayesNode(‘X’, ‘Burglary’, {T: 0.2, F: 0.625}) >>> bn.p(False, {‘Burglary’: False, ‘Earthquake’: True}) 0.375
- probability4e.gaussian_probability(param, event, value)[source]
Gaussian probability of a continuous Bayesian network node on condition of certain event and the parameters determined by the event
- Parameters:
param – parameters determined by discrete parent events of current node
event – a dict, continuous event of current node, the values are used as parameters in calculating distribution
value – float, the value of current continuous node
- Returns:
float, the calculated probability
>>> param = {'sigma':0.5, 'b':1, 'a':{'h1':0.5, 'h2': 1.5}} >>> event = {'h1':0.6, 'h2': 0.3} >>> gaussian_probability(param, event, 1) 0.2590351913317835
- probability4e.logistic_probability(param, event, value)[source]
Logistic probability of a discrete node in Bayesian network with continuous parents, :param param: a dict, parameters determined by discrete parents of current node :param event: a dict, names and values of continuous parent variables of current node :param value: boolean, True or False :return: int, probability
- class probability4e.ContinuousBayesNode(name, d_parents, c_parents, parameters, type)[source]
Bases:
objectA Bayesian network node with continuous distribution or with continuous distributed parents
- probability4e.enumeration_ask(X, e, bn)[source]
Return the conditional probability distribution of variable X given evidence e, from BayesNet bn. [Figure 13.10] >>> enumeration_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’
- probability4e.enumerate_all(variables, e, bn)[source]
Return the sum of those entries in P(variables | e{others}) consistent with e, where P is the joint distribution represented by bn, and e{others} means e restricted to bn’s other variables (the ones other than variables). Parents must precede children in variables.
- probability4e.elimination_ask(X, e, bn)[source]
Compute bn’s P(X|e) by variable elimination. [Figure 13.12] >>> elimination_ask(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), burglary … ).show_approx() ‘False: 0.716, True: 0.284’
Is var a hidden variable when querying P(X|e)?
- probability4e.make_factor(var, e, bn)[source]
Return the factor for var in bn’s joint distribution given e. That is, bn’s full joint distribution, projected to accord with e, is the pointwise product of these factors for bn’s variables.
- probability4e.pointwise_product(factors, bn)[source]
Multiply a sequence of factors together into a single factor over the union of their variables, using the Bayes net
bnto enumerate variable values.
- probability4e.sum_out(var, factors, bn)[source]
Eliminate var from all factors by summing over its values.
- probability4e.all_events(variables, bn, e)[source]
Yield every way of extending e with values for all variables.
- probability4e.prior_sample(bn)[source]
Randomly sample from bn’s full joint distribution. The result is a {variable: value} dict. [Figure 13.15]
- probability4e.rejection_sampling(X, e, bn, N=10000)[source]
[Figure 13.16] Estimate the probability distribution of variable X given evidence e in BayesNet bn, using N samples. Raises a ZeroDivisionError if all the N samples are rejected, i.e., inconsistent with e. >>> random.seed(47) >>> rejection_sampling(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.7, True: 0.3’
- probability4e.consistent_with(event, evidence)[source]
Is event consistent with the given evidence?
- probability4e.likelihood_weighting(X, e, bn, N=10000)[source]
[Figure 13.17] Estimate the probability distribution of variable X given evidence e in BayesNet bn. >>> random.seed(1017) >>> likelihood_weighting(‘Burglary’, dict(JohnCalls=T, MaryCalls=T), … burglary, 10000).show_approx() ‘False: 0.702, True: 0.298’
- probability4e.weighted_sample(bn, e)[source]
Sample an event from bn that’s consistent with the evidence e; return the event and its weight, the likelihood that the event accords to the evidence.
- probability4e.markov_blanket_sample(X, e, bn)[source]
Return a sample from P(X | mb) where mb denotes that the variables in the Markov blanket of X take their values from event e (which must assign a value to each). The Markov blanket of X is X’s parents, children, and children’s parents.
- class probability4e.complied_burglary[source]
Bases:
objectcompiled version of burglary network
- Burglary(sample)[source]
Return P(Burglary=True) conditioned on the Alarm and Earthquake values of the given sample, using precomputed (compiled) probabilities for its Markov blanket.
Markov Decision Processes (Chapter 17)
First we define an MDP, and the special case of a GridMDP, in which states are laid out in a 2-dimensional grid. We also represent a policy as a dictionary of {state: action} pairs, and a Utility function as a dictionary of {state: number} pairs. We then define the value_iteration and policy_iteration algorithms.
>>> pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .01))
>>> sequential_decision_environment.to_arrows(pi)
[['>', '>', '>', '.'], ['^', None, '^', '.'], ['^', '>', '^', '<']]
>>> from utils import print_table
>>> print_table(sequential_decision_environment.to_arrows(pi))
> > > .
^ None ^ .
^ > ^ <
>>> print_table(sequential_decision_environment.to_arrows(policy_iteration(sequential_decision_environment)))
> > > .
^ None ^ .
^ > ^ <
- class mdp.MDP(init, actlist, terminals, transitions=None, reward=None, states=None, gamma=0.9)[source]
Bases:
objectA Markov Decision Process, defined by an initial state, transition model, and reward function. We also keep track of a gamma value, for use by algorithms. The transition model is represented somewhat differently from the text. Instead of P(s’ | s, a) being a probability number for each state/state/action triplet, we instead have T(s, a) return a list of (p, s’) pairs. We also keep track of the possible states, terminal states, and actions for each state. [Page 646]
- T(state, action)[source]
Transition model. From a state and an action, return a list of (probability, result-state) pairs.
- actions(state)[source]
Return a list of actions that can be performed in this state. By default, a fixed list of actions, except for terminal states. Override this method if you need to specialize by state.
- class mdp.MDP2(init, actlist, terminals, transitions, reward=None, gamma=0.9)[source]
Bases:
MDPInherits from MDP. Handles terminal states, and transitions to and from terminal states better.
- class mdp.GridMDP(grid, terminals, init=(0, 0), gamma=0.9)[source]
Bases:
MDPA two-dimensional grid MDP, as in [Figure 17.1]. All you have to do is specify the grid as a list of lists of rewards; use None for an obstacle (unreachable state). Also, you should specify the terminal states. An action is an (x, y) unit vector; e.g. (1, 0) means move east.
- calculate_T(state, action)[source]
Return the (probability, next-state) outcomes for the intended action: 0.8 of moving as intended and 0.1 each of veering to the right or left.
- mdp.gen_grid(n_rows=3, n_cols=4, terminals=((3, 2), (3, 1)), main_reward=-0.04, terminal_rewards=(1, -1), block_coords=((1, 1),))[source]
Generate a grid (list of lists of rewards) of arbitrary size in the format accepted by GridMDP, e.g. GridMDP(gen_grid(…), terminals=[…]). n_rows, n_cols: grid dimensions. terminals: (x, y) coordinates of the terminal cells. main_reward: reward for every non-terminal, non-blocked cell. terminal_rewards: reward for each cell in terminals (paired by position). block_coords: (x, y) coordinates of obstacles (set to None / unreachable).
- mdp.best_policy(mdp, U)[source]
Given an MDP and a utility function U, determine the best policy, as a mapping from state to action. [Equation 17.4]
- mdp.expected_utility(a, s, U, mdp)[source]
The expected utility of doing a in state s, according to the MDP and U.
- mdp.policy_evaluation(pi, U, mdp, k=20)[source]
Return an updated utility mapping U from each state in the MDP to its utility, using an approximation (modified policy iteration).
- class mdp.POMDP(actions, transitions=None, evidences=None, rewards=None, states=None, gamma=0.95)[source]
Bases:
MDPA Partially Observable Markov Decision Process, defined by a transition model P(s’|s,a), actions A(s), a reward function R(s), and a sensor model P(e|s). We also keep track of a gamma value, for use by algorithms. The transition and the sensor models are defined as matrices. We also keep track of the possible states and actions for each state. [Page 659].
- remove_dominated_plans(input_values)[source]
Remove dominated plans. This method finds all the lines contributing to the upper surface and removes those which don’t.
Markov Decision Processes (Chapter 16)
First we define an MDP, and the special case of a GridMDP, in which states are laid out in a 2-dimensional grid. We also represent a policy as a dictionary of {state: action} pairs, and a Utility function as a dictionary of {state: number} pairs. We then define the value_iteration and policy_iteration algorithms.
>>> pi = best_policy(sequential_decision_environment, value_iteration(sequential_decision_environment, .01))
>>> sequential_decision_environment.to_arrows(pi)
[['>', '>', '>', '.'], ['^', None, '^', '.'], ['^', '>', '^', '<']]
>>> from utils import print_table
>>> print_table(sequential_decision_environment.to_arrows(pi))
> > > .
^ None ^ .
^ > ^ <
>>> print_table(sequential_decision_environment.to_arrows(policy_iteration(sequential_decision_environment)))
> > > .
^ None ^ .
^ > ^ <
- class mdp4e.MDP(init, actlist, terminals, transitions=None, reward=None, states=None, gamma=0.9)[source]
Bases:
objectA Markov Decision Process, defined by an initial state, transition model, and reward function. We also keep track of a gamma value, for use by algorithms. The transition model is represented somewhat differently from the text. Instead of P(s’ | s, a) being a probability number for each state/state/action triplet, we instead have T(s, a) return a list of (p, s’) pairs. We also keep track of the possible states, terminal states, and actions for each state. [Page 646]
- T(state, action)[source]
Transition model. From a state and an action, return a list of (probability, result-state) pairs.
- actions(state)[source]
Return a list of actions that can be performed in this state. By default, a fixed list of actions, except for terminal states. Override this method if you need to specialize by state.
- class mdp4e.MDP2(init, actlist, terminals, transitions, reward=None, gamma=0.9)[source]
Bases:
MDPInherits from MDP. Handles terminal states, and transitions to and from terminal states better.
- class mdp4e.GridMDP(grid, terminals, init=(0, 0), gamma=0.9)[source]
Bases:
MDPA two-dimensional grid MDP, as in [Figure 16.1]. All you have to do is specify the grid as a list of lists of rewards; use None for an obstacle (unreachable state). Also, you should specify the terminal states. An action is an (x, y) unit vector; e.g. (1, 0) means move east.
- calculate_T(state, action)[source]
Return the (probability, next-state) outcomes for the intended action: 0.8 of moving as intended and 0.1 each of veering to the right or left.
- mdp4e.q_value(mdp, s, a, U)[source]
Return the Q-value of taking action
ain statesgiven the utility estimatesU, i.e. the expected sum of the immediate reward and the discounted utility of the successor states. With no action it falls back to the reward ofs.
- mdp4e.best_policy(mdp, U)[source]
Given an MDP and a utility function U, determine the best policy, as a mapping from state to action.
- mdp4e.expected_utility(a, s, U, mdp)[source]
The expected utility of doing a in state s, according to the MDP and U.
- mdp4e.policy_evaluation(pi, U, mdp, k=20)[source]
Return an updated utility mapping U from each state in the MDP to its utility, using an approximation (modified policy iteration).
- class mdp4e.POMDP(actions, transitions=None, evidences=None, rewards=None, states=None, gamma=0.95)[source]
Bases:
MDPA Partially Observable Markov Decision Process, defined by a transition model P(s’|s,a), actions A(s), a reward function R(s), and a sensor model P(e|s). We also keep track of a gamma value, for use by algorithms. The transition and the sensor models are defined as matrices. We also keep track of the possible states and actions for each state. [Page 659].
- remove_dominated_plans(input_values)[source]
Remove dominated plans. This method finds all the lines contributing to the upper surface and removes those which don’t.
- mdp4e.update_belief(pomdp, belief, action, observation)[source]
[Equation 17.17] POMDP filtering: update the belief state (a probability distribution over the states) after executing ‘action’ and perceiving ‘observation’. The prediction step propagates the belief through the transition model and the update step weights it by the sensor model, then renormalizes:
b'(s') = alpha * P(o | s') * sum_s P(s' | s, a) b(s)
- mdp4e.pomdp_lookahead(pomdp, belief, depth)[source]
[Section 17.5 - Dynamic Decision Networks] Online decision making for a POMDP modeled as a dynamic decision network (DDN): the network is projected ‘depth’ steps into the future and solved by belief-state expectimax search. Decision nodes range over the belief states and chance nodes branch over the possible observations, with the belief updated by POMDP filtering ([Equation 17.17]) along each branch. Returns the action that maximizes the expected discounted utility from ‘belief’.
Making Simple Decisions (Chapter 15)
- class making_simple_decisions4e.DecisionNetwork(action, infer)[source]
Bases:
BayesNetAn abstract class for a decision network as a wrapper for a BayesNet. Represents an agent’s current state, its possible actions, reachable states and utilities of those states.
- class making_simple_decisions4e.InformationGatheringAgent(decnet, infer, initial_evidence=None)[source]
Bases:
AgentA simple information gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit. [Figure 16.9]