1. Write down an expression for the likelihood of $D$ (i.e., the probability of seeing this particular sequence of examples, given a fixed value of $\pi$) in terms of $\pi$, $p$, and $n$.

2. By differentiating the log likelihood $L$, find the value of $\pi$ that maximizes the likelihood.

3. Now suppose we add in $k$ Boolean random variables $X_1, X_2,\ldots,X_k$ (the “attributes”) that describe each sample, and suppose we assume that the attributes are conditionally independent of each other given the goal $Y$. Draw the Bayes net corresponding to this assumption.

4. Write down the likelihood for the data including the attributes, using the following additional notation:

- $\alpha_i$ is $P(X_i=true \| Y=true)$.

- $\beta_i$ is $P(X_i=true \| Y=false)$.

- $p_i^+$ is the count of samples for which $X_i=true$ and $Y=true$.

- $n_i^+$ is the count of samples for which $X_i=false$ and $Y=true$.

- $p_i^-$ is the count of samples for which $X_i=true$ and $Y=false$.

- $n_i^-$ is the count of samples for which $X_i=false$ and $Y=false$.

\[

*Hint*: consider first the probability of seeing a single example with specified values for $X_1, X_2,\ldots,X_k$ and $Y$.\]

5. By differentiating the log likelihood $L$, find the values of $\alpha_i$ and $\beta_i$ (in terms of the various counts) that maximize the likelihood and say in words what these values represent.

6. Let $k = 2$, and consider a data set with 4 all four possible examples of thexor function. Compute the maximum likelihood estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$.

7. Given these estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$, what are the posterior probabilities $P(Y=true | x_1,x_2)$ for each example?

Consider a single Boolean random variable $Y$ (the “classification”).
Let the prior probability $P(Y=true)$ be $\pi$. Let’s try to
find $\pi$, given a training set $D=(y_1,\ldots,y_N)$ with $N$
independent samples of $Y$. Furthermore, suppose $p$ of the $N$ are
positive and $n$ of the $N$ are negative.

1. Write down an expression for the likelihood of $D$ (i.e., the
probability of seeing this particular sequence of examples, given a
fixed value of $\pi$) in terms of $\pi$, $p$, and $n$.

2. By differentiating the log likelihood $L$, find the value of $\pi$
that maximizes the likelihood.

3. Now suppose we add in $k$ Boolean random variables
$X_1, X_2,\ldots,X_k$ (the “attributes”) that describe each sample,
and suppose we assume that the attributes are conditionally
independent of each other given the goal $Y$. Draw the Bayes net
corresponding to this assumption.

4. Write down the likelihood for the data including the attributes,
using the following additional notation:

- $\alpha_i$ is $P(X_i=true \| Y=true)$.

- $\beta_i$ is $P(X_i=true \| Y=false)$.

- $p_i^+$ is the count of samples for which $X_i=true$
and $Y=true$.

- $n_i^+$ is the count of samples for which $X_i=false$
and $Y=true$.

- $p_i^-$ is the count of samples for which $X_i=true$
and $Y=false$.

- $n_i^-$ is the count of samples for which $X_i=false$
and $Y=false$.

\[*Hint*: consider first the probability of seeing a
single example with specified values for $X_1, X_2,\ldots,X_k$ and
$Y$.\]

5. By differentiating the log likelihood $L$, find the values of
$\alpha_i$ and $\beta_i$ (in terms of the various counts) that
maximize the likelihood and say in words what these
values represent.

6. Let $k = 2$, and consider a data set with 4 all four possible
examples of thexor function. Compute the maximum
likelihood estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$,
and $\beta_2$.

7. Given these estimates of $\pi$, $\alpha_1$, $\alpha_2$, $\beta_1$,
and $\beta_2$, what are the posterior probabilities
$P(Y=true | x_1,x_2)$ for each example?