Likelihood

#Statistics #Likelihood #Bayes

For data points $\{x_i\}$ and a model with parameters $\theta$, the likelihood of a single data point $x_i$ is $p(x_i\mid \theta)$. Assuming the data points are independent, the likelihood of the whole data set is a function of $\theta$,

$$ L(\theta) = \prod_i p(x_i\mid\theta). $$

Note that the likelihood, viewed as a function of $\theta$, is not necessarily a pdf: it need not integrate to 1 over $\theta$. As an example, the likelihood of a Bernoulli model for a single event $x\in\{0,1\}$ is

$$ L(\theta) = \theta^x (1-\theta)^{(1-x)}. $$

If we are flipping a coin and the probability of heads ($x=1$) is $\theta$, the likelihood for the single event $x=1$ is

$$ L(\theta)=\theta. $$

For two events with $x_1=1$ and $x_2=0$, the likelihood is

$$ L(\theta) = \theta (1 - \theta). $$

This is not a pdf in $\theta$: integrating over $\theta\in[0,1]$ gives $\int_0^1 \theta(1-\theta)\,d\theta = 1/6$, not 1.
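
As a quick numerical check that the likelihood need not integrate to 1 over $\theta$, here is a minimal Python sketch. The function names `bernoulli_likelihood` and `integrate_over_theta` are illustrative choices rather than part of any library, and the midpoint rule is just one simple way to approximate the integral.

```python
from math import prod


def bernoulli_likelihood(theta, xs):
    """L(theta) = prod_i theta^x_i * (1 - theta)^(1 - x_i) for 0/1 data xs."""
    return prod(theta**x * (1 - theta) ** (1 - x) for x in xs)


def integrate_over_theta(xs, steps=10_000):
    """Approximate the integral of L(theta) over [0, 1] with the midpoint rule."""
    width = 1.0 / steps
    return sum(
        bernoulli_likelihood((i + 0.5) * width, xs) * width for i in range(steps)
    )


single = integrate_over_theta([1])    # L(theta) = theta, exact integral is 1/2
pair = integrate_over_theta([1, 0])   # L(theta) = theta (1 - theta), exact integral is 1/6
print(single, pair)
```

Both integrals come out well below 1, so neither likelihood is a normalized density in $\theta$.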

Maximum likelihood

How do we estimate the parameters? One popular method is maximum likelihood estimation (MLE): choose the $\theta$ that maximizes the likelihood, i.e., the probability of the observed data under the model. In the second example, $L(\theta)=\theta(1-\theta)$, the derivative is

$$ \partial_\theta L(\theta) = 1 - 2\theta. $$

Setting the derivative to zero, the maximum is reached at $\theta=\hat\theta=1/2$.
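
The same estimate can be recovered numerically. The sketch below is a plain grid search over $\theta\in[0,1]$, not a method taken from any library; the names `likelihood` and `grid_mle` and the grid resolution are assumptions made for illustration.

```python
def likelihood(theta, xs):
    """Bernoulli likelihood of 0/1 observations xs at parameter theta."""
    value = 1.0
    for x in xs:
        value *= theta if x == 1 else (1.0 - theta)
    return value


def grid_mle(xs, steps=1000):
    """Return the grid point in [0, 1] with the largest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda theta: likelihood(theta, xs))


theta_hat = grid_mle([1, 0])
print(theta_hat)  # 0.5
```

A grid search also evaluates the endpoints $\theta=0$ and $\theta=1$, so it finds maxima that sit on the boundary, where the derivative does not vanish.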

However, if we have two experiments with $x_1=1$ and $x_2=1$, the likelihood becomes

$$ L(\theta) = \theta^2. $$

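Here the derivative $2\theta$ is non-negative on $[0,1]$, so the likelihood is increasing and the maximum sits on the boundary: $\theta=\hat\theta=1$.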

If we have two experiments with $x_1=0$ and $x_2=0$, the likelihood becomes

$$ L(\theta) = (1-\theta)^2. $$

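Here the derivative $-2(1-\theta)$ is non-positive on $[0,1]$, so the likelihood is decreasing and the maximum sits on the boundary: $\theta=\hat\theta=0$.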
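
The two-observation cases generalize. As a sketch, assume $n$ independent Bernoulli observations and write $k=\sum_i x_i$ for the number of heads ($n$ and $k$ are notation introduced here for illustration). Then

$$ L(\theta) = \theta^{k}(1-\theta)^{n-k}, \qquad \ln L(\theta) = k\ln\theta + (n-k)\ln(1-\theta). $$

For $0<k<n$, setting the derivative of the log-likelihood to zero gives

$$ \partial_\theta \ln L(\theta) = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0 \quad\Rightarrow\quad \hat\theta = \frac{k}{n}. $$

When $k=0$ or $k=n$, $L(\theta)$ is monotonic on $[0,1]$ and the maximum sits on the boundary, which again gives $\hat\theta = k/n$.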