# Likelihood

## #Statistics #Likelihood #Bayes

For data points $\{x_i\}$ and a model with parameter $\theta$, the likelihood of a single data point $x_i$ is $p(x_i\mid \theta)$. More precisely, the likelihood of all data points (assumed independent) is a function of the parameter $\theta$,

$$ L(\theta) = \prod_i p(x_i\mid\theta). $$

Note that the likelihood, viewed as a function of $\theta$, is not necessarily a pdf. As an example, consider the likelihood of a Bernoulli distribution for a single observation $x$,

$$ L(\theta) = \theta^x (1-\theta)^{(1-x)}. $$

If we are flipping coins and the probability of heads ($x=1$) is $\theta$, the likelihood for the single observation $x=1$ is

$$ L(\theta)=\theta. $$

For two events with $x_1=1$ and $x_2=0$, the likelihood is

$$ L(\theta) = \theta (1 - \theta). $$

There is no guarantee that this is a pdf in $\theta$; here it integrates to $1/6$ over $[0,1]$, not to $1$.
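The product of Bernoulli factors above can be sketched in a few lines of Python (the function name `bernoulli_likelihood` is my own, not from the source):

```python
def bernoulli_likelihood(data, theta):
    """Product of p(x_i | theta) for Bernoulli observations x_i in {0, 1}."""
    likelihood = 1.0
    for x in data:
        likelihood *= theta**x * (1 - theta) ** (1 - x)
    return likelihood

# One head (x_1 = 1) and one tail (x_2 = 0): L(theta) = theta * (1 - theta).
print(bernoulli_likelihood([1, 0], 0.5))  # 0.25
```

Evaluating it at several values of $\theta$ traces out the likelihood curve whose maximum MLE looks for.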

## Maximum likelihood

How do we estimate the parameter? One popular method is maximum likelihood estimation (MLE): choose the parameter value that maximizes the likelihood, i.e., the probability of the observed data under the model. In the second example above ($x_1=1$, $x_2=0$), we take the derivative

$$ \partial_\theta L(\theta) = 1 - 2\theta. $$

The maximum value is reached when $\theta=\hat\theta=1/2$.
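As a quick numerical sanity check (a hypothetical sketch, not part of the original derivation), a grid search over $\theta \in [0, 1]$ recovers the same maximizer:

```python
# Evaluate L(theta) = theta * (1 - theta) on a fine grid and pick the argmax.
thetas = [i / 1000 for i in range(1001)]
likelihoods = [t * (1 - t) for t in thetas]
theta_hat = thetas[likelihoods.index(max(likelihoods))]
print(theta_hat)  # 0.5
```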

However, if we have two experiments with $x_1=1$ and $x_2=1$, the likelihood becomes

$$ L(\theta) = \theta^2. $$

This is monotonically increasing on $[0,1]$, so MLE gives $\hat\theta=1$.

If we have two experiments with $x_1=0$ and $x_2=0$, the likelihood becomes

$$ L(\theta) = (1-\theta)(1-\theta). $$

This is monotonically decreasing on $[0,1]$, so MLE gives $\hat\theta=0$.
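These cases are instances of a general fact: for Bernoulli data, the MLE is the sample mean, $\hat\theta = \frac{1}{n}\sum_i x_i$. A minimal sketch (the helper name `bernoulli_mle` is my own):

```python
def bernoulli_mle(data):
    """MLE for the Bernoulli parameter: the fraction of successes."""
    return sum(data) / len(data)

print(bernoulli_mle([1, 1]))  # 1.0  (two heads)
print(bernoulli_mle([0, 0]))  # 0.0  (two tails)
print(bernoulli_mle([1, 0]))  # 0.5  (one of each)
```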

Lei Ma (2021). 'Likelihood', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/cards/statistics/likelihood/.
