Logistic Regression

#Statistical Learning #Basics #Linear Models #Supervised Learning #Classification

In a classification problem, given a list of feature values $x$ and their corresponding classes $\{c_i\}$, the posterior of the classes, i.e., the conditional probability of the classes, is

$$ p(C=c_i\mid X=x). $$


The likelihood of the data is

$$ p(X=x\mid C=c_i). $$

Logistic Regression for Two Classes

For two classes, the simplest model for the posterior is a linear model,

$$ \log \frac{p(C=c_1\mid X=x) }{p(C=c_2\mid X=x)} = \beta_0 + \beta_1 \cdot x, $$

which is equivalent to

$$ p(C=c_1\mid X=x) = \exp\left(\beta_0 + \beta_1 \cdot x\right) p(C=c_2\mid X=x) . $$


The reason we propose a linear model for the quantity

$$ \log \frac{p(C=c_1\mid X=x) }{p(C=c_2\mid X=x)}, $$

is that it ranges from $-\infty$ to $\infty$, which matches the range of the linear model $\beta_0 + \beta_1 \cdot x$.

We can also see in the following results that this relation guarantees that the conditional probabilities stay between 0 and 1 once the normalization constraint is applied.

Using the normalization condition

$$ p(C=c_1\mid X=x) + p(C=c_2\mid X=x) = 1, $$

we can derive the posterior for each class:

$$ \begin{align} p(C=c_2\mid X=x) &= \frac{1}{1 + \exp\left(\beta_0 + \beta_1 \cdot x\right)} \\ p(C=c_1\mid X=x) &= \frac{\exp\left(\beta_0 + \beta_1 \cdot x\right)}{1 + \exp\left(\beta_0 + \beta_1 \cdot x\right)}. \end{align} $$
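The two closed forms above are easy to evaluate directly. The following sketch computes both posteriors for a single feature value; the coefficient values are arbitrary illustrative numbers, not fitted parameters.

```python
import math

def two_class_posteriors(x, beta0, beta1):
    """Posteriors for the two-class logistic model.

    Returns (p(C=c1|x), p(C=c2|x)) using the closed forms derived above.
    """
    e = math.exp(beta0 + beta1 * x)  # exp of the linear predictor
    p_c1 = e / (1.0 + e)
    p_c2 = 1.0 / (1.0 + e)
    return p_c1, p_c2

# Illustrative (made-up) coefficients.
p1, p2 = two_class_posteriors(x=2.0, beta0=-1.0, beta1=1.5)
print(p1, p2, p1 + p2)  # the two posteriors sum to one
```

Note that $p(C=c_1\mid X=x)$ is exactly the logistic sigmoid applied to the linear predictor, which is where the model gets its name.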

Figure: the two conditional probabilities $p(C=c_1\mid X=x)$ and $p(C=c_2\mid X=x)$, plotted against $x'=\beta_0 + \beta_1 \cdot x$ for simplicity.

Limiting behavior

  1. As $\beta_0 + \beta_1 \cdot x \to \infty$, we have $p(C=c_2\mid X=x) \to 0$ and $p(C=c_1\mid X=x)\to 1$.
  2. As $\beta_0 + \beta_1 \cdot x \to 0$, we have $p(C=c_2\mid X=x) \to 0.5$ and $p(C=c_1\mid X=x)\to 0.5$.
  3. As $\beta_0 + \beta_1 \cdot x \to -\infty$, we have $p(C=c_2\mid X=x) \to 1$ and $p(C=c_1\mid X=x)\to 0$.
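The three limiting cases can be probed numerically at large finite values of the linear predictor. This sketch uses a numerically stable form of the sigmoid to avoid overflow of $\exp(z)$ for large $|z|$.

```python
import math

def p_c1(z):
    """p(C=c1 | x) as a function of the linear predictor z = beta0 + beta1 * x."""
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# The three limiting cases, probed at large finite z.
print(p_c1(50.0))   # close to 1
print(p_c1(0.0))    # exactly 0.5
print(p_c1(-50.0))  # close to 0
```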

Logistic Regression for $K$ Classes

The model is easily generalized to problems with $K$ classes:

$$ \begin{align} p(C=c_K\mid X=x) &= \frac{1}{1 + \sum_{l=1}^{K-1}\exp\left(\beta_{l0} + \beta_l \cdot x\right)} \\ p(C=c_k\mid X=x) &= \frac{\exp\left(\beta_{k0} + \beta_k \cdot x\right)}{1 + \sum_{l=1}^{K-1}\exp\left(\beta_{l0} + \beta_l \cdot x\right)}, \quad k = 1, \dots, K-1. \end{align} $$
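In the $K$-class form, class $c_K$ acts as a reference class whose linear predictor is fixed to zero. A minimal sketch, with made-up coefficients for a three-class problem:

```python
import math

def k_class_posteriors(x, betas):
    """Posteriors for K classes, with class c_K as the reference class.

    `betas` is a list of (beta_k0, beta_k) pairs for k = 1..K-1;
    the reference class c_K has its linear predictor fixed to 0.
    """
    scores = [math.exp(b0 + b1 * x) for b0, b1 in betas]
    denom = 1.0 + sum(scores)
    probs = [s / denom for s in scores]  # classes c_1 .. c_{K-1}
    probs.append(1.0 / denom)            # reference class c_K
    return probs

# Three classes (K = 3), so two coefficient pairs; values are illustrative.
probs = k_class_posteriors(x=1.0, betas=[(0.5, 1.0), (-0.2, 0.3)])
print(probs, sum(probs))  # the posteriors sum to one
```

Setting $K=2$ recovers the two-class formulas from the previous section.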

Why not non-linear models

The log of the posterior ratio can be more complex than a linear model. In general, we have[^1]

$$ \log \frac{p(C=c_1\mid X=x) }{p(C=c_2\mid X=x)} = f(x), $$

so that

$$ p(C=c_1\mid X=x) = \frac{\exp(f(x))}{ 1 + \exp(f(x)) }. $$

The logistic regression model we discussed in the previous sections requires

$$ f(x) = \beta_0 + \beta_1 \cdot x. $$

A more general additive model is

$$ f(x) = \sum_i f_i(x), $$

where we can apply algorithms such as local scoring to fit such models[^1].
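The additive form is straightforward to evaluate once the component functions $f_i$ are given. The sketch below is not a fitting procedure (local scoring is considerably more involved); it only shows how an additive $f(x)$ plugs into the posterior formula, with arbitrary illustrative component functions.

```python
import math

def f_additive(x, components):
    """f(x) = sum_i f_i(x) for a list of component functions."""
    return sum(f_i(x) for f_i in components)

def posterior_c1(x, components):
    """p(C=c1 | x) = exp(f(x)) / (1 + exp(f(x))) for an additive f."""
    z = f_additive(x, components)
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Illustrative components: a linear term plus a smooth non-linear term.
components = [lambda x: 0.8 * x, lambda x: math.sin(x)]
print(posterior_c1(0.0, components))  # f(0) = 0, so the posterior is 0.5
```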

[^1]: Friedman J, Hastie T, Tibshirani R. Additive Logistic Regression. The Annals of Statistics. 2000. pp. 337–374. doi:10.1214/aos/1016218223


Lei Ma (2021). 'Logistic Regression', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/linear/logistic-regression/.
