# Linear Regression

## A Model

A model is an estimator of the data that maps the inputs $\mathbf X$ to the predicted outputs $\hat{\mathbf Y}$, i.e., $\hat{\mathbf Y} = F( \mathbf X )$. The map $F$ may require some parameters, $\{\boldsymbol\alpha, \boldsymbol\beta, \cdots \}$.

We also need an estimator that tells us how good the model is for a given set of parameters. For example, we could define a loss function $L(\mathbf Y,\hat{\mathbf Y} )$ that measures the discrepancy between the actual data and the predicted results. We then choose the parameters that minimize this discrepancy.

So a model usually has

- a map,
- an estimator.
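These two ingredients can be sketched in code. The linear map and squared-error loss below are illustrative choices (they anticipate the linear model of the next section); the names `F` and `L` follow the notation above.

```python
import numpy as np

def F(X, beta):
    """The map: inputs X -> predictions Y_hat."""
    return X @ beta

def L(Y, Y_hat):
    """The estimator: residual sum of squares between data and predictions."""
    return np.sum((Y - Y_hat) ** 2)

X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([2.0, 4.0, 6.0])
beta = np.array([2.0])
print(L(Y, F(X, beta)))  # a perfect fit gives loss 0.0
```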

## Linear Model

The model is simple:

$$ \hat Y_i = X_{ij}\beta_ j + \beta_0 $$

One might want to create an augmented dataset by including a 0th column $X_{i0} = 1$, so that the intercept is absorbed into the coefficients and we have $$ \hat Y_i = X_{ij}\beta_ j $$ with $j=0,1,2,\ldots$.
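The augmentation is a single column stack. A minimal sketch (the data values here are made up for illustration):

```python
import numpy as np

# Prepend a 0th column of ones so the intercept beta_0
# is absorbed into the coefficient vector.
X = np.array([[2.0], [3.0], [5.0]])              # original features
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # column X_{i0} = 1
print(X_aug)
# [[1. 2.]
#  [1. 3.]
#  [1. 5.]]
```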

Using least squares as our estimator, we minimize the RSS loss function $L$ by choosing the suitable parameters $\beta_j$. The loss function is

$$ L = ( Y_i - X_{ij}\beta_j )( Y_i - X_{ik}\beta_k ). $$
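The repeated indices above are summed over, following the Einstein convention. The same contraction can be written explicitly with `np.einsum` (the data values below are illustrative, chosen so that $Y = X\beta$ exactly):

```python
import numpy as np

# RSS with the summation convention: L = r_i r_i, where r_i = Y_i - X_ij beta_j
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
Y = np.array([5.0, 7.0, 11.0])
beta = np.array([1.0, 2.0])

r = Y - np.einsum("ij,j->i", X, beta)  # residual vector r_i
L = np.einsum("i,i->", r, r)           # contract the repeated index i
print(L)  # 0.0 here, since Y = X @ beta exactly
```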

Minimizing it requires $\partial_{\beta_m} L = 0$, together with the second-order condition that the Hessian $\partial_{\beta_m}\partial_{\beta_n} L$ is positive definite.

We have

$$ \begin{align} \partial_{\beta_m} L =& (\partial_{\beta_m} ( Y_i - X_{ij}\beta_j ) ) ( Y_i - X_{ik}\beta_k ) + ( Y_i - X_{ij}\beta_j ) \partial_{\beta_m} ( Y_i - X_{ik}\beta_k ) \\ =& - X_{ij} \delta_{jm}( Y_i - X_{ik}\beta_k ) + ( Y_i - X_{ij}\beta_j ) ( - X_{ik}\delta_{km} ) \\ =& - 2 X_{im} ( Y_i - X_{ij}\beta_j ) \end{align} $$

Solving $- 2 X_{im} ( Y_i - X_{ij}\beta_j ) = 0$, we have

$$ \begin{align} & 0 = X_{im} ( Y_i - X_{ij}\beta_j ) \\ & X_{im} X_{ij}\beta_j = X_{im} Y_i \end{align} $$

Note that the left-hand side involves both multiplication and summation, since we are using the Einstein summation convention. This is better understood in the abstract matrix notation,

$$ \boldsymbol \beta = ( \mathbf X^T \mathbf X )^{-1} \mathbf X^T \mathbf Y, $$

assuming $\mathbf X^T \mathbf X$ is invertible.
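In practice one solves the normal equations $\mathbf X^T \mathbf X \boldsymbol\beta = \mathbf X^T \mathbf Y$ directly rather than forming the inverse. A sketch with made-up data:

```python
import numpy as np

# Solve the normal equations X^T X beta = X^T Y.
# np.linalg.solve is numerically preferable to computing the inverse.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # augmented with a ones column
Y = np.array([2.0, 3.0, 5.0])

beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta)  # [intercept, slope]
```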

## Scalar $\mathbf X_i$

For scalar $\mathbf X_i = x_i$, we can replace $\mathbf X$ with a vector $\vec x$ over all data points. For a fit through the origin (no intercept term), we have

$$ \beta = \frac{x_j y_j}{x_i x_i}. $$
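This ratio of dot products is one line of numpy (the data values are illustrative):

```python
import numpy as np

# Through-the-origin fit: beta = (x_j y_j) / (x_i x_i)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

beta = np.dot(x, y) / np.dot(x, x)
print(beta)  # 2.0, since y = 2 x exactly
```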

More generally, with an intercept included, least squares leads to the following results^{1}

$$ \begin{align} k &= \frac{\sum_j (x_j - \bar x) y_j}{\sum_i (x_i - \bar x)(x_i - \bar x)} \\ b &= \bar y - k \bar x, \end{align} $$

where $\bar x$ and $\bar y$ are the mean values. Then the model

$$ y = k x + b $$

can be obtained.
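The mean-centred formulas for $k$ and $b$ translate directly (illustrative data, chosen to match the normal-equation example above):

```python
import numpy as np

# Slope and intercept from the mean-centred least-squares formulas.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.0, 5.0])

x_bar, y_bar = x.mean(), y.mean()
k = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
b = y_bar - k * x_bar
print(k, b)  # slope 1.5, intercept 1/3
```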

L Ma (2019). 'Linear Regression', Datumorphism, 01 April. Available at: https://datumorphism.leima.is/wiki/statistics/linear-regression/.