Linear Regression
A Model
A model is an estimator that maps the inputs $\mathbf X$ to the predicted outputs $\hat{\mathbf Y}$: $\hat{\mathbf Y} = F( \mathbf X )$. The map $F$ might require some parameters $\{\boldsymbol\alpha, \boldsymbol\beta, \cdots\}$.
We also need an estimator that tells us how good the model is for a given set of parameters. For example, we could define a loss function $L(\mathbf Y,\hat{\mathbf Y} )$ that measures the discrepancy between the actual data and the predicted results; fitting the model then amounts to minimizing this discrepancy.
So a model usually has two pieces (see the sketch below):
- a map,
- an estimator.
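As a concrete illustration, here is a minimal sketch of these two pieces in Python (the names `predict` and `rss_loss` are mine, not from the original):

```python
import numpy as np

def predict(X, beta):
    """The map F: takes inputs X and returns predictions Y_hat (linear here)."""
    return X @ beta

def rss_loss(Y, Y_hat):
    """The estimator: residual sum of squares between data and predictions."""
    return np.sum((Y - Y_hat) ** 2)
```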
Linear Model
The linear model is simple:
$$ \hat Y_i = X_{ij}\beta_j + \beta_0 $$
One might want to create an augmented dataset by including a 0th column $X_{i0} = 1$, so that we have $$ \hat Y_i = X_{ij}\beta_j $$ with $j=0,1,2,\cdots$.
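In code, the augmentation is a single column stack (a NumPy sketch with made-up data):

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0]])    # raw inputs, shape (n, p)
ones = np.ones((X.shape[0], 1))        # the 0th column, X_{i0} = 1
X_aug = np.hstack([ones, X])           # augmented design matrix
```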
Using least squares as our estimator, we minimize the residual sum of squares (RSS) loss $L$ by choosing suitable parameters $\beta_j$. The loss function is
$$ L = ( Y_i - X_{ij}\beta_j )( Y_i - X_{ik}\beta_k ). $$
Minimizing it requires the stationarity condition $\partial_{\beta_m} L = 0$ together with the Hessian $\partial_{\beta_m}\partial_{\beta_n} L$ being positive definite.
We have
$$ \begin{align} \partial_{\beta_m} L =& (\partial_{\beta_m} ( Y_i - X_{ij}\beta_j ) ) ( Y_i - X_{ik}\beta_k ) + ( Y_i - X_{ij}\beta_j ) \partial_{\beta_m} ( Y_i - X_{ik}\beta_k ) \\ =& - X_{ij} \delta_{jm}( Y_i - X_{ik}\beta_k ) + ( Y_i - X_{ij}\beta_j ) ( - X_{ik}\delta_{km} ) \\ =& - 2 X_{im} ( Y_i - X_{ij}\beta_j ) \end{align} $$
Solving $- 2 X_{im} ( Y_i - X_{ij}\beta_j ) = 0$, we have
$$ \begin{align} & 0 = X_{im} ( Y_i - X_{ij}\beta_j ) \\ & X_{im} X_{ij}\beta_j = X_{im} Y_i \end{align} $$
Note that repeated indices are summed over (Einstein summation convention), so $X_{im} X_{ij} = (\mathbf X^T \mathbf X)_{mj}$ and $X_{im} Y_i = (\mathbf X^T \mathbf Y)_m$. In the abstract matrix notation the solution is
$$ \boldsymbol \beta = ( \mathbf X^T \mathbf X )^{-1} \mathbf X^T \mathbf Y. $$
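A quick numerical check of this closed form (a sketch on synthetic data; in practice one solves the normal equations rather than inverting $\mathbf X^T \mathbf X$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])  # augmented design matrix
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=50)                # synthetic noisy targets

# Solve the normal equations X^T X beta = X^T Y; a linear solve is
# numerically preferable to forming the explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta)  # close to [1.0, 2.0, -0.5]
```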
Scalar $\mathbf X_i$
For scalar inputs $\mathbf X_i = x_i$, we can replace $\mathbf X$ with a vector $\vec x$ over all data points. For a model without an intercept, we have
$$ \beta = \frac{x_j y_j}{x_i x_i}. $$
More generally, for a model with an intercept, least squares leads to the following results
$$ \begin{align} k &= \frac{\sum_j (x_j - \bar x) y_j}{\sum_i (x_i - \bar x)(x_i - \bar x)} \\ b &= \bar y - k \bar x, \end{align} $$
where $\bar x$ and $\bar y$ are the mean values. Then the model
$$ y = k x + b $$
can be obtained.
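These formulas are easy to verify against a library fit (a sketch with made-up data; `np.polyfit` is used only as a cross-check):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

k = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
b = y.mean() - k * x.mean()

# np.polyfit returns [k, b] for a degree-1 fit; the two results agree
k_np, b_np = np.polyfit(x, y, 1)
assert np.allclose([k, b], [k_np, b_np])
```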