In statistical learning, the empirical risk $R$ measures the goodness of fit of a model based on empirical information. Empirical risk minimization (ERM) selects a good model $\hat f$ out of all possible models $f$ in our hypothesis space by minimizing the empirical risk on a dataset $\mathcal D$,

$$\hat f = \operatorname*{argmin}_{f} R(f, \mathcal D).$$
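As a minimal sketch of the idea (the dataset, the squared-error risk, and the hypothesis space of constant predictors are all illustrative assumptions, not from the original):

```python
import numpy as np

# Toy dataset and a tiny hypothesis space of constant models.
# Here the empirical risk R(f, D) is the mean squared error of f on D.
D_y = np.array([1.0, 2.0, 3.0, 2.0])

def empirical_risk(f, y):
    """Mean squared error of the constant prediction f on the targets y."""
    return np.mean((y - f) ** 2)

hypothesis_space = np.linspace(0.0, 4.0, 401)  # candidate constants f
risks = [empirical_risk(f, D_y) for f in hypothesis_space]
f_hat = hypothesis_space[np.argmin(risks)]     # ERM: argmin_f R(f, D)
print(f_hat)  # ≈ 2.0, the mean of y minimizes squared error
```

With squared error, the best constant predictor is the sample mean, which is what the grid search recovers.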

## Empirical Risk Example

For example, the empirical risk can be represented by the negative log likelihood.

The negative log likelihood (NLL) of a model with parameters $\theta$ on a dataset $\mathcal D$ is

$$\operatorname{NLL}(\theta) = -\log p(\mathcal D\mid\theta) = -\sum_n \log p(y_n \mid x_n, \theta).$$
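A concrete instance of this sum (the Bernoulli model $p(y=1\mid x,\theta)=\sigma(\theta x)$ and the toy data are assumptions for illustration):

```python
import numpy as np

# NLL of a Bernoulli classifier with p(y=1 | x, theta) = sigmoid(theta * x).
def nll(theta, x, y):
    p = 1.0 / (1.0 + np.exp(-theta * x))      # p(y_n = 1 | x_n, theta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
# A model that separates the data has lower NLL than the uninformative theta = 0.
print(nll(1.0, x, y) < nll(0.0, x, y))  # True
```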

An empirical risk loss function $\mathcal L$ is

$$\mathcal L(\theta) = \frac{1}{N} \sum_n \ell(y_n, \theta; x_n),$$

where $\ell$ is a per-sample loss. For example, one could use the zero-one loss in classification,

$$\ell = \begin{cases} 0, & \text{if the prediction matches the data,} \\ 1, & \text{if the prediction does not match the data.} \end{cases}$$
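Averaging the zero-one loss over the dataset gives the empirical risk, i.e. the misclassification rate (the example labels below are made up):

```python
import numpy as np

# Zero-one loss: 0 if the prediction matches the data, 1 otherwise.
def zero_one_loss(y_true, y_pred):
    return (y_true != y_pred).astype(float)

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
risk = np.mean(zero_one_loss(y_true, y_pred))  # empirical risk L(theta)
print(risk)  # 0.2, one mismatch out of five
```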

Another possibility is a surrogate loss: a continuous (often convex) function that upper-bounds the zero-one loss and is easier to optimize.
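Two standard surrogates, sketched as functions of the margin $m = y\,f(x)$ with $y \in \{-1, +1\}$ (the specific choices of hinge and rescaled log loss are common examples, not taken from the original):

```python
import numpy as np

# Continuous surrogates for the zero-one loss, as functions of the margin m.
def hinge(m):
    return np.maximum(0.0, 1.0 - m)           # used by SVMs

def logistic(m):
    return np.log2(1.0 + np.exp(-m))          # log loss, rescaled so logistic(0) = 1

m = np.array([-1.0, 0.0, 1.0, 2.0])
print(hinge(m))     # [2. 1. 0. 0.]
print(logistic(m))  # smooth, everywhere above the zero-one step
```

Both are zero-one upper bounds, so driving the surrogate risk down also controls the misclassification rate.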

## Regularized Risk

However, ERM may lead to overfitting. One way to counter this is to add a penalty $\lambda C(f)$ based on the complexity of the model,

$$R_{Reg}(f, \mathcal D) = R(f, \mathcal D) + \lambda C(f).$$
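Ridge regression is a familiar instance, with $C(f) = \lVert w\rVert^2$ (the synthetic data and the closed-form solution below are an illustrative sketch, not from the original):

```python
import numpy as np

# Regularized ERM: argmin_w ||y - Xw||^2 / N + lam * ||w||^2 (ridge regression).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

lam = 0.1
N = len(y)
# Closed-form minimizer of the regularized risk.
w_hat = np.linalg.solve(X.T @ X / N + lam * np.eye(3), X.T @ y / N)
print(w_hat)  # shrunk toward zero relative to the unregularized solution
```

Larger $\lambda$ shrinks the weights harder, trading a little empirical risk for lower model complexity.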


L Ma (2021). 'ERM: Empirical Risk Minimization', Datumorphism, 02 April. Available at: https://datumorphism.leima.is/cards/machine-learning/learning-theories/empirical-risk-minimization/.