ERM: Empirical Risk Minimization

In a [[learning problem]], as posed by Vapnik (Vapnik, 2000), we are given a sample $\{z_i\}$ from a probability space $Z$, assume a probability measure on $Z$, and assume a set of functions $Q(z, \alpha)$ (e.g., loss functions), where $\alpha$ is a set of parameters. The risk functional to be minimized by tuning "the handles" $\alpha$ is

$$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$

A learning problem is the minimization of this risk. The empirical risk $R$ is a measure of the goodness of fit based on empirical information. Empirical risk minimization minimizes the empirical risk to select a good model $\hat f$ out of all possible models $f$ in our hypothesis space for a dataset $\mathcal D$,

$$ \hat f = \operatorname*{argmin}_{f} R(f, \mathcal D). $$
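As a minimal sketch of this selection step, the snippet below performs ERM over a finite hypothesis space of candidate slopes for a one-parameter linear model under a squared loss. The dataset, the hypothesis space, and the loss are all illustrative assumptions, not part of the definition above.

```python
import numpy as np

# ERM sketch: the hypothesis space is a finite set of candidate slopes
# for f(x) = a * x, and the empirical risk is the mean squared error on
# the sample. Data and grid are made up for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)  # true slope is 2.0

candidate_slopes = np.linspace(-5, 5, 101)     # hypothesis space

def empirical_risk(a, x, y):
    """Mean squared error of f(x) = a * x on the sample."""
    return np.mean((y - a * x) ** 2)

risks = [empirical_risk(a, x, y) for a in candidate_slopes]
a_hat = candidate_slopes[np.argmin(risks)]     # \hat f = argmin_f R(f, D)
print(f"ERM estimate of the slope: {a_hat:.2f}")
```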

Empirical Risk Example

For example, the empirical risk can be represented by the negative log likelihood.

The negative log likelihood (NLL) for a model with parameters $\theta$ on a dataset $\mathcal D$ is

$$ \operatorname{NLL}(\theta) = -\log p(\mathcal D\mid\theta) = -\sum_n \log p(y_n \mid x_n, \theta). $$
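A minimal sketch of this quantity, assuming a logistic model $p(y=1 \mid x, \theta) = \sigma(\theta x)$ on made-up binary data (both the model form and the data are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, x, y):
    """NLL(theta) = -sum_n log p(y_n | x_n, theta) for binary labels."""
    p = sigmoid(theta * x)  # p(y_n = 1 | x_n, theta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0, 0, 1, 1, 1])
print(nll(0.0, x, y), nll(2.0, x, y))  # theta = 2.0 fits better: lower NLL
```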

The empirical risk $\mathcal L$ is the average of a per-sample loss over the dataset,

$$ \mathcal L(\theta) = \frac{1}{N} \sum_n \ell(y_n, \theta; x_n), $$

where $\ell$ is a loss function. For example, one could design a stepwise loss in classification:

$$ \ell = \begin{cases} 0, & \text{if the prediction matches the data,} \\ 1, & \text{if the prediction does not match the data.} \end{cases} $$
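A quick numerical sketch of the empirical risk under this stepwise (zero-one) loss, with made-up labels and predictions:

```python
import numpy as np

# L(theta) = (1/N) * sum_n l(y_n, yhat_n), where l is 0 on a match, 1 otherwise.
def zero_one_loss(y_true, y_pred):
    return np.where(y_true == y_pred, 0.0, 1.0)

def empirical_risk(y_true, y_pred):
    return np.mean(zero_one_loss(y_true, y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 1, 1, 0])
print(empirical_risk(y_true, y_pred))  # 0.5: half the predictions are wrong
```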

Another possibility is a surrogate loss, which is continuous in the model output and therefore amenable to gradient-based optimization.
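For instance, the hinge loss is one common continuous surrogate for the zero-one loss; the labels and scores below are made up for illustration.

```python
import numpy as np

# Hinge loss: l(y, s) = max(0, 1 - y * s) for labels y in {-1, +1} and
# model score s. Unlike the stepwise loss, it varies continuously with
# the score, so it can be minimized with gradient methods.
def hinge_loss(y, score):
    return np.maximum(0.0, 1.0 - y * score)

y = np.array([+1, +1, -1])
score = np.array([2.0, 0.3, -0.1])  # raw model scores (illustrative)
print(hinge_loss(y, score))          # [0.  0.7 0.9]
```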

Regularized Risk

However, ERM may lead to overfitting. One method to counter this is to add a penalty based on the complexity of the model, $C(f)$, weighted by a coefficient $\lambda$:

$$ R_{\text{Reg}}(f, \mathcal D) = R(f, \mathcal D) + \lambda C(f). $$
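A minimal sketch of regularized ERM, reusing the one-parameter linear model from the first snippet; the ridge-style penalty $C(a) = a^2$ and the value $\lambda = 0.1$ are assumptions for illustration, not the only choices.

```python
import numpy as np

# R_reg(a, D) = R(a, D) + lambda * C(a), with C(a) = a**2 penalizing
# large slopes (a ridge-style complexity measure).
def regularized_risk(a, x, y, lam=0.1):
    empirical = np.mean((y - a * x) ** 2)  # R(f, D): mean squared error
    complexity = a ** 2                     # C(f): model complexity penalty
    return empirical + lam * complexity

x = np.array([-1.0, 0.0, 1.0, 2.0])
y = np.array([-2.1, 0.1, 1.9, 4.2])
slopes = np.linspace(-5, 5, 101)
a_hat = slopes[np.argmin([regularized_risk(a, x, y) for a in slopes])]
print(f"regularized ERM slope: {a_hat:.2f}")  # shrunk toward 0 vs. plain ERM
```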

