In statistical learning, the empirical risk $R$ measures the goodness of fit of a model based on empirical information. Empirical risk minimization (ERM) selects a good model $\hat f$ out of all possible models $f$ in our hypothesis space by minimizing the empirical risk on a dataset $\mathcal D$,

$$\hat f = \operatorname*{argmin}_{f} R(f, \mathcal D).$$
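As a minimal sketch of the idea (the dataset, the squared-error risk, and the hypothesis space of constant predictors are all illustrative assumptions, not from the original):

```python
import numpy as np

# Toy dataset and a tiny hypothesis space of constant models.
# Here the empirical risk R(f, D) is the mean squared error of f on D.
D_y = np.array([1.0, 2.0, 3.0, 2.0])

def empirical_risk(f, y):
    """Mean squared error of the constant prediction f on the targets y."""
    return np.mean((y - f) ** 2)

hypothesis_space = np.linspace(0.0, 4.0, 401)  # candidate constants f
risks = [empirical_risk(f, D_y) for f in hypothesis_space]
f_hat = hypothesis_space[np.argmin(risks)]     # ERM: argmin_f R(f, D)
print(f_hat)  # ≈ 2.0, the mean of y minimizes squared error
```

With squared error, the best constant predictor is the sample mean, which is what the grid search recovers.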

## Empirical Risk Example

For example, the empirical risk can be represented by the negative log likelihood.

The negative log likelihood (NLL) of a model with parameters $\theta$ on a dataset $\mathcal D$ is

$$\operatorname{NLL}(\theta) = -\log p(\mathcal D\mid\theta) = -\sum_n \log p(y_n \mid x_n, \theta).$$
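A concrete instance of this sum (the Bernoulli model $p(y=1\mid x,\theta)=\sigma(\theta x)$ and the toy data are assumptions for illustration):

```python
import numpy as np

# NLL of a Bernoulli classifier with p(y=1 | x, theta) = sigmoid(theta * x).
def nll(theta, x, y):
    p = 1.0 / (1.0 + np.exp(-theta * x))      # p(y_n = 1 | x_n, theta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
# A model that separates the data has lower NLL than the uninformative theta = 0.
print(nll(1.0, x, y) < nll(0.0, x, y))  # True
```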

An empirical risk loss function $\mathcal L$ is

$$\mathcal L(\theta) = \frac{1}{N} \sum_n \ell(y_n, \theta; x_n),$$

where $\ell$ is a per-sample loss. For example, one could use the zero-one loss in classification,

$$\ell = \begin{cases} 0, & \text{if the prediction matches the data,} \\ 1, & \text{if the prediction does not match the data.} \end{cases}$$
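Averaging the zero-one loss over the dataset gives the empirical risk, i.e. the misclassification rate (the example labels below are made up):

```python
import numpy as np

# Zero-one loss: 0 if the prediction matches the data, 1 otherwise.
def zero_one_loss(y_true, y_pred):
    return (y_true != y_pred).astype(float)

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
risk = np.mean(zero_one_loss(y_true, y_pred))  # empirical risk L(theta)
print(risk)  # 0.2, one mismatch out of five
```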

Another possibility is a surrogate loss: a continuous (often convex) function that upper-bounds the zero-one loss and is easier to optimize.
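Two standard surrogates, sketched as functions of the margin $m = y\,f(x)$ with $y \in \{-1, +1\}$ (the specific choices of hinge and rescaled log loss are common examples, not taken from the original):

```python
import numpy as np

# Continuous surrogates for the zero-one loss, as functions of the margin m.
def hinge(m):
    return np.maximum(0.0, 1.0 - m)           # used by SVMs

def logistic(m):
    return np.log2(1.0 + np.exp(-m))          # log loss, rescaled so logistic(0) = 1

m = np.array([-1.0, 0.0, 1.0, 2.0])
print(hinge(m))     # [2. 1. 0. 0.]
print(logistic(m))  # smooth, everywhere above the zero-one step
```

Both are zero-one upper bounds, so driving the surrogate risk down also controls the misclassification rate.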

## Regularized Risk

However, ERM may lead to overfitting. One way to counter this is to add a penalty $\lambda C(f)$ based on the complexity of the model,

$$R_{Reg}(f, \mathcal D) = R(f, \mathcal D) + \lambda C(f).$$
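Ridge regression is a familiar instance, with $C(f) = \lVert w\rVert^2$ (the synthetic data and the closed-form solution below are an illustrative sketch, not from the original):

```python
import numpy as np

# Regularized ERM: argmin_w ||y - Xw||^2 / N + lam * ||w||^2 (ridge regression).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

lam = 0.1
N = len(y)
# Closed-form minimizer of the regularized risk.
w_hat = np.linalg.solve(X.T @ X / N + lam * np.eye(3), X.T @ y / N)
print(w_hat)  # shrunk toward zero relative to the unregularized solution
```

Larger $\lambda$ shrinks the weights harder, trading a little empirical risk for lower model complexity.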


L Ma (2021). 'ERM: Empirical Risk Minimization', Datumorphism, 02 April. Available at: https://datumorphism.leima.is/cards/machine-learning/learning-theories/empirical-risk-minimization/.