Measures of Generalizability

To measure how well a model generalizes, we define the generalization error,

$$ \begin{align} \mathcal G = \mathcal L_{P}(\hat f) - \mathcal L_E(\hat f), \end{align} $$

where $\mathcal L_{P}$ is the population loss, $\mathcal L_E$ is the empirical loss, and $\hat f$ is the model obtained by minimizing the empirical loss.
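
As a concrete reading of the symbols, assume a pointwise loss $\ell$ (the text does not fix one) and a dataset of $N$ points. Then

$$ \mathcal L_P(f) = \mathbb E_{(x, y)\sim p(x, y)}\left[\ell(f(x), y)\right], \qquad \mathcal L_E(f) = \frac{1}{N}\sum_{i=1}^{N} \ell(f(x_i), y_i). $$

The sketch below assumes a synthetic generating process that we control, so $\mathcal L_P(\hat f)$ can be approximated by the loss on a very large fresh sample. The distribution $y = 2x + 1 + \text{noise}$, the squared loss, the polynomial models, and the helper names (`sample`, `fit_polynomial`, `mse`, `generalization_error`) are hypothetical choices for illustration, not the author's setup.

```python
import numpy as np

def sample(n, rng):
    """Draw n points from an assumed generating process y = 2x + 1 + Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=n)
    return x, y

def fit_polynomial(x, y, degree):
    """Minimize the empirical squared loss over polynomials of the given degree."""
    return np.polyfit(x, y, degree)

def mse(coef, x, y):
    """Mean squared error of the polynomial with coefficients `coef` on (x, y)."""
    return float(np.mean((np.polyval(coef, x) - y) ** 2))

def generalization_error(n_train=20, degree=1, n_population=200_000, seed=0):
    """Return (G, L_E, L_P) for a model fitted on n_train points."""
    rng = np.random.default_rng(seed)
    x_train, y_train = sample(n_train, rng)
    coef = fit_polynomial(x_train, y_train, degree)   # \hat f
    loss_empirical = mse(coef, x_train, y_train)      # L_E(\hat f)
    # A large fresh sample stands in for the expectation over p(x, y).
    x_pop, y_pop = sample(n_population, rng)
    loss_population = mse(coef, x_pop, y_pop)         # approximately L_P(\hat f)
    return loss_population - loss_empirical, loss_empirical, loss_population

for degree in (1, 9):
    g, le, lp = generalization_error(degree=degree)
    print(f"degree={degree}: L_E={le:.3f}, L_P~{lp:.3f}, G~{g:.3f}")
```

With the same seed, both degrees see the same training points, so the degree-9 fit has an empirical loss no larger than the line's; comparing the printed $\mathcal G$ values shows how much of that lower training loss carries over to the population.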

However, we do not know the true joint probability $p(x, y)$ that generated our dataset $\{(x_i, y_i)\}$, so the population loss cannot be computed directly. In machine learning, we usually estimate it with cross validation: we split the dataset into a training set and a test set, fit the model on the training set, and approximate the population loss by the loss on the test set.
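
A minimal holdout sketch of this procedure, assuming squared loss and a polynomial model: shuffle the indices, keep a fraction of the points for testing, fit on the rest, and report the test loss as the estimate of $\mathcal L_P(\hat f)$. The dataset ($y = \sin 3x + \text{noise}$), the 25% test fraction, and the helper names `train_test_split_indices` and `holdout_estimate` are hypothetical and are not any library's API.

```python
import numpy as np

def train_test_split_indices(n, test_fraction, rng):
    """Randomly partition indices 0..n-1 into disjoint train and test index arrays."""
    perm = rng.permutation(n)
    n_test = int(round(test_fraction * n))
    return perm[n_test:], perm[:n_test]

def holdout_estimate(x, y, degree=1, test_fraction=0.25, seed=0):
    """Fit on the train split; return (train loss L_E, test-set estimate of L_P)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = train_test_split_indices(len(x), test_fraction, rng)
    coef = np.polyfit(x[train_idx], y[train_idx], degree)

    def mse(idx):
        # Mean squared error of the fitted polynomial on the given subset.
        return float(np.mean((np.polyval(coef, x[idx]) - y[idx]) ** 2))

    return mse(train_idx), mse(test_idx)

# Hypothetical dataset: in practice p(x, y) is unknown and only these samples exist.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=200)

train_loss, test_loss = holdout_estimate(x, y, degree=3)
print(f"L_E on train: {train_loss:.3f}, test-set estimate of L_P: {test_loss:.3f}")
print(f"estimated generalization error: {test_loss - train_loss:.3f}")
```

The test points are never used for fitting, which is what makes their loss a fair stand-in for the population loss. k-fold cross validation repeats this split over k disjoint test folds and averages the test losses, which reduces the dependence on a single random split.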

Planted: 2020-11-08 by L Ma;

L Ma (2020). 'Measures of Generalizability', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/model-selection/measures-of-generalizability/.