Cross Validation
Cross validation is a method to estimate the [[risk]] of a model. In the learning problem proposed by Vapnik (Vapnik2000), the risk is the functional $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z), $$ where $Q(z, \alpha)$ is a function of the sample $z$ (e.g. a loss function) with parameters $\alpha$, and $F$ is the probability measure on the sample space $Z$. Learning is the minimization of this risk by tuning $\alpha$.
To perform cross validation, we split the training dataset $\mathcal D$ into $K$ folds, with the $k$th fold denoted as $\mathcal D_k$ for $k = 1, \dots, K$.
Given a model $\mathcal M(x, \theta)$ with parameter $\theta$, there are two steps in the modelling procedure:
- Fitting
  - The estimator estimates the parameters $\hat \theta$ from the data.
  - The fitting step can be denoted as $\hat\theta = \mathcal F(\mathcal D, \mathcal M)$.
- Prediction
  - The estimated parameters are fed into the model to get the predictions $\mathcal M(x, \hat\theta)$.
  - The prediction step can be denoted as $\hat y = \mathcal M (x, \hat\theta)$.
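The two steps can be sketched in code. The following is only an illustration under assumed choices: the model $\mathcal M$ is taken to be a straight line with $\theta = (\text{intercept}, \text{slope})$, the estimator $\mathcal F$ is ordinary least squares, and the names `model`, `fit`, and the toy data are hypothetical.

```python
from statistics import mean


def model(x, theta):
    """M(x, theta): a straight line, with theta = (intercept, slope). Assumed model."""
    intercept, slope = theta
    return intercept + slope * x


def fit(data):
    """F(D, M): ordinary least squares estimate of theta for the line model."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    slope = sxy / sxx
    return (y_bar - slope * x_bar, slope)


# Toy dataset of (x, y) pairs lying exactly on y = 1 + 2x (made up for illustration).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

theta_hat = fit(data)          # fitting step: theta_hat = F(D, M)
y_hat = model(4.0, theta_hat)  # prediction step: y_hat = M(x, theta_hat)
```

Keeping `fit` and `model` as separate functions mirrors the notation above: cross validation only needs to call the fitting step on one subset of the data and the prediction step on another.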
For the $k$th fold, we perform fitting on $\mathcal D_{\sim k}$, where ${}_{\sim k}$ denotes all folds except the $k$th, and then perform prediction on the held-out fold $\mathcal D_k$. Given a loss function $L$, the risk on this fold can be estimated as
$$ \begin{align} R_k &= \frac{1}{\lvert \mathcal D_k \rvert}\sum_{i\in \mathcal D_k} L (y_i, \hat y_i ) \\ &= \frac{1}{\lvert \mathcal D_k \rvert}\sum_{i\in \mathcal D_k} L (y_i,\mathcal M (x_i, \hat\theta_{\sim k}) ) \\ &= \frac{1}{\lvert \mathcal D_k \rvert} \sum_{i\in \mathcal D_k} L (y_i,\mathcal M (x_i, \mathcal F(\mathcal D_{\sim k}, \mathcal M) ) ). \end{align} $$
The overall $K$-fold cross validation risk $R$ is the average of the fold risks $R_k$,
$$ \begin{align} R = \frac{1}{K} \sum_{k=1}^K R_k. \end{align} $$
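The full procedure can be sketched as a loop over folds. This is an illustration, not a reference implementation: the round-robin fold assignment, the constant model, the squared loss, and all function names are assumptions. The sketch collects the per-fold risks $R_k$ and reports their mean, a common convention that keeps the result on the scale of a single-sample loss; summing the $R_k$ instead only rescales it by $K$.

```python
def k_fold_indices(n, K):
    """Partition indices 0..n-1 into K folds by dealing them out round-robin."""
    return [list(range(k, n, K)) for k in range(K)]


def cross_validate(data, K, fit, model, loss):
    """Return the list of fold risks R_k for a dataset of (x, y) pairs."""
    fold_risks = []
    for held_out in k_fold_indices(len(data), K):
        held = set(held_out)
        # Fit on every sample that is not in the held-out fold.
        train = [data[i] for i in range(len(data)) if i not in held]
        theta_hat = fit(train)
        # Average the loss over the held-out fold.
        losses = [loss(data[i][1], model(data[i][0], theta_hat)) for i in held_out]
        fold_risks.append(sum(losses) / len(held_out))
    return fold_risks


def fit_mean(train):
    """Assumed estimator: the mean training target."""
    return sum(y for _, y in train) / len(train)


def constant_model(x, theta):
    """Assumed model: predicts theta regardless of x."""
    return theta


def squared_loss(y, y_hat):
    return (y - y_hat) ** 2


# Toy dataset of (x, y) pairs (made up for illustration).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]

fold_risks = cross_validate(data, 3, fit_mean, constant_model, squared_loss)
cv_risk = sum(fold_risks) / len(fold_risks)  # mean of the fold risks
```

In practice the samples are usually shuffled before being assigned to folds; the deterministic round-robin split is used here only to keep the example reproducible.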
If $K = \lvert \mathcal D \rvert$, each fold holds a single sample, i.e. $\lvert \mathcal D_k \rvert = 1$, and we have only one sample in the prediction step. This is called leave-one-out cross validation (LOOCV).
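A leave-one-out loop can be written without any explicit fold bookkeeping, since each held-out fold is a single sample. The sketch below again assumes a constant model and a squared loss purely for illustration; `loocv_risks` and the toy data are hypothetical.

```python
def loocv_risks(data, fit, model, loss):
    """Per-sample risks: hold out each (x, y) pair once and fit on the rest."""
    risks = []
    for i, (x_i, y_i) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the i-th
        theta_hat = fit(train)
        risks.append(loss(y_i, model(x_i, theta_hat)))
    return risks


def fit_mean(train):
    """Assumed estimator: the mean training target."""
    return sum(y for _, y in train) / len(train)


def constant_model(x, theta):
    """Assumed model: predicts theta regardless of x."""
    return theta


def squared_loss(y, y_hat):
    return (y - y_hat) ** 2


# Toy dataset of (x, y) pairs (made up for illustration).
data = [(0.0, 1.0), (1.0, 2.0), (2.0, 6.0)]

risks = loocv_risks(data, fit_mean, constant_model, squared_loss)
loocv_risk = sum(risks) / len(risks)
```

The model is refitted once per sample, so the cost grows with the size of the dataset, which is the usual trade-off of this scheme.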
L Ma (2021). 'Cross Validation', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/cards/machine-learning/learning-theories/cross-validation/.