# 5 MDL and Neural Networks

Published:
Category: { Model Selection }
Summary: Minimum Description Length (MDL), a measure of how well a model compresses data by minimizing the combined cost of describing the model and the misfit, can be used to construct a concise network. A fully connected network has great expressive power but overfits easily. One strategy is to apply constraints to the network: limit the connections; share weights within subgroups of the network; constrain the weights using some probability distribution.
Pages: 5
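
To make the "model cost plus misfit cost" idea concrete, here is a minimal sketch of a two-part code length. It is an illustration under stated assumptions, not the method of the post: the function name `two_part_code_length` is hypothetical, the misfit is encoded under a Gaussian noise model, and each parameter is charged `0.5 * log2(n)` bits (a common asymptotic approximation). Because the data are continuous, the misfit term can be negative; only differences between models are meaningful.

```python
import math


def two_part_code_length(residuals, n_params):
    """Approximate two-part description length in bits (illustrative only).

    residuals: differences between the data and the model predictions.
    n_params:  number of free parameters in the model.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    # Floor the variance so a perfect fit does not produce log(0).
    variance = max(rss / n, 1e-12)
    # Cost of describing the model: assumed 0.5 * log2(n) bits per parameter.
    model_bits = 0.5 * n_params * math.log2(n)
    # Cost of describing the misfit: Gaussian code length of the residuals.
    misfit_bits = 0.5 * n * math.log2(2 * math.pi * math.e * variance)
    return model_bits + misfit_bits


residuals = [0.1, -0.2, 0.05, 0.3]
# Same misfit, more parameters: the longer description is penalized.
print(two_part_code_length(residuals, n_params=2))
print(two_part_code_length(residuals, n_params=5))
```

With the misfit held fixed, adding parameters only lengthens the description, so a constrained network is preferred unless the extra parameters reduce the residuals enough to pay for themselves.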

# 4 Parsimony of Models

Published:
References: - Vandekerckhove, J., & Matzke, D. (2015). Model comparison and the principle of parsimony. Oxford Library of Psychology.
Summary: Models with many parameters are very likely to achieve a high goodness-of-fit. However, they are also likely to generalize badly, so we need a measure of generalizability. Here parsimony gives us a few advantages: parsimonious models are easier to perceive and tend to generalize better.
Pages: 5

# 3 Measures of Generalizability

Published:
Category: { Model Selection }
Summary: To measure generalization, we define the generalization error, \begin{align} \mathcal G = \mathcal L_{P}(\hat f) - \mathcal L_E(\hat f), \end{align} where $\mathcal L_{P}$ is the population loss, $\mathcal L_E$ is the empirical loss, and $\hat f$ is the model obtained by minimizing the empirical loss. However, we do not know the actual joint probability $p(x, y)$ behind our dataset $\{x_i, y_i\}$, so the population loss is unknown. In machine learning, we usually estimate it with cross-validation, where we split the dataset into training and test sets.
Pages: 5
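
As a rough illustration of estimating the gap between population and empirical loss, the sketch below uses a single hold-out split (the simplest form of the train/test idea) and treats the test loss as a stand-in for the population loss. Everything here is an assumption made for the example: the piecewise-constant model, the squared loss, inputs restricted to $[0, 1]$, and the hypothetical helper names `fit_step_model` and `holdout_gap`.

```python
import math
import random


def fit_step_model(xs, ys, n_bins):
    """Fit a piecewise-constant model on [0, 1]: one mean value per bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Bins without training points fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(levels, x):
    n_bins = len(levels)
    return levels[min(int(x * n_bins), n_bins - 1)]


def mean_squared_loss(levels, xs, ys):
    return sum((y - predict(levels, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_gap(xs, ys, n_bins, test_fraction=0.3, seed=0):
    """Return (test_loss - train_loss, train_loss, test_loss) for one split."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = int(len(xs) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    levels = fit_step_model([xs[i] for i in train], [ys[i] for i in train], n_bins)
    train_loss = mean_squared_loss(levels, [xs[i] for i in train], [ys[i] for i in train])
    test_loss = mean_squared_loss(levels, [xs[i] for i in test], [ys[i] for i in test])
    return test_loss - train_loss, train_loss, test_loss


# Synthetic data: a noisy sine wave (an assumed generating process).
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(200)]
ys = [math.sin(2 * math.pi * x) + data_rng.gauss(0, 0.3) for x in xs]
for n_bins in (1, 4, 32):
    print(n_bins, holdout_gap(xs, ys, n_bins))
```

A single split gives a noisy estimate; averaging the gap over several splits, as k-fold cross-validation does, would reduce that noise.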

# 2 Goodness-of-fit

Published:
Category: { Model Selection }
Summary: Does the data agree with the model? Calculate the distance between data and model predictions. Apply Bayesian methods such as likelihood estimation: likelihood of observing the data if we assume the model; the results will be a set of fitting parameters. … Why don’t we always use goodness-of-fit as a measure of the goodness of a model? We may experience overfitting. The model may not be intuitive. This is why we would like to balance it with parsimony using some measures of generalizability.
Pages: 5
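
The two views of goodness-of-fit mentioned above (a distance between data and predictions, and the likelihood of the data given the model) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the likelihood assumes independent Gaussian noise with a known standard deviation `sigma`.

```python
import math


def sum_of_squares(data, predictions):
    """Squared Euclidean distance between data and model predictions."""
    return sum((d - p) ** 2 for d, p in zip(data, predictions))


def gaussian_log_likelihood(data, predictions, sigma=1.0):
    """Log-likelihood of the data, assuming i.i.d. Gaussian noise of width sigma."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum_of_squares(data, predictions) / (2 * sigma ** 2))


data = [1.0, 2.0, 3.0]
close_fit = [1.1, 1.9, 3.0]   # predictions from a model that tracks the data
poor_fit = [2.0, 2.0, 2.0]    # predictions from a constant model

print(sum_of_squares(data, close_fit), gaussian_log_likelihood(data, close_fit))
print(sum_of_squares(data, poor_fit), gaussian_log_likelihood(data, poor_fit))
```

Under this noise assumption the two measures rank models identically: a smaller squared distance always means a higher likelihood, which is why neither one by itself guards against overfitting.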

# 1 Model Selection

Published:
Category: { Model Selection }
Tags:
Summary: Suppose we have a generating process that generates some numbers based on a distribution. Based on a data sample, we could reconstruct theoretical models to represent the actual generating process. Which is a good model? The black curve represents the generating process. The red rectangle is a very simple model that captures some major samples. The blue step-wise model captures more of the sample data but needs more parameters.
Pages: 5