Does the data agree with the model?

  • Calculate the distance between data and model predictions.
  • Apply likelihood-based methods such as maximum likelihood estimation: compute the likelihood of observing the data if we assume the model; maximizing it yields a set of fitted parameters.
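
As a rough illustration of the two approaches above, the sketch below fits a straight line to synthetic data and scores it both by a distance (the sum of squared residuals) and by a Gaussian log-likelihood. The data, the linear model, the fixed noise level `sigma`, and all function names are assumptions made for this example, not part of the original note.

```python
import numpy as np

# Hypothetical data: noisy samples from a straight line (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)


def predict(x, slope, intercept):
    """Model prediction for a straight line."""
    return slope * x + intercept


def sum_squared_residuals(y, y_pred):
    """Distance between the data and the model predictions."""
    return float(np.sum((y - y_pred) ** 2))


def gaussian_log_likelihood(y, y_pred, sigma):
    """Log-likelihood of the data, assuming independent Gaussian noise
    with known standard deviation sigma around the model prediction."""
    n = y.size
    return float(
        -0.5 * n * np.log(2.0 * np.pi * sigma**2)
        - np.sum((y - y_pred) ** 2) / (2.0 * sigma**2)
    )


# Least-squares fit; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

sigma = 0.1  # assumed noise level
ssr_fit = sum_squared_residuals(y, predict(x, slope, intercept))
loglik_fit = gaussian_log_likelihood(y, predict(x, slope, intercept), sigma)

# An arbitrary alternative parameter set, for comparison.
ssr_other = sum_squared_residuals(y, predict(x, 1.5, 1.2))
loglik_other = gaussian_log_likelihood(y, predict(x, 1.5, 1.2), sigma)

print(f"fitted slope={slope:.3f}, intercept={intercept:.3f}")
print(f"SSR: fit={ssr_fit:.4f}, other={ssr_other:.4f}")
print(f"log-likelihood: fit={loglik_fit:.2f}, other={loglik_other:.2f}")
```

With Gaussian noise of fixed width, a smaller distance corresponds to a larger likelihood, so in this setting the two measures rank parameter sets the same way.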

Why don’t we always use goodness-of-fit as a measure of the goodness of a model?

  • We may experience overfitting.
  • The model may not be intuitive.

This is why we would like to balance goodness of fit with parsimony, using some measure of generalizability.

K-nearest neighbors and overfitting

The overfitting problem is easily demonstrated using the k-nearest neighbors (KNN) model.

Suppose we use $k=1$, i.e., each prediction uses only the single nearest data point. On the training data, that nearest point is the data point itself, so the model agrees with the data 100%. If we required only goodness of fit, we might as well choose this $k=1$ model. However, such a model is useless, since it is just the dataset itself and provides no further insight.
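
The effect can be sketched with a small nearest-neighbor classifier written in NumPy. The two overlapping Gaussian classes, the sample sizes, the comparison value $k=15$, and the helper names (`make_blobs`, `knn_predict`, `accuracy`) are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)


def make_blobs(n_per_class):
    """Two overlapping 2D Gaussian classes (hypothetical data)."""
    x0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 2))
    x = np.vstack([x0, x1])
    y = np.concatenate(
        [np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)]
    )
    return x, y


def knn_predict(x_train, y_train, x_query, k):
    """Majority vote among the k nearest training points (binary labels, odd k)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d = np.sum((x_query[:, None, :] - x_train[None, :, :]) ** 2, axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


x_train, y_train = make_blobs(100)
x_test, y_test = make_blobs(100)

train_acc = {}
test_acc = {}
for k in (1, 15):
    train_acc[k] = accuracy(y_train, knn_predict(x_train, y_train, x_train, k))
    test_acc[k] = accuracy(y_test, knn_predict(x_train, y_train, x_test, k))
    print(f"k={k:2d}: train accuracy={train_acc[k]:.2f}, test accuracy={test_acc[k]:.2f}")
```

With $k=1$, every training point is its own nearest neighbor, so the training accuracy is perfect by construction. The accuracy on held-out data is the number that reflects generalizability.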

L Ma (2020). 'Goodness-of-fit', Datumorphism, 11 April. Available at: