Bayesian Information Criterion

#Bayesian #Model Selection

BIC is Bayesian information criterion, it replaced the $+2k$ term in [[AIC]] Akaike Information Criterion Suppose we have a model that describes the data generation process behind a dataset. The distribution by the model is denoted as $\hat f$. The actual data generation process is described by a distribution $f$. We ask the question: How good is the approximation using $\hat f$? To be more precise, how much information is lost if we use our model dist $\hat f$ to substitute the actual data generation distribution $f$? AIC defines this information loss as $$ \mathrm{AIC} = - 2 \ln p(y|\hat\theta) + … with $k\ln n$ to bring in punishment for the number of parameters of the model based on the number of data records,

$$ \mathrm{BIC} = -2\ln p(y|\hat\theta) + k\ln n = \ln \left(\frac{n^k}{p^2}\right) $$