Bayesian Information Criterion
#Bayesian #Model Selection
BIC is the Bayesian information criterion. Suppose we have a model that describes the data generation process behind a dataset. The distribution given by the model is denoted $\hat f$, while the actual data generation process is described by a distribution $f$. We ask: how good is the approximation $\hat f$? More precisely, how much information is lost if we use our model distribution $\hat f$ as a substitute for the actual data generation distribution $f$? The Akaike information criterion (AIC) quantifies this information loss as $$ \mathrm{AIC} = -2 \ln p(y\mid\hat\theta) + 2k, $$ where $p(y\mid\hat\theta)$ is the maximized likelihood and $k$ is the number of model parameters. BIC replaces the $+2k$ term with $+k\ln n$, so the penalty for the number of parameters grows with the number of data records,
$$ \mathrm{BIC} = -2\ln p(y\mid\hat\theta) + k\ln n = \ln \left(\frac{n^k}{p(y\mid\hat\theta)^2}\right), $$
where $n$ is the number of observations and $k$ is the number of model parameters.
We prefer the model with the smaller BIC.
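As a concrete sketch of how the formula is used, the snippet below (a hypothetical example, not from the original card) computes $\mathrm{BIC} = -2\ln p(y\mid\hat\theta) + k\ln n$ for polynomial fits of increasing degree to linear toy data, using the Gaussian maximized log-likelihood; the extra parameters of the higher-degree fits should be punished by the $k\ln n$ term.

```python
import numpy as np

def bic(log_likelihood, k, n):
    """BIC = -2 ln p(y|theta_hat) + k ln n."""
    return -2.0 * log_likelihood + k * np.log(n)

# Toy data from a linear process with Gaussian noise (assumed setup).
rng = np.random.default_rng(0)
n = 200
x = np.linspace(0, 1, n)
y = 1.5 * x + 0.3 + rng.normal(0.0, 0.1, n)

# Compare polynomial models of increasing degree.
scores = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = np.mean(resid**2)  # MLE of the noise variance
    # Maximized Gaussian log-likelihood in closed form.
    ll = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = degree + 2  # polynomial coefficients plus the noise variance
    scores[degree] = bic(ll, k, n)

best = min(scores, key=scores.get)
```

On data that is genuinely linear, the likelihood gain from the degree-5 fit is typically too small to offset its larger $k\ln n$ penalty, so the degree-1 model tends to win.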
L Ma (2020). 'Bayesian Information Criterion', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/cards/statistics/bic/.