Statistics

Knowledge cards about statistics

Conditional Probability Table

Published:
Category: { Math }
Summary: The conditional probability table (CPT) lists the probability of each value of a random variable for every combination of values of the variables it is conditioned on. CPTs are the building blocks of Bayesian networks.
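As a minimal sketch, a CPT can be stored as a mapping from parent values to distributions over the child variable. The variable names and numbers below are hypothetical illustration values, not from the original card.

```python
# Minimal sketch of a CPT for P(Rain | Season); all values are made up for illustration.
cpt_rain_given_season = {
    # parent value -> distribution over the child variable
    "dry_season": {"rain": 0.1, "no_rain": 0.9},
    "wet_season": {"rain": 0.7, "no_rain": 0.3},
}

def p(child_value, parent_value, cpt=cpt_rain_given_season):
    """Look up P(child = child_value | parent = parent_value) in the CPT."""
    return cpt[parent_value][child_value]

print(p("rain", "wet_season"))  # 0.7
# Each row of a CPT must sum to 1 over the child's values.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in cpt_rain_given_season.values())
```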

Gamma Distribution

Published:
Category: { Statistics }
Summary: The gamma distribution with shape $\alpha$ and rate $\beta$ has PDF $$ \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} $$ for $x > 0$.
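A quick numerical check of the formula (a sketch assuming numpy/scipy are available; note that `scipy.stats.gamma` is parameterized by shape `a` = $\alpha$ and `scale` = $1/\beta$):

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

alpha, beta = 2.0, 3.0          # shape and rate (arbitrary example values)
x = np.linspace(0.1, 5, 50)

# PDF exactly as written in the card: beta^alpha x^(alpha-1) e^(-beta x) / Gamma(alpha)
pdf_manual = beta**alpha * x**(alpha - 1) * np.exp(-beta * x) / gamma_fn(alpha)

# scipy uses a scale parameter, which is the inverse of the rate beta
pdf_scipy = gamma.pdf(x, a=alpha, scale=1.0 / beta)

assert np.allclose(pdf_manual, pdf_scipy)
```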

Cauchy-Lorentz Distribution

Published:
Category: { Statistics }
Summary: The Cauchy-Lorentz distribution is the distribution of the ratio of two independent normally distributed random variables with mean zero. Source: https://en.wikipedia.org/wiki/Cauchy_distribution The Lorentz distribution is frequently used in physics. PDF: $$ \frac{1}{\pi\gamma} \left( \frac{\gamma^2}{ (x-x_0)^2 + \gamma^2} \right) $$ The median and mode of the Cauchy-Lorentz distribution are always $x_0$. $\gamma$ is the half-width at half maximum, so the FWHM is $2\gamma$.
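The ratio-of-normals characterization is easy to verify by simulation. A sketch assuming numpy/scipy; the sample size and check points are arbitrary:

```python
import numpy as np
from scipy.stats import cauchy

rng = np.random.default_rng(0)
n = 100_000

# The ratio of two independent zero-mean standard normals follows a standard Cauchy (x0=0, gamma=1).
samples = rng.normal(size=n) / rng.normal(size=n)

# Compare the empirical CDF at a few points with the Cauchy CDF.
for q in [-2.0, 0.0, 1.0]:
    empirical = np.mean(samples <= q)
    print(q, empirical, cauchy.cdf(q))  # the two numbers should be close
```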

Binomial Distribution

Published:
Category: { Statistics }
Summary: The number of successes in $n$ independent Bernoulli trials, where each trial has success probability $p$. PMF: $$ \binom{n}{k} p^k (1-p)^{n-k} $$
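For example, the PMF can be evaluated directly or with scipy (a sketch assuming scipy is available; the parameter values are arbitrary):

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.3, 4

# PMF as in the card: C(n, k) p^k (1-p)^(n-k)
pmf_manual = comb(n, k) * p**k * (1 - p) ** (n - k)
pmf_scipy = binom.pmf(k, n, p)

print(pmf_manual, pmf_scipy)  # both ~0.2001
```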

Beta Distribution

Published:
Category: { Statistics }
Summary: The beta distribution is a continuous distribution on $[0,1]$ with PDF $$ \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)} $$ where $B(\alpha, \beta)$ is the beta function. Interactive visualization.
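A minimal sketch evaluating the PDF against scipy (assuming numpy/scipy; parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import beta as beta_fn

a, b = 2.0, 5.0
x = np.linspace(0.01, 0.99, 50)

# PDF: x^(a-1) (1-x)^(b-1) / B(a, b)
pdf_manual = x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)
pdf_scipy = beta_dist.pdf(x, a, b)

assert np.allclose(pdf_manual, pdf_scipy)
```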

Bernoulli Distribution

Published:
Category: { Statistics }
Summary: A single trial with two possible outcomes, occurring with probability $p$ and $1-p$ respectively.

Arcsine Distribution

Published:
Category: { Statistics }
Summary: The arcsine distribution has PDF $$ \frac{1}{\pi\sqrt{x(1-x)}} $$ for $x\in [0,1]$. It can be generalized to $$ \frac{1}{\pi\sqrt{(x-a)(b-x)}} $$ for $x\in [a,b]$.
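A sketch checking that the standard arcsine PDF integrates to 1 and matches `scipy.stats.arcsine` (assuming numpy/scipy):

```python
import numpy as np
from scipy.stats import arcsine
from scipy.integrate import quad

# Standard arcsine PDF on [0, 1]
pdf = lambda x: 1.0 / (np.pi * np.sqrt(x * (1 - x)))

total, _ = quad(pdf, 0, 1)   # integrates to 1 despite the endpoint singularities
print(total)                 # ~1.0

x = np.linspace(0.05, 0.95, 19)
assert np.allclose(pdf(x), arcsine.pdf(x))
```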

Covariance Matrix

Published:
Category: { Math }
Summary: The covariance matrix, also known as the second central moment, measures the spread of a random vector: $\Sigma_{ij} = \operatorname{E}[(X_i - \mu_i)(X_j - \mu_j)]$.
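For a data matrix, the sample covariance matrix can be computed by hand or with `numpy.cov`. A sketch assuming numpy; note that `np.cov` expects variables in rows unless `rowvar=False` is passed:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))        # 500 observations of a 3-dimensional random vector

# Sample covariance: average outer product of centered observations (n-1 for the unbiased estimate)
Xc = X - X.mean(axis=0)
cov_manual = Xc.T @ Xc / (X.shape[0] - 1)

cov_numpy = np.cov(X, rowvar=False)  # variables are in columns here
assert np.allclose(cov_manual, cov_numpy)
```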

Jackknife Resampling

Published:
Category: { Statistics }
References: - Jackknife Resampling
Summary: The jackknife is a resampling method that estimates the bias and variance of a statistic by recomputing it on the $n$ leave-one-out subsamples of the data.
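A minimal sketch of the leave-one-out jackknife estimate of a statistic's standard error (here the mean, using numpy only; data are simulated for illustration):

```python
import numpy as np

def jackknife_se(data, statistic=np.mean):
    """Jackknife estimate of the standard error of `statistic`."""
    data = np.asarray(data)
    n = len(data)
    # Recompute the statistic on each leave-one-out subsample.
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=200)
print(jackknife_se(x))   # close to the analytic SE of the mean: 2 / sqrt(200) ≈ 0.141
```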

Kendall Tau Correlation

Published:
Category: { Statistics }
Summary: Given two series of data $X$ and $Y$, consider every pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ with $i<j$. The pair is concordant if $x_i < x_j$ and $y_i < y_j$, or $x_i > x_j$ and $y_i > y_j$ (counted as $C$); it is discordant if $x_i < x_j$ and $y_i > y_j$, or $x_i > x_j$ and $y_i < y_j$ (counted as $D$); pairs with ties are neither concordant nor discordant. Kendall's tau is defined as $$ \begin{equation} \tau = \frac{C- D}{\text{all possible pairs of comparison}} = \frac{C- D}{n(n-1)/2} \end{equation} $$
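A direct implementation of this definition, counting concordant and discordant pairs over all $i<j$; it can be cross-checked against `scipy.stats.kendalltau` when there are no ties (a sketch with simulated data):

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_tau(x, y):
    """Kendall's tau from the definition: (C - D) / (n(n-1)/2)."""
    n = len(x)
    c = d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(x[j] - x[i]) * np.sign(y[j] - y[i])
            if s > 0:
                c += 1   # concordant pair
            elif s < 0:
                d += 1   # discordant pair
            # ties (s == 0) are neither concordant nor discordant
    return (c - d) / (n * (n - 1) / 2)

rng = np.random.default_rng(3)
x = rng.permutation(20).astype(float)       # no ties
y = x + rng.normal(scale=5, size=20)
tau_scipy, _ = kendalltau(x, y)
print(kendall_tau(x, y), tau_scipy)         # should agree when there are no ties
```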

Poisson Process

Published:
Category: { Statistics }
Summary: A Poisson process generates events independently at a constant average rate $\lambda$: the number of events in any interval of length $t$ is Poisson-distributed with mean $\lambda t$, and the waiting times between events are exponentially distributed. Interactive Plotly demo: event times and the running average event rate per second for a process with event rate 1.
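A non-interactive sketch of the same idea as the demo, simulating a rate-1 process and checking its average event rate (assuming numpy; the rate of 1 event per second follows the original demo):

```python
import numpy as np

rng = np.random.default_rng(4)
rate = 1.0                                    # events per second, as in the original demo

# In a Poisson process, inter-event waiting times are i.i.d. Exponential(rate).
waits = rng.exponential(scale=1.0 / rate, size=10_000)
event_times = np.cumsum(waits)

# The running average event rate should converge to `rate`.
counts = np.arange(1, len(event_times) + 1)
running_rate = counts / event_times
print(running_rate[-1])                       # ~1.0

# Counts in disjoint unit-length windows are Poisson(rate) distributed.
window_counts = np.histogram(event_times, bins=np.arange(0, event_times[-1], 1.0))[0]
print(window_counts.mean(), window_counts.var())   # both ~rate
```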

Bayes' Theorem

Published:
Category: { Math }
Summary: Bayes' theorem is stated as $$ P(A\mid B) = \frac{P(B \mid A) P(A)}{P(B)} $$ where $P(A\mid B)$ is the posterior probability of $A$ given $B$, $P(B\mid A)$ is the likelihood of $B$ given $A$, and $P(A)$ is the marginal (prior) probability of $A$. There is a nice tree diagram for Bayes' theorem on Wikipedia.
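A worked numerical check of the theorem, using made-up numbers for a test-and-disease example (all values are hypothetical):

```python
# Hypothetical numbers: P(A) = prevalence, P(B|A) = sensitivity, P(B|not A) = false positive rate.
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

# Law of total probability for the marginal P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)   # ~0.161: a positive result raises the probability from 1% to about 16%
```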

Normalized Maximum Likelihood

Published:
Tags:
Summary: Normalized maximum likelihood compares the maximized likelihood of the observed data to the total maximized likelihood over all possible datasets: $$ \mathrm{NML} = \frac{ p(y\mid \hat \theta(y)) }{ \int_X p( x\mid \hat \theta (x) )\, dx } $$
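For discrete data the normalizing integral becomes a sum over all possible datasets. A small sketch for a Bernoulli model with $n$ coin flips (an illustrative choice of model, not from the original card), grouping the $2^n$ sequences by their number of successes:

```python
from math import comb

def nml(k, n):
    """NML for observing k successes in n Bernoulli trials (MLE: theta_hat = k/n)."""
    def max_lik(j, n):
        # p(y | theta_hat(y)), using the convention 0**0 == 1
        th = j / n
        return th**j * (1 - th) ** (n - j)
    # Sum of maximized likelihoods over all 2^n possible sequences.
    normalizer = sum(comb(n, j) * max_lik(j, n) for j in range(n + 1))
    return max_lik(k, n) / normalizer

print(nml(7, 10))
```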

Minimum Description Length

Published:
Tags:
References: - Vandekerckhove, J., & Matzke, D. (2015). Model comparison and the principle of parsimony. Oxford Library of Psychology.
Summary: The minimum description length ( #MDL ) is based on the idea of compression of the data. MDL looks for the model that compresses the data well. To compress data, we need to find the regularity in the data. There are many versions of MDL: the crude two-part code, the Fisher information approximation ( #FIA ), and normalized maximum likelihood ( #NML ).

Kolmogorov Complexity

Published:
Summary: Let $\Sigma=\{0,1\}$ and let $f:\Sigma^* \to\Sigma^*$ be a map. A description of a binary string $\sigma$ is any string $\tau$ such that $f(\tau)=\sigma$. The Kolmogorov complexity $C_f$ is $$ C_f(x) = \begin{cases} \min\{ \vert p \vert : f(p) = x \} & \text{if such a } p \text{ exists} \\ \infty & \text{otherwise} \end{cases} $$ $f$ can be a universal Turing machine.

Fisher Information Approximation

Published:
Tags:
Summary: #FIA is a method to approximate the [[minimum-description-length|minimum description length ( #MDL )]] of models, $$ \mathrm{FIA} = -\ln p(y \mid \hat\theta) + \frac{k}{2} \ln \frac{n}{2\pi} + \ln \int_\Theta \sqrt{ \operatorname{det} I(\theta) }\, d\theta $$ where $k$ is the number of parameters, $n$ is the sample size, and $I(\theta)$ is the Fisher information matrix for a sample of size 1: $$I_{i,j}(\theta) = \operatorname{E}\left( \frac{\partial \ln p(y\mid \theta)}{\partial \theta_i}\frac{ \partial \ln p (y \mid \theta) }{ \partial \theta_j } \right).$$
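As a sketch, for a Bernoulli model the Fisher information is $I(\theta)=1/(\theta(1-\theta))$ and $\int_0^1 \sqrt{I(\theta)}\,d\theta = \pi$, so the FIA can be evaluated in closed form. The model choice and data below are illustrative assumptions, not from the original card:

```python
import numpy as np

def fia_bernoulli(k, n):
    """FIA for k successes in n Bernoulli trials (one free parameter)."""
    theta_hat = k / n
    neg_log_lik = -(k * np.log(theta_hat) + (n - k) * np.log(1 - theta_hat))
    complexity = 0.5 * np.log(n / (2 * np.pi))   # (k/2) ln(n / 2pi) with k = 1 parameter
    geometric = np.log(np.pi)                    # ln of the integral of sqrt(det I(theta))
    return neg_log_lik + complexity + geometric

print(fia_bernoulli(7, 10))
```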

Bayesian Information Criterion

Published:
Summary: BIC is the Bayesian information criterion. It replaces the $+2k$ term in AIC with $k\ln n$: $$ \mathrm{BIC} = -2\ln p(y\mid\hat\theta) + k\ln n = \ln \left(\frac{n^k}{p(y\mid\hat\theta)^2}\right) $$ where $n$ is the number of observations and $k$ is the number of parameters. We prefer the model with a small BIC.
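A sketch comparing BIC for two nested Gaussian models fit by maximum likelihood (assuming numpy/scipy; the models and simulated data are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
y = rng.normal(loc=0.2, scale=1.0, size=100)
n = len(y)

def bic(log_lik, k, n):
    """BIC = -2 ln p(y | theta_hat) + k ln n."""
    return -2 * log_lik + k * np.log(n)

# Model 1: fixed mean 0, free sigma (k = 1)
sigma1 = np.sqrt(np.mean(y**2))              # MLE of sigma when the mean is fixed at 0
ll1 = norm.logpdf(y, loc=0.0, scale=sigma1).sum()

# Model 2: free mean and sigma (k = 2)
mu2, sigma2 = y.mean(), y.std()              # MLEs (std with ddof=0)
ll2 = norm.logpdf(y, loc=mu2, scale=sigma2).sum()

print(bic(ll1, 1, n), bic(ll2, 2, n))        # prefer the model with the smaller BIC
```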

Bayes Factors

Published:
Tags:
Summary: Posterior model odds are the prior odds multiplied by the ratio of marginal likelihoods: $$ \frac{p(\mathscr M_1\mid y)}{ p(\mathscr M_2\mid y) } = \frac{p(\mathscr M_1)}{ p(\mathscr M_2) }\frac{p(y\mid\mathscr M_1)}{ p(y\mid\mathscr M_2) } $$ The Bayes factor is this ratio of marginal likelihoods, $$ \mathrm{BF_{12}} = \frac{m(y\mid\mathscr M_1)}{m(y\mid\mathscr M_2)} $$ where $m(y\mid\mathscr M)$ denotes the marginal likelihood of the data under a model; $\mathrm{BF_{12}}$ says how many times more likely the data $y$ are under model $\mathscr M_1$ than under $\mathscr M_2$.
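A small sketch computing a Bayes factor for coin-flip data under two simple hypotheses, a point hypothesis $\theta=0.5$ versus a uniform prior on $\theta$, where the marginal likelihood has a closed Beta-function form (the models and data are illustrative assumptions):

```python
from math import comb
from scipy.special import beta as beta_fn

k, n = 7, 10                                   # 7 heads in 10 flips (made-up data)

# M1: theta = 0.5 exactly -> the marginal likelihood is just the binomial probability.
m1 = comb(n, k) * 0.5**n

# M2: theta ~ Uniform(0, 1) -> integrate the binomial likelihood over the prior:
#     integral of C(n,k) theta^k (1-theta)^(n-k) dtheta = C(n,k) * B(k+1, n-k+1)
m2 = comb(n, k) * beta_fn(k + 1, n - k + 1)

print("BF_12 =", m1 / m2)    # >1 favors the point-null model, <1 favors the uniform-prior model
```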

Akaike Information Criterion

Published:
References:
- Akaike Information Criterion @ Wikipedia
- Vandekerckhove, J., & Matzke, D. (2015). Model comparison and the principle of parsimony. Oxford Library of Psychology.
Summary: Suppose we have a model that describes the data generation process behind a dataset, and denote the distribution given by the model as $\hat f$. The actual data generation process is described by a distribution $f$. We ask: how good is the approximation using $\hat f$? More precisely, how much information is lost if we use the model distribution $\hat f$ as a substitute for the actual data generation distribution $f$? AIC estimates this relative information loss as $$ \mathrm{AIC} = 2k - 2\ln p(y\mid\hat\theta) $$ where $k$ is the number of parameters; the model with the smallest AIC is preferred.
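A sketch computing AIC for two simple Gaussian candidate models (assuming numpy/scipy; the models and simulated data are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
y = rng.normal(loc=0.3, scale=1.0, size=50)

def aic(log_lik, k):
    """AIC = 2k - 2 ln p(y | theta_hat); the lowest AIC loses the least information."""
    return 2 * k - 2 * log_lik

# Candidate model A: zero-mean Gaussian, free sigma (k = 1)
ll_a = norm.logpdf(y, loc=0.0, scale=np.sqrt(np.mean(y**2))).sum()
# Candidate model B: free mean and sigma (k = 2)
ll_b = norm.logpdf(y, loc=y.mean(), scale=y.std()).sum()

print("AIC A:", aic(ll_a, 1), "AIC B:", aic(ll_b, 2))
```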