Poisson Process
Published:
Category: { Statistics }
Tags:
References:
- axelpale/poisson-process
Summary: A counting process in which events occur independently at a constant rate $\lambda$; the number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$.
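A minimal simulation sketch (assuming NumPy; the rate and horizon values are made up for illustration): inter-arrival times of a homogeneous Poisson process are i.i.d. exponential with mean $1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 3.0      # lambda: expected number of events per unit time (made up)
horizon = 10.0  # observation window length

# Inter-arrival gaps are i.i.d. Exponential(rate); cumulative sums give event times.
gaps = rng.exponential(scale=1.0 / rate, size=1000)
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < horizon]

print(len(arrivals))   # count over [0, horizon) ~ Poisson(rate * horizon)
print(rate * horizon)  # its expectation: 30.0
```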
Bayes' Theorem
Published:
Category: { Math }
Tags:
References:
- Tree Diagram of Bayes' Theorem @ Wikipedia
Summary: Bayes’ Theorem is stated as
$$ P(A\mid B) = \frac{P(B \mid A) P(A)}{P(B)} $$
- $P(A\mid B)$: conditional probability of $A$ given $B$ (the posterior)
- $P(B\mid A)$: conditional probability of $B$ given $A$ (the likelihood)
- $P(A)$: marginal (prior) probability of $A$

There is a nice tree diagram of Bayes' theorem on Wikipedia.
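A minimal numeric sketch in Python; the sensitivity, specificity, and prevalence numbers are hypothetical, chosen only to illustrate the formula.

```python
# Hypothetical numbers: a test with 99% sensitivity, 95% specificity,
# applied to a condition with 1% prevalence.
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.99      # P(B|A): positive test given the condition
p_b_given_not_a = 0.05  # P(B|not A): false positive rate

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # ~0.1667: a positive result is far from conclusive
```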
Kendall Tau Correlation
Published:
Category: { Statistics }
Tags:
References:
- Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–93.
- Kendall rank correlation coefficient
- Rank correlation
Summary: Definition. Given two series of data, $X$ and $Y$, consider the pairs of observations $(x_i, y_i)$ and $(x_j, y_j)$, where we assume $i<j$:

- concordant: $x_i < x_j$ and $y_i < y_j$, or $x_i > x_j$ and $y_i > y_j$; the number of such pairs is denoted $C$
- discordant: $x_i < x_j$ and $y_i > y_j$, or $x_i > x_j$ and $y_i < y_j$; denoted $D$
- neither concordant nor discordant: whenever a tie (an equal sign) occurs

Kendall's tau is defined as
$$ \begin{equation} \tau = \frac{C- D}{\text{all possible pairs of comparison}} = \frac{C- D}{n^2/2 - n/2} \end{equation} $$
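A brute-force sketch of the definition above (tau-a, without tie corrections); the toy series are made up.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Tau from the definition above: (C - D) / (n(n-1)/2), no tie correction."""
    n = len(x)
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1  # concordant pair
        elif s < 0:
            d += 1  # discordant pair
        # s == 0: tied pair, counted in neither C nor D
    return (c - d) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.6 (C=8, D=2, 10 pairs)
```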
Jackknife Resampling
Published:
Category: { Statistics }
Tags:
References:
- Jackknife Resampling
Summary: The jackknife estimates the bias and standard error of a statistic by recomputing it on the $n$ leave-one-out subsamples of the data.
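A minimal sketch of the leave-one-out jackknife (assuming NumPy), estimating the standard error of the sample mean on made-up data.

```python
import numpy as np

def jackknife_se(data, statistic=np.mean):
    """Leave-one-out jackknife estimate of the standard error of a statistic."""
    data = np.asarray(data)
    n = len(data)
    # Recompute the statistic on each of the n leave-one-out subsamples.
    replicates = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((replicates - replicates.mean()) ** 2))

rng = np.random.default_rng(0)
sample = rng.normal(size=100)
print(jackknife_se(sample))     # jackknife SE of the mean
print(sample.std(ddof=1) / 10)  # classical SE of the mean (sqrt(100) = 10)
```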
Covariance Matrix
Published:
Category: { Math }
Tags:
References:
- Covariance Matrix @ Wikipedia
Summary: Also known as the second central moment, the covariance matrix measures the spread of a random vector around its mean, $\Sigma_{ij} = \operatorname{E}\left[ (X_i - \operatorname{E}[X_i]) (X_j - \operatorname{E}[X_j]) \right]$.
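A short sketch using NumPy's `np.cov`; the correlated toy data are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
# 500 samples of a 2-d random vector with correlated components.
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)

# np.cov expects variables in rows; diagonal entries are the variances.
print(np.cov(np.stack([x, y])))
```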
Gamma Distribution
Published:
Category: { Statistics }
Tags:
Summary: Gamma distribution PDF, with shape parameter $\alpha$ and rate parameter $\beta$:
$$ \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} $$
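A minimal sketch evaluating this PDF with the standard library; the parameter values are arbitrary.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma PDF in the shape (alpha) / rate (beta) parameterization."""
    return beta**alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

# For alpha = 1 the gamma distribution reduces to Exponential(beta):
print(gamma_pdf(1.0, alpha=1.0, beta=2.0))  # 2 * exp(-2) ~ 0.2707
print(2 * math.exp(-2))
```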
Cauchy-Lorentz Distribution
Published:
Category: { Statistics }
Tags:
Summary: The Cauchy-Lorentz distribution is the distribution of the ratio of two independent normally distributed random variables with mean zero.
Source: https://en.wikipedia.org/wiki/Cauchy_distribution
The Lorentz distribution is frequently used in physics.
PDF:
$$ \frac{1}{\pi\gamma} \left( \frac{\gamma^2}{ (x-x_0)^2 + \gamma^2} \right) $$
The median and mode of the Cauchy-Lorentz distribution are always $x_0$. $\gamma$ is the half width at half maximum (HWHM), so the FWHM is $2\gamma$.
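A quick sketch (assuming NumPy) checking the ratio-of-normals characterization by simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
# The ratio of two independent standard normals is standard Cauchy (x0=0, gamma=1).
samples = rng.normal(size=100_000) / rng.normal(size=100_000)

print(np.median(samples))  # close to x0 = 0, as stated above
print(np.mean(samples))    # unstable: the Cauchy distribution has no mean
```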
Categorical Distribution
Published:
Category: { Statistics }
Tags:
References:
- Categorical Distribution @ Wikipedia
Summary: By generalizing the Bernoulli distribution to $k$ states, we get a categorical distribution. The sample space is $\{s_1, s_2, \cdots, s_k\}$. The corresponding probabilities for each state are $\{p_1, p_2, \cdots, p_k\}$ with the constraint $\sum_{i=1}^k p_i = 1$.
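A minimal sampling sketch (assuming NumPy); the states and probabilities are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
states = ["s1", "s2", "s3"]  # sample space
probs = [0.2, 0.5, 0.3]      # must sum to 1

draws = rng.choice(states, size=10_000, p=probs)
for s, p in zip(states, probs):
    print(s, p, (draws == s).mean())  # empirical frequency approaches p_i
```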
Binomial Distribution
Published:
Category: { Statistics }
Tags:
References:
- Binomial Distribution @ Wikipedia
Summary: The number of successes in $n$ independent trials, where each trial has success probability $p$.
PMF:
$$ C_n^k p^k (1-p)^{n-k} $$
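A minimal sketch of the PMF using `math.comb`; the parameter values are arbitrary.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binom_pmf(2, n=4, p=0.5))                       # 6/16 = 0.375
print(sum(binom_pmf(k, 10, 0.3) for k in range(11)))  # PMF sums to 1
```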
Beta Distribution
Published:
Category: { Statistics }
Tags:
Summary: The beta distribution is a continuous distribution on $[0, 1]$ with shape parameters $\alpha$ and $\beta$. (Interactive plot of the density together with its mode, median, and mean.)
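A short sketch (assuming SciPy) of the summary statistics for made-up shape parameters; the closed-form mode $(\alpha-1)/(\alpha+\beta-2)$ holds for $\alpha, \beta > 1$.

```python
from scipy import stats

alpha, beta = 2.0, 5.0  # made-up shape parameters
dist = stats.beta(alpha, beta)

print(dist.mean())                       # alpha / (alpha + beta) ~ 0.2857
print(dist.median())                     # no simple closed form; numeric inverse CDF
print((alpha - 1) / (alpha + beta - 2))  # mode = 0.2, valid for alpha, beta > 1
```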
Bernoulli Distribution
Published:
Category: { Statistics }
Tags:
References:
- Bernoulli Distribution @ Wikipedia
Summary: Two categories with probability $p$ and $1-p$ respectively.
For each experiment, the sample space is $\{A, B\}$: state $A$ occurs with probability $p$ and state $B$ with probability $1-p$. Summing over $N$ independent experiments, the probability of obtaining state $A$ exactly $K$ times (and state $B$ the remaining $N-K$ times) is binomial,
$$ P\left(\sum_i^N s_i = K \right) = C _ N^K p^K (1 - p)^{N-K}. $$
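A simulation sketch (assuming NumPy) checking the count of $A$ outcomes against the formula; all parameter values are made up.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(4)
p, n_trials, n_repeats = 0.3, 20, 100_000

# Each row holds N Bernoulli(p) outcomes; the row sum counts the A results.
counts = (rng.random((n_repeats, n_trials)) < p).sum(axis=1)

k = 6
print((counts == k).mean())                                  # simulated P(sum = K)
print(comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k))  # formula: ~0.1916
```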
Arcsine Distribution
Published:
Category: { Statistics }
Tags:
Summary: Arcsine distribution. The PDF is
$$ \frac{1}{\pi\sqrt{x(1-x)}} $$
for $x\in [0,1]$.
It can also be generalized to
$$ \frac{1}{\pi\sqrt{(x-a)(b-x)}} $$
for $x\in [a,b]$.
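A minimal sketch of the generalized PDF; the evaluation points are arbitrary.

```python
import math

def arcsine_pdf(x, a=0.0, b=1.0):
    """PDF of the (generalized) arcsine distribution on [a, b]."""
    return 1.0 / (math.pi * math.sqrt((x - a) * (b - x)))

print(arcsine_pdf(0.5))   # 2/pi ~ 0.6366 at the center of [0, 1]
print(arcsine_pdf(0.01))  # the density diverges toward the endpoints
```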
Multiple Comparison Problem
Published:
Category: { Statistics }
Tags:
References:
- Contributors to Wikimedia projects. Multiple comparisons problem. In: Wikipedia [Internet]. 11 Apr 2022 [cited 18 Apr 2022]. Available: https://en.wikipedia.org/wiki/Multiple_comparisons_problem
- Zeni G, Fontana M, Vantini S. Conformal Prediction: a Unified Review of Theory and New Challenges. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2005.07972
Summary: In a multiple comparisons problem, we deal with multiple statistical tests simultaneously.
Examples: We see such problems a lot in IT companies. Suppose we have a website and we would like to test whether a new button design leads to changes in five different KPIs (e.g., view-to-click rate, click-to-book rate, …).
In multi-horizon time series forecasting, we sometimes choose to forecast multiple future data points in one shot. To properly find the confidence intervals of our predictions, one approach is the so-called conformal prediction method. This becomes a multiple comparisons problem because the chance of falsely rejecting at least one true null hypothesis grows with the number of horizons tested simultaneously.
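A small sketch of why this matters: for $m$ independent tests at level $\alpha$, the family-wise error rate is $1 - (1-\alpha)^m$.

```python
alpha = 0.05
for m in [1, 5, 10, 50]:
    fwer = 1 - (1 - alpha) ** m  # P(at least one false positive among m tests)
    print(m, round(fwer, 3))
# 1 -> 0.05, 5 -> 0.226, 10 -> 0.401, 50 -> 0.923
```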
Bonferroni Correction
Published:
Category: { Statistics }
Tags:
References:
- Contributors to Wikimedia projects. Bonferroni correction. In: Wikipedia [Internet]. 22 Feb 2022 [cited 18 Apr 2022]. Available: https://en.wikipedia.org/wiki/Bonferroni_correction
- Contributors to Wikimedia projects. Multiple comparisons problem. In: Wikipedia [Internet]. 11 Apr 2022 [cited 18 Apr 2022]. Available: https://en.wikipedia.org/wiki/Multiple_comparisons_problem
Summary: The Bonferroni correction controls the family-wise error rate in a multiple comparisons problem: to keep the overall significance level at $\alpha$ across $m$ tests, each individual hypothesis is tested at level $\alpha/m$.
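A minimal sketch of the correction; the p-values are hypothetical.

```python
alpha = 0.05
p_values = [0.001, 0.012, 0.04, 0.2, 0.6]  # hypothetical p-values
threshold = alpha / len(p_values)          # Bonferroni-adjusted level: 0.01

for p in p_values:
    print(p, "reject" if p < threshold else "keep")
# Only 0.001 survives; 0.012 and 0.04 would pass the uncorrected 0.05 level.
```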
Conditional Probability Table
Published:
Category: { Math }
Tags:
References:
- Tree Diagram of Bayes' Theorem @ Wikipedia
Summary: The conditional probability table (CPT) tabulates the probability of each value of a variable for every combination of values of the variables it is conditioned on.
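A small illustrative sketch: a hypothetical CPT for $P(\text{Rain} \mid \text{Cloudy})$ stored as a nested dict; the variable names and numbers are made up.

```python
# Hypothetical CPT for P(Rain | Cloudy): one row per conditioning value.
cpt = {
    True:  {"rain": 0.8, "no_rain": 0.2},
    False: {"rain": 0.1, "no_rain": 0.9},
}

# Each row must sum to 1 over the values of the conditioned variable.
for row in cpt.values():
    assert abs(sum(row.values()) - 1.0) < 1e-9

print(cpt[True]["rain"])  # P(rain | cloudy) = 0.8
```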
Normalized Maximum Likelihood
Published:
Tags:
Summary: The normalized maximum likelihood divides the maximized likelihood of the observed data $y$ by the maximized likelihood accumulated over all possible data $x$:
$$ \mathrm{NML} = \frac{ p(y\mid \hat \theta(y)) }{ \int_X p( x\mid \hat \theta (x) )\, dx } $$
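A small sketch computing the NML distribution for the Bernoulli model over binary sequences of length $n$, where the integral becomes a sum over sequences grouped by their count of ones.

```python
from math import comb

def bernoulli_nml(k, n):
    """NML of a binary sequence with k ones under the Bernoulli model.

    The maximized likelihood of a sequence with j ones is (j/n)^j ((n-j)/n)^(n-j);
    the denominator sums it over all 2^n sequences, grouped by j.
    """
    def max_lik(j):
        return (j / n) ** j * ((n - j) / n) ** (n - j)  # Python gives 0.0**0 == 1.0
    denom = sum(comb(n, j) * max_lik(j) for j in range(n + 1))
    return max_lik(k) / denom

print(bernoulli_nml(7, 10))
```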
Minimum Description Length
Published:
Category: { Statistics }
Tags:
References:
- Vandekerckhove, J., & Matzke, D. (2015). Model comparison and the principle of parsimony. Oxford Library of Psychology.
- Grünwald, P. D. (2007). The Minimum Description Length Principle. MIT Press.
Summary: MDL is a measure of how well a model compresses data by minimizing the combined cost of the description of the model and the misfit.
Kolmogorov Complexity
Published:
Tags:
References:
- Fortnow, L. (2000). Kolmogorov complexity. (January), 1–14.
- Grünwald, P. D. (2007). The Minimum Description Length Principle. MIT Press.
Summary: Description of Data
The measurement of complexity is based on the observation that the compressibility of data barely depends on the “language” used to describe the compression process. This makes it possible to pick a universal language, such as a universal computer language, to quantify the compressibility of the data.
One intuitive idea is to use a programming language to describe the data. If we have a sequence of data,
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,…,9999
It takes a lot of space if we show the complete sequence. However, our math intuition tells us that this is nothing but a list of consecutive numbers from 0 to 9999.
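A sketch of that intuition in Python: the one-line program below is a far shorter description of the sequence than the sequence itself.

```python
sequence = ",".join(str(i) for i in range(10000))
print(len(sequence))  # 48889 characters of raw data...
# ...reproduced exactly by the ~40-character program on the line above.
```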
Fisher Information Approximation
Published:
Tags:
Summary: FIA is a method to approximate the minimum description length (MDL) of models,
$$ \mathrm{FIA} = -\ln p(y \mid \hat\theta) + \frac{k}{2} \ln \frac{n}{2\pi} + \ln \int_\Theta \sqrt{ \operatorname{det} I(\theta) }\, d\theta $$
Here $k$ is the number of free parameters, $n$ is the sample size, and $I(\theta)$ is the Fisher information matrix for sample size 1,
$$ I_{i,j}(\theta) = \operatorname{E}\left( \frac{\partial \ln p(y\mid \theta)}{\partial \theta_i}\frac{ \partial \ln p (y \mid \theta) }{ \partial \theta_j } \right). $$
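A small sketch for the Bernoulli model, where $\operatorname{det} I(\theta) = 1/(\theta(1-\theta))$ and $\int_0^1 \sqrt{\operatorname{det} I(\theta)}\, d\theta = \pi$ exactly.

```python
from math import log, pi

def fia_bernoulli(k, n):
    """FIA for the Bernoulli model; it has one free parameter, so the dimension is 1.
    Note: k here is the observed count of ones, not the number of parameters."""
    theta_hat = k / n
    neg_log_lik = -(k * log(theta_hat) + (n - k) * log(1 - theta_hat))
    complexity = 0.5 * log(n / (2 * pi))  # (k/2) ln(n / 2pi) with k = 1 parameter
    geometry = log(pi)                    # ln of the Fisher information integral
    return neg_log_lik + complexity + geometry

print(fia_bernoulli(7, 10))  # ~7.486
```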