Math

Knowledge snippets about math

Introduction: My Knowledge Cards

Combinations

Published:
Category: { Math }
Summary: The number of ways to choose X items out of N is $$ C_N^X = \frac{N!}{ X! (N-X)! } $$
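A quick numerical check of the formula, as a minimal sketch using only Python's standard library (the values of N and X are illustrative assumptions):

```python
from math import comb, factorial

N, X = 5, 2
# Direct formula: C_N^X = N! / (X! (N-X)!)
c_formula = factorial(N) // (factorial(X) * factorial(N - X))
print(c_formula)   # 10
print(comb(N, X))  # 10, the same result from the standard library
```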

Term Frequency - Inverse Document Frequency

Published:
Category: { Math }
Tags:
References: - Tf-idf
Summary: tf-idf weighs a term by its frequency within a document while discounting terms that appear in many documents, $$ \operatorname{tf\text{-}idf}(t, d) = \operatorname{tf}(t, d) \cdot \log \frac{N}{n_t}, $$ where $N$ is the number of documents and $n_t$ is the number of documents containing the term $t$.
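A minimal sketch of this weighting in plain Python; the toy corpus and the $\log(N/n_t)$ variant of idf are illustrative assumptions, not the card's own example:

```python
import math
from collections import Counter

# Toy corpus: each document is a list of words (illustrative assumption).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)               # term frequency in the document
    n_t = sum(1 for d in docs if term in d)          # documents containing the term
    idf = math.log(len(docs) / n_t) if n_t else 0.0  # inverse document frequency
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # rarer term -> larger weight
print(tf_idf("the", docs[0], docs))  # common term -> smaller weight
```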

Jaccard Similarity

Published:
Category: { Math }
References: - Jaccard index
Summary: The Jaccard index is the ratio of the size of the intersection of two sets to the size of their union, $$ J(A, B) = \frac{ \vert A \cap B \vert }{ \vert A \cup B \vert }. $$ The Jaccard distance $d_J(A,B)$ is defined as $$ d_J(A,B) = 1 - J(A,B). $$ Properties: If the two sets are the same, $A=B$, we have $J(A,B)=1$ or $d_J(A,B)=0$, which is maximum similarity. If the two sets have nothing in common, we have $J(A,B)=0$ or $d_J(A,B)=1$, which is minimum similarity. Examples: (interactive example comparing the word sets of two sentences: intersection, union, Jaccard index, and Jaccard distance)
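A minimal sketch of the Jaccard index and distance for two sentences, treating each sentence as a word set (the example sentences are illustrative assumptions):

```python
def jaccard_index(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b)

s1 = set("the quick brown fox".split())
s2 = set("the lazy brown dog".split())

j = jaccard_index(s1, s2)
print(j)      # 2 shared words out of 6 distinct words -> 0.333...
print(1 - j)  # Jaccard distance
```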

Eigenvalues and Eigenvectors

Published:
Category: { Math }
Summary: To find the eigenvectors $\mathbf x$ of a matrix $\mathbf A$, we construct the eigen equation $$ \mathbf A \mathbf x = \lambda \mathbf x, $$ where $\lambda$ is the eigenvalue. We rewrite it in component form, $$ \begin{equation} A_{ij} x_j = \lambda x_i. \label{eqn-eigen-decomp-def} \end{equation} $$ Mathematically speaking, it is straightforward to find the eigenvectors and eigenvalues. Eigenvectors are special directions: judging from the definition in Eq.($\ref{eqn-eigen-decomp-def}$), the eigenvectors do not change direction under the operation of the matrix $\mathbf A$. We can also reconstruct $\mathbf A$ using the eigenvalues and eigenvectors; first of all, we construct a matrix of eigenvectors,
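A minimal numerical sketch with NumPy: solve the eigen equation for a small matrix and verify that $\mathbf A \mathbf x = \lambda \mathbf x$ for each pair (the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns of `eigenvectors` are the eigenvectors

for lam, x in zip(eigenvalues, eigenvectors.T):
    # A x equals lambda x: the eigenvector keeps its direction under A.
    assert np.allclose(A @ x, lam * x)

print(eigenvalues)  # 3 and 1 (order may vary)
```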

Cosine Similarity

Published:
Category: { Math }
References: - Cosine Similarity
Summary: Cosine similarity is simply the inner product of the two normalized vectors, $$ d_{cos} = \frac{\vec A}{\vert \vec A \vert} \cdot \frac{\vec B }{ \vert \vec B \vert }. $$ Examples: To use cosine similarity on text, we have to vectorize the words first. There are many different methods to achieve this; for the purpose of illustrating cosine similarity, we use term frequency, i.e., the number of occurrences of each word. We do not remove duplicates, so repeated words will have some effect on the similarity. In principle, we could also use the word set of a sentence to remove the effect of duplicate words, but in most cases, if a word repeats, it does indeed make the sentences different.
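A minimal sketch of cosine similarity between two term-frequency vectors built from word counts (the sentences and vocabulary construction are illustrative assumptions):

```python
import numpy as np
from collections import Counter

s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the mat".split()

vocab = sorted(set(s1) | set(s2))
# Term-frequency vectors: one count per vocabulary word, duplicates kept.
tf1 = np.array([Counter(s1)[w] for w in vocab], dtype=float)
tf2 = np.array([Counter(s2)[w] for w in vocab], dtype=float)

cos = (tf1 / np.linalg.norm(tf1)) @ (tf2 / np.linalg.norm(tf2))
print(cos)  # close to 1: the sentences share most of their words
```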

n-gram

Published:
Category: { Math }
Tags:
References: - words/n-gram
Summary: An n-gram is a method to split a word into a set of substrings of length n, which can then be used to match words. Examples: (interactive example comparing the n-grams of two words side by side, for a chosen value of n)
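A minimal sketch generating character n-grams of a word; the words and the choice n=3 are illustrative assumptions:

```python
def ngrams(word: str, n: int) -> set:
    # All substrings of length n, collected as a set.
    return {word[i:i + n] for i in range(len(word) - n + 1)}

print(sorted(ngrams("similarity", 3)))  # 'sim', 'imi', 'mil', ...
print(sorted(ngrams("similarly", 3)))
# Shared trigrams suggest the two words match closely.
print(ngrams("similarity", 3) & ngrams("similarly", 3))
```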

Levenshtein Distance

Published:
Category: { Math }
Summary: Levenshtein distance is the number of operations needed to change one word into another by applying single-character edits (insertions, deletions, or substitutions). The reference explains this concept very well. For consistency, I extracted a paragraph from it which explains the operations in the Levenshtein algorithm; the source of the following paragraph is the first reference of this article. Levenshtein Matrix: Cell (0:1) contains red number 1. It means that we need 1 operation to transform M to an empty string. And it is by deleting M. This is why this number is red. Cell (0:2) contains red number 2. It means that we need 2 operations to transform ME to an empty string.
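A minimal dynamic-programming sketch of the Levenshtein distance; the first row and column of the matrix hold exactly the "transform to an empty string" counts described in the quoted paragraph (the example words are illustrative assumptions):

```python
def levenshtein(a: str, b: str) -> int:
    # dp[i][j]: edits needed to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i  # delete all characters of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j  # insert all characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1]

print(levenshtein("ME", ""))             # 2, as in the quoted example
print(levenshtein("kitten", "sitting"))  # 3
```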

Frobenius distance

Published:
Category: { Math }
Tags:
Summary: The Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and the product $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$ is $$ \lVert X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$
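A minimal NumPy sketch of this squared Frobenius distance between $X$ and the product $HW$; the matrix shapes and random entries are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # X_{n k}
H = rng.normal(size=(4, 2))  # H_{n r}
W = rng.normal(size=(2, 3))  # W_{r k}

# Element-wise definition: sum over n, k of (X_{nk} - (H W)_{nk})^2
d2 = np.sum((X - H @ W) ** 2)
# The same quantity via the Frobenius norm.
assert np.isclose(d2, np.linalg.norm(X - H @ W, "fro") ** 2)
print(d2)
```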

Tucker Decomposition

Published:
Category: { Math }
Summary: Tucker decomposition is a generalization of SVD to higher-rank tensors.

SVD: Singular Value Decomposition

Published:
Category: { Math }
Summary: Given a matrix $\mathbf X \to X_{m}^{\phantom{m}n}$, we can decompose it into three matrices $$ X_{m}^{\phantom{m}n} = U_{m}^{\phantom{m}k} D_{k}^{\phantom{k}l} (V_{n}^{\phantom{n}l} )^{\mathrm T}, $$ where $D_{k}^{\phantom{k}l}$ is diagonal. Here $\mathbf U$ is constructed from the eigenvectors of $\mathbf X \mathbf X^{\mathrm T}$, while $\mathbf V$ is constructed from the eigenvectors of $\mathbf X^{\mathrm T} \mathbf X$ (which is also the reason we keep the transpose). I find this slide from Christoph Freudenthaler very useful; the original slide has been added as a reference to this article. SVD visualized by Christoph Freudenthaler
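A minimal NumPy check of the decomposition and of the relation between $\mathbf U$, $\mathbf V$ and the eigenvectors of $\mathbf X \mathbf X^{\mathrm T}$ and $\mathbf X^{\mathrm T} \mathbf X$ (the matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruction: X = U D V^T
assert np.allclose(X, U @ np.diag(d) @ Vt)

# Columns of U are eigenvectors of X X^T; the squared singular values are the eigenvalues.
assert np.allclose(X @ X.T @ U, U * d**2)
# Columns of V (rows of Vt) are eigenvectors of X^T X.
assert np.allclose(X.T @ X @ Vt.T, Vt.T * d**2)

print(d)  # singular values in descending order
```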

Modes and Slices of Tensors

Published:
Category: { Math }
Tags:
Summary: Simple ways to cut a tensor into lower-order pieces (fibers and slices) along its modes.

Khatri-Rao Product

Published:
Category: { Math }
References: - Kronecker product
Summary: For partitioned matrices with blocks $\mathbf{A}_{ij}$ and $\mathbf{B}_{ij}$, the Khatri-Rao product is the block-wise Kronecker product $$ \mathbf{A} \ast \mathbf{B} = \left(\mathbf{A}_{ij} \otimes \mathbf{B}_{ij}\right)_{ij}. $$
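A minimal NumPy sketch of the common column-wise case, where the matrices are partitioned into their columns and the Khatri-Rao product becomes the column-wise Kronecker product (the shapes and random entries are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))  # I x K
B = rng.normal(size=(5, 4))  # J x K, same number of columns

# Column-wise Kronecker product: the result is (I*J) x K.
kr = np.column_stack([np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])])
print(kr.shape)  # (15, 4)

# Equivalent one-liner via einsum.
kr2 = np.einsum("ik,jk->ijk", A, B).reshape(-1, A.shape[1])
assert np.allclose(kr, kr2)
```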

Cholesky Decomposition

Published:
Category: { Math }
Summary: Decomposing a symmetric, positive-definite matrix into the product of a lower triangular matrix and its transpose, $\mathbf A = \mathbf L \mathbf L^{\mathrm T}$.
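A minimal NumPy sketch: factor a symmetric positive-definite matrix and verify $\mathbf A = \mathbf L \mathbf L^{\mathrm T}$ (the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])  # symmetric positive definite

L = np.linalg.cholesky(A)   # lower triangular factor
assert np.allclose(A, L @ L.T)
print(L)
```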

Canonical Decomposition

Published:
Category: { Math }
Summary: Canonical (CP) decomposition writes a tensor as a sum of rank-one tensors.

Mahalanobis Distance

Published:
Category: { Math }
Summary: Distance between a point and a distribution, measured between the point and the mean of the distribution using the coordinate system defined by the principal components: $$ d_M(\vec x) = \sqrt{ (\vec x - \vec\mu)^{\mathrm T} \Sigma^{-1} (\vec x - \vec\mu) }. $$
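A minimal NumPy sketch: the Mahalanobis distance of a point from a sample distribution, using the sample mean and inverse covariance (the data and the query point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated two-dimensional sample (illustrative assumption).
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

mu = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

def mahalanobis(x, mu, cov_inv):
    # sqrt((x - mu)^T Sigma^{-1} (x - mu))
    d = x - mu
    return np.sqrt(d @ cov_inv @ d)

print(mahalanobis(np.array([1.0, 1.0]), mu, cov_inv))
```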

Diagonalize Matrices

Published:
Category: { Math }
Summary: Diagonalizing a matrix is a transformation using its eigenspace: $\mathbf A = \mathbf P \mathbf D \mathbf P^{-1}$, where the columns of $\mathbf P$ are the eigenvectors and $\mathbf D$ holds the eigenvalues.
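A minimal NumPy sketch: build $\mathbf P$ from the eigenvectors and check that $\mathbf P^{-1} \mathbf A \mathbf P$ is diagonal (the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, P = np.linalg.eig(A)  # columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P       # transform A into its eigenbasis

assert np.allclose(D, np.diag(eigenvalues))  # D is diagonal with the eigenvalues
print(np.diag(D))
```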

Multiset, mset or bag

Published:
Category: { Math }
References: - Multiset @ Wikipedia
Summary: A bag is a set in which duplicate elements are allowed. An ordered bag is what we call a list in programming.
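In Python, a bag can be represented with `collections.Counter`, which keeps the multiplicity of each element; a minimal sketch (the elements are illustrative assumptions):

```python
from collections import Counter

bag = Counter(["a", "b", "a", "c", "a"])  # duplicates allowed, order ignored
print(bag)                     # Counter({'a': 3, 'b': 1, 'c': 1})
print(bag["a"])                # multiplicity of 'a'
print(sorted(bag.elements()))  # ['a', 'a', 'a', 'b', 'c']
```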

Jensen's Inequality

Published:
Category: { Math }
Summary: Jensen's inequality states that $$ f(\mathbb E(X)) \leq \mathbb E(f(X)) $$ for a convex function $f(\cdot)$.
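A minimal numerical illustration with the convex function $f(x) = x^2$: $f(\mathbb E(X))$ should not exceed $\mathbb E(f(X))$ (the sampled distribution is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)

f = lambda t: t ** 2  # convex function
print(f(x.mean()))    # f(E[X]), about 1
print(f(x).mean())    # E[f(X)], about 5: larger, as Jensen's inequality requires
```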

Gaussian Integrals

Published:
Category: { Math }
Tags:
Summary: The Gaussian integral is one of the most useful results to have at hand, $$ \int_{-\infty}^{\infty} e^{- a x^2} \,\mathrm d x = \sqrt{\frac{\pi}{a}}. $$
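A minimal numerical check of $\int_{-\infty}^{\infty} e^{-a x^2}\,\mathrm d x = \sqrt{\pi/a}$ on a wide grid; the value of $a$ and the grid are illustrative assumptions:

```python
import numpy as np

a = 2.0
x, dx = np.linspace(-20.0, 20.0, 200_001, retstep=True)  # wide enough for the tails to vanish
integral = np.sum(np.exp(-a * x**2)) * dx                # simple Riemann sum

print(integral)            # ~ 1.2533
print(np.sqrt(np.pi / a))  # sqrt(pi/a), the analytic result
```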

The Hubbard-Stratonovich Identity

Published:
Category: { Math }
Summary: The Hubbard-Stratonovich identity, $$ e^{\frac{a}{2} x^2} = \sqrt{\frac{1}{2\pi a}} \int_{-\infty}^{\infty} \mathrm d y\, e^{ - \frac{y^2}{2 a} + x y }, \qquad a > 0, $$ is very useful in calculating the partition function.