Math

Knowledge snippets about math

Introduction: My Knowledge Cards

Combinations

Published:
Category: { Math }
Summary: The number of ways to choose X items out of N is $$ C_N^X = \frac{N!}{ X! (N-X)! } $$
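A quick numerical check of the formula, as a minimal sketch using only Python's standard library (the values of N and X are illustrative assumptions):

```python
from math import comb, factorial

N, X = 5, 2
# Direct formula: C_N^X = N! / (X! (N-X)!)
c_formula = factorial(N) // (factorial(X) * factorial(N - X))
print(c_formula)   # 10
print(comb(N, X))  # 10, the same result from the standard library
```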

Term Frequency - Inverse Document Frequency

Published:
Category: { Math }
Tags:
References: - Tf-idf
Summary: tf-idf weighs a term by its frequency within a document while discounting terms that appear in many documents, $$ \operatorname{tf\text{-}idf}(t, d) = \operatorname{tf}(t, d) \cdot \log \frac{N}{n_t}, $$ where $N$ is the number of documents and $n_t$ is the number of documents containing the term $t$.
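A minimal sketch of this weighting in plain Python; the toy corpus and the $\log(N/n_t)$ variant of idf are illustrative assumptions, not the card's own example:

```python
import math
from collections import Counter

# Toy corpus: each document is a list of words (illustrative assumption).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)               # term frequency in the document
    n_t = sum(1 for d in docs if term in d)          # documents containing the term
    idf = math.log(len(docs) / n_t) if n_t else 0.0  # inverse document frequency
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # rarer term -> larger weight
print(tf_idf("the", docs[0], docs))  # common term -> smaller weight
```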

Jaccard Similarity

Published:
Category: { Math }
References: - Jaccard index
Summary: The Jaccard index is the ratio of the size of the intersection of two sets to the size of their union, $$ J(A, B) = \frac{ \vert A \cap B \vert }{ \vert A \cup B \vert }. $$ The Jaccard distance $d_J(A,B)$ is defined as $$ d_J(A,B) = 1 - J(A,B). $$ Properties: If the two sets are the same, $A=B$, we have $J(A,B)=1$ or $d_J(A,B)=0$, which is maximum similarity. If the two sets have nothing in common, we have $J(A,B)=0$ or $d_J(A,B)=1$, which is minimum similarity. Examples: (interactive example comparing the word sets of two sentences: intersection, union, Jaccard index, and Jaccard distance)
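A minimal sketch of the Jaccard index and distance for two sentences, treating each sentence as a word set (the example sentences are illustrative assumptions):

```python
def jaccard_index(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b)

s1 = set("the quick brown fox".split())
s2 = set("the lazy brown dog".split())

j = jaccard_index(s1, s2)
print(j)      # 2 shared words out of 6 distinct words -> 0.333...
print(1 - j)  # Jaccard distance
```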

Eigenvalues and Eigenvectors

Published:
Category: { Math }
Summary: To find the eigenvectors $\mathbf x$ of a matrix $\mathbf A$, we construct the eigen equation $$ \mathbf A \mathbf x = \lambda \mathbf x, $$ where $\lambda$ is the eigenvalue. We rewrite it in component form, $$ \begin{equation} A_{ij} x_j = \lambda x_i. \label{eqn-eigen-decomp-def} \end{equation} $$ Mathematically speaking, it is straightforward to find the eigenvectors and eigenvalues. Eigenvectors are special directions: judging from the definition in Eq.($\ref{eqn-eigen-decomp-def}$), the eigenvectors do not change direction under the operation of the matrix $\mathbf A$. We can also reconstruct $\mathbf A$ using the eigenvalues and eigenvectors; first of all, we construct a matrix of eigenvectors,
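A minimal numerical sketch with NumPy: solve the eigen equation for a small matrix and verify that $\mathbf A \mathbf x = \lambda \mathbf x$ for each pair (the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns of `eigenvectors` are the eigenvectors

for lam, x in zip(eigenvalues, eigenvectors.T):
    # A x equals lambda x: the eigenvector keeps its direction under A.
    assert np.allclose(A @ x, lam * x)

print(eigenvalues)  # 3 and 1 (order may vary)
```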

Cosine Similarity

Published:
Category: { Math }
References: - Cosine Similarity
Summary: Cosine similarity is simply the inner product of the two normalized vectors, $$ d_{cos} = \frac{\vec A}{\vert \vec A \vert} \cdot \frac{\vec B }{ \vert \vec B \vert }. $$ Examples: To use cosine similarity on text, we have to vectorize the words first. There are many different methods to achieve this; for the purpose of illustrating cosine similarity, we use term frequency, i.e., the number of occurrences of each word. We do not remove duplicates, so repeated words will have some effect on the similarity. In principle, we could also use the word set of a sentence to remove the effect of duplicate words, but in most cases, if a word repeats, it does indeed make the sentences different.
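A minimal sketch of cosine similarity between two term-frequency vectors built from word counts (the sentences and vocabulary construction are illustrative assumptions):

```python
import numpy as np
from collections import Counter

s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the mat".split()

vocab = sorted(set(s1) | set(s2))
# Term-frequency vectors: one count per vocabulary word, duplicates kept.
tf1 = np.array([Counter(s1)[w] for w in vocab], dtype=float)
tf2 = np.array([Counter(s2)[w] for w in vocab], dtype=float)

cos = (tf1 / np.linalg.norm(tf1)) @ (tf2 / np.linalg.norm(tf2))
print(cos)  # close to 1: the sentences share most of their words
```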

n-gram

Published:
Category: { Math }
Tags:
References: - words/n-gram
Summary: An n-gram is a method to split a word into a set of substrings of length n, which can then be used to match words. Examples: (interactive example comparing the n-grams of two words side by side, for a chosen value of n)
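A minimal sketch generating character n-grams of a word; the words and the choice n=3 are illustrative assumptions:

```python
def ngrams(word: str, n: int) -> set:
    # All substrings of length n, collected as a set.
    return {word[i:i + n] for i in range(len(word) - n + 1)}

print(sorted(ngrams("similarity", 3)))  # 'sim', 'imi', 'mil', ...
print(sorted(ngrams("similarly", 3)))
# Shared trigrams suggest the two words match closely.
print(ngrams("similarity", 3) & ngrams("similarly", 3))
```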

Levenshtein Distance

Published:
Category: { Math }
Summary: Levenshtein distance is the number of operations needed to change one word into another by applying single-character edits (insertions, deletions, or substitutions). The reference explains this concept very well. For consistency, I extracted a paragraph from it which explains the operations in the Levenshtein algorithm; the source of the following paragraph is the first reference of this article. Levenshtein Matrix: Cell (0:1) contains red number 1. It means that we need 1 operation to transform M to an empty string. And it is by deleting M. This is why this number is red. Cell (0:2) contains red number 2. It means that we need 2 operations to transform ME to an empty string.
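A minimal dynamic-programming sketch of the Levenshtein distance; the first row and column of the matrix hold exactly the "transform to an empty string" counts described in the quoted paragraph (the example words are illustrative assumptions):

```python
def levenshtein(a: str, b: str) -> int:
    # dp[i][j]: edits needed to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i  # delete all characters of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j  # insert all characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1]

print(levenshtein("ME", ""))             # 2, as in the quoted example
print(levenshtein("kitten", "sitting"))  # 3
```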

Frobenius distance

Published:
Category: { Math }
Tags:
Summary: The Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and the product $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$ is $$ \lVert X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$
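A minimal NumPy sketch of this squared Frobenius distance between $X$ and the product $HW$; the matrix shapes and random entries are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # X_{n k}
H = rng.normal(size=(4, 2))  # H_{n r}
W = rng.normal(size=(2, 3))  # W_{r k}

# Element-wise definition: sum over n, k of (X_{nk} - (H W)_{nk})^2
d2 = np.sum((X - H @ W) ** 2)
# The same quantity via the Frobenius norm.
assert np.isclose(d2, np.linalg.norm(X - H @ W, "fro") ** 2)
print(d2)
```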

Tucker Decomposition

Published:
Category: { Math }
Summary: Tucker decomposition is a generalization of SVD to higher-rank tensors.

SVD: Singular Value Decomposition

Published:
Category: { Math }
Summary: Given a matrix $\mathbf X \to X_{m}^{\phantom{m}n}$, we can decompose it into three matrices $$ X_{m}^{\phantom{m}n} = U_{m}^{\phantom{m}k} D_{k}^{\phantom{k}l} (V_{n}^{\phantom{n}l} )^{\mathrm T}, $$ where $D_{k}^{\phantom{k}l}$ is diagonal. Here $\mathbf U$ is constructed from the eigenvectors of $\mathbf X \mathbf X^{\mathrm T}$, while $\mathbf V$ is constructed from the eigenvectors of $\mathbf X^{\mathrm T} \mathbf X$ (which is also the reason we keep the transpose). I find this slide from Christoph Freudenthaler very useful; the original slide has been added as a reference to this article. SVD visualized by Christoph Freudenthaler
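A minimal NumPy check of the decomposition and of the relation between $\mathbf U$, $\mathbf V$ and the eigenvectors of $\mathbf X \mathbf X^{\mathrm T}$ and $\mathbf X^{\mathrm T} \mathbf X$ (the matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruction: X = U D V^T
assert np.allclose(X, U @ np.diag(d) @ Vt)

# Columns of U are eigenvectors of X X^T; the squared singular values are the eigenvalues.
assert np.allclose(X @ X.T @ U, U * d**2)
# Columns of V (rows of Vt) are eigenvectors of X^T X.
assert np.allclose(X.T @ X @ Vt.T, Vt.T * d**2)

print(d)  # singular values in descending order
```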

Modes and Slices of Tensors

Published:
Category: { Math }
Tags:
Summary: Simple ways to cut a tensor into lower-order pieces (fibers and slices) along its modes.

Khatri-Rao Product

Published:
Category: { Math }
References: - Kronecker product
Summary: For partitioned matrices with blocks $\mathbf{A}_{ij}$ and $\mathbf{B}_{ij}$, the Khatri-Rao product is the block-wise Kronecker product $$ \mathbf{A} \ast \mathbf{B} = \left(\mathbf{A}_{ij} \otimes \mathbf{B}_{ij}\right)_{ij}. $$
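A minimal NumPy sketch of the common column-wise case, where the matrices are partitioned into their columns and the Khatri-Rao product becomes the column-wise Kronecker product (the shapes and random entries are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))  # I x K
B = rng.normal(size=(5, 4))  # J x K, same number of columns

# Column-wise Kronecker product: the result is (I*J) x K.
kr = np.column_stack([np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])])
print(kr.shape)  # (15, 4)

# Equivalent one-liner via einsum.
kr2 = np.einsum("ik,jk->ijk", A, B).reshape(-1, A.shape[1])
assert np.allclose(kr, kr2)
```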

Cholesky Decomposition

Published:
Category: { Math }
Summary: Decomposing a symmetric, positive-definite matrix into the product of a lower triangular matrix and its transpose, $\mathbf A = \mathbf L \mathbf L^{\mathrm T}$.
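A minimal NumPy sketch: factor a symmetric positive-definite matrix and verify $\mathbf A = \mathbf L \mathbf L^{\mathrm T}$ (the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])  # symmetric positive definite

L = np.linalg.cholesky(A)   # lower triangular factor
assert np.allclose(A, L @ L.T)
print(L)
```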

Canonical Decomposition

Published:
Category: { Math }
Summary: Canonical (CP) decomposition writes a tensor as a sum of rank-one tensors.

Mahalanobis Distance

Published:
Category: { Math }
Summary: Distance between a point and a distribution, measured between the point and the mean of the distribution using the coordinate system defined by the principal components: $$ d_M(\vec x) = \sqrt{ (\vec x - \vec\mu)^{\mathrm T} \Sigma^{-1} (\vec x - \vec\mu) }. $$
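A minimal NumPy sketch: the Mahalanobis distance of a point from a sample distribution, using the sample mean and inverse covariance (the data and the query point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated two-dimensional sample (illustrative assumption).
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

mu = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

def mahalanobis(x, mu, cov_inv):
    # sqrt((x - mu)^T Sigma^{-1} (x - mu))
    d = x - mu
    return np.sqrt(d @ cov_inv @ d)

print(mahalanobis(np.array([1.0, 1.0]), mu, cov_inv))
```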

Diagonalize Matrices

Published:
Category: { Math }
Summary: Diagonalizing a matrix is a transformation using its eigenspace: $\mathbf A = \mathbf P \mathbf D \mathbf P^{-1}$, where the columns of $\mathbf P$ are the eigenvectors and $\mathbf D$ holds the eigenvalues.
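A minimal NumPy sketch: build $\mathbf P$ from the eigenvectors and check that $\mathbf P^{-1} \mathbf A \mathbf P$ is diagonal (the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, P = np.linalg.eig(A)  # columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P       # transform A into its eigenbasis

assert np.allclose(D, np.diag(eigenvalues))  # D is diagonal with the eigenvalues
print(np.diag(D))
```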

Multiset, mset or bag

Published:
Category: { Math }
References: - Multiset @ Wikipedia
Summary: A bag is a set in which duplicate elements are allowed. An ordered bag is what we call a list in programming.
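In Python, a bag can be represented with `collections.Counter`, which keeps the multiplicity of each element; a minimal sketch (the elements are illustrative assumptions):

```python
from collections import Counter

bag = Counter(["a", "b", "a", "c", "a"])  # duplicates allowed, order ignored
print(bag)                     # Counter({'a': 3, 'b': 1, 'c': 1})
print(bag["a"])                # multiplicity of 'a'
print(sorted(bag.elements()))  # ['a', 'a', 'a', 'b', 'c']
```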

Jensen's Inequality

Published:
Category: { Math }
Summary: Jensen's inequality states that $$ f(\mathbb E(X)) \leq \mathbb E(f(X)) $$ for a convex function $f(\cdot)$.
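A minimal numerical illustration with the convex function $f(x) = x^2$: $f(\mathbb E(X))$ should not exceed $\mathbb E(f(X))$ (the sampled distribution is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)

f = lambda t: t ** 2  # convex function
print(f(x.mean()))    # f(E[X]), about 1
print(f(x).mean())    # E[f(X)], about 5: larger, as Jensen's inequality requires
```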

Gaussian Integrals

Published:
Category: { Math }
Tags:
Summary: The Gaussian integral is one of the most useful results to have at hand, $$ \int_{-\infty}^{\infty} e^{- a x^2} \,\mathrm d x = \sqrt{\frac{\pi}{a}}. $$
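A minimal numerical check of $\int_{-\infty}^{\infty} e^{-a x^2}\,\mathrm d x = \sqrt{\pi/a}$ on a wide grid; the value of $a$ and the grid are illustrative assumptions:

```python
import numpy as np

a = 2.0
x, dx = np.linspace(-20.0, 20.0, 200_001, retstep=True)  # wide enough for the tails to vanish
integral = np.sum(np.exp(-a * x**2)) * dx                # simple Riemann sum

print(integral)            # ~ 1.2533
print(np.sqrt(np.pi / a))  # sqrt(pi/a), the analytic result
```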

The Hubbard-Stratonovich Identity

Published:
Category: { Math }
Summary: The Hubbard-Stratonovich identity, $$ e^{\frac{a}{2} x^2} = \sqrt{\frac{1}{2\pi a}} \int_{-\infty}^{\infty} \mathrm d y\, e^{ - \frac{y^2}{2 a} + x y }, \qquad a > 0, $$ is very useful in calculating the partition function.