Jaccard Similarity

The Jaccard index of two sets $A$ and $B$ is the ratio of the size of their intersection to the size of their union.

$$ J(A, B) = \frac{ \vert A \cap B \vert }{ \vert A \cup B \vert } $$

The Jaccard distance $d_J(A,B)$ is defined as

$$ d_J(A,B) = 1 - J(A,B). $$
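
The two definitions above can be sketched in a few lines of Python. The function names `jaccard_index` and `jaccard_distance` are illustrative choices, not taken from any library. The formula is undefined ($0/0$) when both sets are empty; returning `1.0` in that case is an assumption made here, and other conventions exist.

```python
from typing import Hashable, Iterable


def jaccard_index(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as identical (J = 1).
        # The formula itself is 0/0 here, so this is only a convention.
        return 1.0
    return len(set_a & set_b) / len(union)


def jaccard_distance(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return d_J(A, B) = 1 - J(A, B)."""
    return 1.0 - jaccard_index(a, b)
```

For instance, `jaccard_index({1, 2, 3}, {2, 3, 4})` shares two elements out of four distinct ones, so it evaluates to `0.5`, and the corresponding distance is also `0.5`.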

Properties

If the two sets are identical (and non-empty), $A=B$, then $J(A,B)=1$ and $d_J(A,B)=0$: the similarity is at its maximum.

If the two sets are disjoint, i.e. they have nothing in common, then $J(A,B)=0$ and $d_J(A,B)=1$: the similarity is at its minimum.
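
A short derivation, given here as a sketch, shows why these two cases are the extremes. Because $A \cap B \subseteq A \cup B$, for any non-empty $A \cup B$ we have

$$ 0 \le \vert A \cap B \vert \le \vert A \cup B \vert \quad \Rightarrow \quad 0 \le J(A,B) \le 1 \quad \text{and} \quad 0 \le d_J(A,B) \le 1. $$

The upper bound $J(A,B)=1$ is reached exactly when $A \cap B = A \cup B$, that is, when $A=B$; the lower bound $J(A,B)=0$ is reached exactly when $A \cap B = \emptyset$.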

Examples
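
As a worked illustration, the snippet below compares the word sets of two sentences. The sentences themselves are hypothetical, chosen only for this sketch, and splitting on whitespace after lowercasing is an assumed, deliberately naive tokenization.

```python
# Hypothetical example sentences (not from the original card).
sentence_one = "The quick brown fox"
sentence_two = "The lazy brown dog"

# Naive tokenization (assumption): lowercase, then split on whitespace.
words_one = set(sentence_one.lower().split())
words_two = set(sentence_two.lower().split())

intersect_words = words_one & words_two
union_words = words_one | words_two

jaccard_index = len(intersect_words) / len(union_words)
jaccard_distance = 1 - jaccard_index

# Sets are unordered, so sort them for stable printing.
print("Word Set:", sorted(words_one))
print("Word Set:", sorted(words_two))
print("Intersect:", sorted(intersect_words))
print("Union:", sorted(union_words))
print("Jaccard Index:", round(jaccard_index, 3))
print("Jaccard Distance:", round(jaccard_distance, 3))
```

The two sentences share the words "the" and "brown" out of six distinct words in total, so the Jaccard index is $2/6 \approx 0.333$ and the Jaccard distance is $4/6 \approx 0.667$.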



References:

L Ma (2019). 'Jaccard Similarity', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/cards/math/jaccard-similarity/.