Statistics of Graphs
#Graph #Basics #Statistics
Local Statistics
Node Degree
Node Degree
Node Centrality
Importance of a node on a graph:

Eigenvector Centrality of a Graph
Given a graph with adjacency matrix $\mathbf A$, the eigenvector centrality is $$ \mathbf e_u = \frac{1}{\lambda} \sum_{v\in\mathcal V} \mathbf A[u,v] \mathbf e_v, \qquad \forall u \in \mathcal V. $$ Why is it called Eigenvector Centrality The definition is equivalent to $$ \lambda \mathbf e = \mathbf A\mathbf e. $$ Power Iteration The solution to $\mathbf e$ is the eigenvector that corresponds to the largest eigenvalue $\lambda_1$. Power iteration method can help us get this eigenvector, i.e., the $^{(t+1)}$ iteration is related to the previous iteration $^{(t)}$, through the following relation, $$ \mathbf e^{(t+1)} = \mathbf A \mathbf e^{(t)}. 
Betweenness Centrality of a Graph
Betweenness centrality of a node $v$ is measurement of how likely the shortest path between two nodes $u_s$ and $u_t$ is gonna pass through node $v$, $$ c(v) = \sum_{v\neq u_s\neq u_t} \frac{\sigma_{u_su_t}(v) }{\sigma_{u_su_t}}, $$ where $\sigma_{u_su_t}(v)$ is the number of shortest path between $u_s$ and $u_t$, and passing through $u$, while $\sigma_{u_su_t}$ is the number of shortest path between $u_s$ and $u_t$. A figure from wikipedia demonstrates this idea well. The nodes on the outreach have smaller betweenness centrality, while the nodes in the core have higher betweenness centrality. Source: Wikipedia Outreach and Core It is almost like cheating using the work “outreach” and “core” here.  Closeness centrality
Clustering Coefficients
Proportion of motifs, e.g., closed triangles, in a node’s neighborhood.
Local Variant of Clustering Coefficient
Graph Level Statistics
Bag of Nodes
Using the statistics of the local statistics, e.g., distribution of node degrees, as graph level statistics.
WeisfeilerLehmen Kernel
WeisfeilerLehman Kernel
Neighborhood Overlap
We can define many different similarity measures $\mathbf S$.

Number of shared neighbor nodes: $\mathbf S[u, v] = \lvert \mathcal N(u) \cap \mathcal N(v) \rvert$. The likelihood of an edge between $u$ and $v$, $P(A[u,v]=1) \propto \mathcal S[u,v]$.

Local Overlap Measures

Graph Local Overlap Measure: Sorensen Index
The Sorensen index is $$ \mathbf S_{\text{Sorensen}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal N(v) \rvert }{ d_u + d_v}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$. 
Graph Local Overlap Measure: Salton Index
The Salton index is $$ \mathbf S_{\text{Salton}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal N(v) \rvert }{ \sqrt{d_u d_v}}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$. 
Jaccard Similarity
Jaccard index is the ratio of the size of the intersect of the set and the size of the union of the set. $$ J(A, B) = \frac{ \vert A \cap B \vert }{ \vert A \cup B \vert } $$ Jaccard distance $d_J(A,B)$ is defined as $$ d_J(A,B) = 1  J(A,B). $$ Properties If the two sets are the same, $A=B$, we have $J(A,B)=1$ or $d_J(A,B)=0$. We have maximum similarity. If the two sets have nothing in common, we have $J(A,B)=0$ or $d_J(A,B)=1$. We have minimum similarity. Examples Sentence One Word Set: (( sentenceOneWords )) Sentence Two Word Set: (( sentenceTwoWords )) Intersect: (( intersectWords )) Union: (( unionWords )) Jaccard Index: (( jaccardIndex )) Jaccard Distance: (( jaccardDistance )) Vue. 
Graph Local Overlap Measure: Resource Allocation Index
The Resource Allocation (RA) index is $$ \mathbf S_{\text{RA}}[v_1,v_2] = \sum_{u\in\mathcal N(u) \cap \mathcal N(v)} \frac{1}{d_u}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$. 
Graph Local Overlap Measure: Adamic Adar Index
The Adamic Adar (AA) index is1 $$ \mathbf S_{\text{AA}}[v_1,v_2] = \sum_{u\in\mathcal N(u) \cap \mathcal N(v)} \frac{1}{\log d_u}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$. If two nodes have shared neighbor, the degree of the neighbors will be at least 2. So it is safe to use $1/\log d_u$. Adamic2003 Adamic LA, Adar E. Friends and neighbors on the Web. Soc Networks. 2003;25: 211–230. doi:10.1016/S03788733(03)000091 ↩︎


Global Overlap Measures

Graph Global Overlap Measure: Katz Index
The Katz index is $$ \mathbf S_{\text{Katz}}[u,v] = \sum_{i=1}^\infty \beta^i \mathbf A^i[u, v], $$ where $\mathbf A^i[u, v]$ is the matrix $\mathbf A$ to the $i$th power. Some for $\beta^i$. The Katz index describes the similarity between of node $u$ and node $v$. Do not confuse power with contravariant indices For readers familiar with tensor notations, it might be confusing. We some times use contravariant indices on the top right of the tensor notation. But here ${}^{i}$ means to the $i$th power. The index is proved to be the following $$ \mathbf S_{\text{Katz}} = (\mathbf I  \beta \mathbf A)^{1}  \mathbf I. 
Graph Global Overlap Measure: LeichtHolmeNewman Index
From Katz Index to LHN Index Katz Index Graph Global Overlap Measure: Katz Index The Katz index is $$ \mathbf S_{\text{Katz}}[u,v] = \sum_{i=1}^\infty \beta^i \mathbf A^i[u, v], $$ where $\mathbf A^i[u, v]$ is the matrix $\mathbf A$ to the $i$th power. Some for $\beta^i$. The Katz index describes the similarity between of node $u$ and node $v$. Do not confuse power with contravariant indices For readers familiar with tensor notations, it might be confusing. We some times use contravariant indices on the top right of the tensor notation. But here ${}^{i}$ means to the $i$th … has a knob to tune the punishment towards longer paths. 
Graph Global Overlap Measure: Random Walk Similarity
Random Walk Construct a stochastic transfer matrix $P$ by normalizing the adjacency matrix $\mathbf A$ using the node degrees of the target nodes, $$ \mathbf P = \mathbf A \mathbf D^{1}, $$ where $\mathbf A$ is the adjacency matrix Graph Adjacency Matrix A graph $\mathcal G$ can be represented with an adjacency matrix $\mathbf A$. There are some nice and clear examples on wikipedia1, for example, $$ \begin{pmatrix} 2 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} $$ for the graph Public Domain, Link and $\mathbf D$ is a diagonalized matrix with the diagonal elements being the degrees.

L Ma (2021). 'Statistics of Graphs', Datumorphism, 09 April. Available at: https://datumorphism.leima.is/wiki/graph/basics/statisticsofgraphs/.
Table of Contents
Current Ref:

wiki/graph/basics/statisticsofgraphs.md
Links to:
What is Graph
Graph A graph $\mathcal G$ has nodes $\mathcal V$ and edges $\mathcal E$, $$ \mathcal G = ( \mathcal …
Node Degree
Node degree of a node $u$ $$ d_u = \sum_{v\in \mathcal V} A[u,v], $$ where $A$ is the adjacency …
Eigenvector Centrality of a Graph
Given a graph with adjacency matrix $\mathbf A$, the eigenvector centrality is $$ \mathbf e_u = …
Betweenness Centrality of a Graph
Betweenness centrality of a node $v$ is measurement of how likely the shortest path between two …
Local Variant of Clustering Coefficient
$$ c_u = \frac{ \lvert (v_1,v_2)\in \mathcal E: v_1, v_2 \in \mathcal N(u) \rvert}{ \color{red}{d_n …
Graph Local Overlap Measure: Sorensen Index
The Sorensen index is $$ \mathbf S_{\text{Sorensen}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap …
Graph Local Overlap Measure: Salton Index
The Salton index is $$ \mathbf S_{\text{Salton}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal …
Jaccard Similarity
Jaccard index is the ratio of the size of the intersect of the set and the size of the union of the …
Graph Local Overlap Measure: Resource Allocation Index
The Resource Allocation (RA) index is $$ \mathbf S_{\text{RA}}[v_1,v_2] = \sum_{u\in\mathcal N(u) …
Graph Local Overlap Measure: Adamic Adar Index
The Adamic Adar (AA) index is1 $$ \mathbf S_{\text{AA}}[v_1,v_2] = \sum_{u\in\mathcal N(u) \cap …
Graph Global Overlap Measure: Katz Index
The Katz index is $$ \mathbf S_{\text{Katz}}[u,v] = \sum_{i=1}^\infty \beta^i \mathbf A^i[u, v], $$ …
Graph Global Overlap Measure: LeichtHolmeNewman Index
From Katz Index to LHN Index Katz Index Graph Global Overlap Measure: Katz Index The Katz index is …
Graph Global Overlap Measure: Random Walk Similarity
Random Walk Construct a stochastic transfer matrix $P$ by normalizing the adjacency matrix $\mathbf …