Statistics of Graphs
Local Statistics
Node Degree
Node Centrality
Importance of a node on a graph:
- Eigenvector Centrality of a GraphGiven a graph with adjacency matrix $\mathbf A$, the eigenvector centrality is $$ \mathbf e_u = \frac{1}{\lambda} \sum_{v\in\mathcal V} \mathbf A[u,v] \mathbf e_v, \qquad \forall u \in \mathcal V. $$ Why is it called Eigenvector Centrality The definition is equivalent to $$ \lambda \mathbf e = \mathbf A\mathbf e. $$ Power Iteration The solution to $\mathbf e$ is the eigenvector that corresponds to the largest eigenvalue $\lambda_1$. Power iteration method can help us get this eigenvector, i.e., …
- Betweenness Centrality of a GraphBetweenness centrality of a node $v$ is measurement of how likely the shortest path between two nodes $u_s$ and $u_t$ is gonna pass through node $v$, $$ c(v) = \sum_{v\neq u_s\neq u_t} \frac{\sigma_{u_su_t}(v) }{\sigma_{u_su_t}}, $$ where $\sigma_{u_su_t}(v)$ is the number of shortest path between $u_s$ and $u_t$, and passing through $u$, while $\sigma_{u_su_t}$ is the number of shortest path between $u_s$ and $u_t$. A figure from wikipedia demonstrates this idea well. The nodes on the outreach …
- Closeness centrality
Clustering Coefficients
Proportion of motifs, e.g., closed triangles, in a node’s neighborhood.
Graph Level Statistics
Bag of Nodes
Using the statistics of the local statistics, e.g., distribution of node degrees, as graph level statistics.
Weisfeiler-Lehmen Kernel
The Weisfeiler-Lehman kernel is an iterative integration of neighborhood information.
We initialize the labels for each node using its own node degree. At each step, we take the neighboring node degrees to form a . At step $K$, we have the multisets for each node. Those multisets at each node can be processed to form an representation of the graph which is in turn used to calculate statistics of the graph.
- This iteration can be used to test if two graphs are isomorphism.
Neighborhood Overlap
We can define many different similarity measures $\mathbf S$.
Number of shared neighbor nodes: $\mathbf S[u, v] = \lvert \mathcal N(u) \cap \mathcal N(v) \rvert$. The likelihood of an edge between $u$ and $v$, $P(A[u,v]=1) \propto \mathcal S[u,v]$.
Local Overlap Measures
- Graph Local Overlap Measure: Sorensen IndexThe Sorensen index is $$ \mathbf S_{\text{Sorensen}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal N(v) \rvert }{ d_u + d_v}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$.
- Graph Local Overlap Measure: Salton IndexThe Salton index is $$ \mathbf S_{\text{Salton}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal N(v) \rvert }{ \sqrt{d_u d_v}}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$.
- Jaccard SimilarityJaccard index is the ratio of the size of the intersect of the set and the size of the union of the set. $$ J(A, B) = \frac{ \vert A \cap B \vert }{ \vert A \cup B \vert } $$ Jaccard distance $d_J(A,B)$ is defined as $$ d_J(A,B) = 1 - J(A,B). $$ Properties If the two sets are the same, $A=B$, we have $J(A,B)=1$ or $d_J(A,B)=0$. We have maximum similarity. If the two sets have nothing in common, we have $J(A,B)=0$ or $d_J(A,B)=1$. We have minimum similarity. Examples Sentence One Word Set: (( …
- Graph Local Overlap Measure: Resource Allocation IndexThe Resource Allocation (RA) index is $$ \mathbf S_{\text{RA}}[v_1,v_2] = \sum_{u\in\mathcal N(u) \cap \mathcal N(v)} \frac{1}{d_u}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$.
- Graph Local Overlap Measure: Adamic Adar IndexThe Adamic Adar (AA) index is1 $$ \mathbf S_{\text{AA}}[v_1,v_2] = \sum_{u\in\mathcal N(u) \cap \mathcal N(v)} \frac{1}{\log d_u}, $$ where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$. If two nodes have shared neighbor, the degree of the neighbors will be at least 2. So it is safe to use $1/\log d_u$. Adamic2003 Adamic LA, Adar E. Friends and neighbors on the Web. Soc Networks. 2003;25: 211–230. doi:10.1016/S0378-8733(03)00009-1 ↩︎
Global Overlap Measures
- Graph Global Overlap Measure: Katz Index
The Katz index is
\mathbf S_{\text{Katz}} = \sum_{i=1}^\infty \beta^i \mathbf A^i,
where $\mathbf A^i$ is the matrix $\mathbf A$ to the $i$th power. Some for $\beta^i$. The Katz index describes the similarity between of node $u$ and node $v$.
For readers familiar with tensor notations, it might be confusing. We some times use contravariant indices on the top right of the tensor notation.
But here ${}^{i}$ means to the $i$th power.
The index is proved to be the following
\mathbf S_{\text{Katz}} = …
- Graph Global Overlap Measure: Leicht-Holme-Newman Index
The LHN index is a normalized similarity index.
From Katz Index to LHN Index
has a knob to tune the punishment towards longer paths. The Leicht-Holme-Newman Similarity, aka, LHN Similarity, has a better normalization factor.
The LHN similarity of a graph $\mathcal G$ normalizes the path calculation by the expected number of path in a random graph $\tilde{\mathcal G}$.
We require $\tilde{\mathcal G}$ to have the same set of node degrees as $\mathcal G$. Such a graph is a configuration model.
The …
- Graph Global Overlap Measure: Random Walk Similarity
Random Walk
Construct a stochastic transfer matrix $P$ by normalizing the adjacency matrix $\mathbf A$ using the node degrees of the target nodes,
\mathbf P = \mathbf A \mathbf D^{-1},
where $\mathbf A$ is the and $\mathbf D$ is a diagonalized matrix with the diagonal elements being the degrees.
Given a random walk associated with $\mathbf P$ that starts from node $u$, we find the probability that visits each node $v$, $\mathbf q_u$,
\mathbf q_u = c \mathbf P \mathbf q_u + (1-c) \mathbf e_u, …
wiki/graph/basics/statistics-of-graphs
Links to:L Ma (2021). 'Statistics of Graphs', Datumorphism, 09 April. Available at: https://datumorphism.leima.is/wiki/graph/basics/statistics-of-graphs/.