Statistics of Graphs

#Graph #Basics #Statistics

Local Statistics

Node Degree

The degree of a node $u$ is $$ d_u = \sum_{v\in \mathcal V} A[u,v], $$ where $A$ is the adjacency matrix and $\mathcal V$ is the set of nodes.
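As a small illustration of the row-sum definition, the sketch below computes the degree of every node from an adjacency matrix. The example graph and the function name `node_degrees` are made up for illustration, and the graph is assumed to be undirected and unweighted.

```python
# Adjacency matrix of a small undirected example graph (hypothetical data)
# with edges 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def node_degrees(adjacency):
    """Return d_u = sum_v A[u, v] for every node u, i.e., the row sums."""
    return [sum(row) for row in adjacency]


degrees = node_degrees(A)
print(degrees)  # [2, 2, 3, 1]
```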

Node Centrality

Importance of a node on a graph:

Eigenvector Centrality of a Graph
Given a graph with adjacency matrix $\mathbf A$, the eigenvector centrality is $$ \mathbf e_u = \frac{1}{\lambda} \sum_{v\in\mathcal V} \mathbf A[u,v] \mathbf e_v, \qquad \forall u \in \mathcal V. $$

Why is it called Eigenvector Centrality

The definition is equivalent to the eigenvalue equation $$ \lambda \mathbf e = \mathbf A\mathbf e. $$

Power Iteration

The solution for $\mathbf e$ is the eigenvector that corresponds to the largest eigenvalue $\lambda_1$. The power iteration method can help us get this eigenvector, i.e., …
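The power iteration mentioned in this excerpt can be sketched roughly as follows: multiply a start vector by $\mathbf A$ repeatedly and renormalize until it stops changing. This is a minimal sketch, not necessarily the procedure of the linked note. It assumes a connected, non-bipartite, undirected graph (otherwise the iteration may not settle), and the function name, starting vector, tolerance, and example graph are all illustrative choices.

```python
import math


def power_iteration(adjacency, num_iters=500, tol=1e-12):
    """Estimate the leading eigenpair (lambda_1, e) of an adjacency matrix.

    Assumption: the graph is connected and non-bipartite, so that
    repeated multiplication by A converges to the leading eigenvector.
    """
    n = len(adjacency)
    e = [1.0 / math.sqrt(n)] * n  # uniform start vector with unit norm
    for _ in range(num_iters):
        # One multiplication step: Ae[u] = sum_v A[u, v] * e[v]
        Ae = [sum(a * x for a, x in zip(row, e)) for row in adjacency]
        norm = math.sqrt(sum(x * x for x in Ae))
        if norm == 0.0:
            raise ValueError("the graph has no edges")
        new_e = [x / norm for x in Ae]
        converged = max(abs(a - b) for a, b in zip(new_e, e)) < tol
        e = new_e
        if converged:
            break
    # Rayleigh quotient e^T A e (e has unit norm) as the eigenvalue estimate.
    Ae = [sum(a * x for a, x in zip(row, e)) for row in adjacency]
    lam = sum(x * y for x, y in zip(e, Ae))
    return lam, e


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lam, e = power_iteration(A)
print(lam, e)
```

In this example node 2, which connects the triangle to the pendant node, receives the largest centrality value.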
Betweenness Centrality of a Graph
The betweenness centrality of a node $v$ measures how likely it is that a shortest path between two other nodes $u_s$ and $u_t$ passes through $v$, $$ c(v) = \sum_{u_s \neq v \neq u_t} \frac{\sigma_{u_su_t}(v) }{\sigma_{u_su_t}}, $$ where $\sigma_{u_su_t}(v)$ is the number of shortest paths between $u_s$ and $u_t$ that pass through $v$, while $\sigma_{u_su_t}$ is the total number of shortest paths between $u_s$ and $u_t$. A figure from Wikipedia demonstrates this idea well. The nodes on the outreach …
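To make the ratio $\sigma_{u_su_t}(v)/\sigma_{u_su_t}$ concrete, here is a brute-force sketch for small graphs. It assumes an undirected, unweighted graph given as an adjacency list and counts each unordered pair $(u_s, u_t)$ once; the text does not fix that convention, so it is an assumption. It relies on the fact that $v$ lies on a shortest path from $s$ to $t$ exactly when $d(s,v) + d(v,t) = d(s,t)$, in which case $\sigma_{st}(v) = \sigma_{sv}\,\sigma_{vt}$. All names are illustrative, and this is not the faster Brandes algorithm.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances from source and the number of
    shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Brute-force betweenness centrality over unordered pairs (s, t)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    c = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:  # s and t are disconnected
            continue
        for v in adj:
            if v == s or v == t or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                c[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return c


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(betweenness(adj))  # {0: 0.0, 1: 0.0, 2: 2.0, 3: 0.0}
```

Only node 2 sits between other nodes here: every shortest path from node 3 to nodes 0 and 1 has to go through it.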
Closeness centrality

Clustering Coefficients

Proportion of motifs, e.g., closed triangles, in a node’s neighborhood.

Graph Clustering Coefficient

Local Clustering Coefficient

$$ c_u = \frac{ \lvert \{ (v_1,v_2)\in \mathcal E: v_1, v_2 \in \mathcal N(u) \} \rvert}{ \color{red}{d_u \choose 2} }, $$ where $\color{red}{d_u \choose 2}$ is the number of possible pairs of neighbors of $u$, $d_u$ is the degree of $u$, and $\mathcal N(u)$ is the set of nodes that are neighbors of $u$.

Closed Triangles Ego Graph

Counting the closed triangles in the ego graph of a node and normalizing the count by the total possible number of triangles also gives a measure of the clustering coefficient. If the …
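The local clustering coefficient can be computed directly from its definition by counting the edges among the neighbors of $u$. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets, and it returns 0 for nodes with fewer than two neighbors, a convention chosen here because the formula is undefined in that case. The function name and the example graph are illustrative.

```python
from itertools import combinations


def local_clustering(adj, u):
    """Fraction of pairs of neighbors of u that are themselves connected."""
    neighbors = adj[u]
    d = len(neighbors)
    if d < 2:
        # Assumed convention: the coefficient is 0 when no pair exists.
        return 0.0
    # Count edges (v1, v2) with both endpoints in the neighborhood of u.
    links = sum(1 for v1, v2 in combinations(neighbors, 2) if v2 in adj[v1])
    return links / (d * (d - 1) / 2)  # divide by "d choose 2"


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for u in adj:
    print(u, local_clustering(adj, u))
```

Node 0 has two neighbors that are connected to each other, so its coefficient is 1, while node 2 has three neighbors with only one edge among the three possible pairs, giving 1/3.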

Graph Level Statistics

Bag of Nodes

Use aggregate statistics of the local (node-level) statistics, e.g., the distribution of node degrees, as graph-level statistics.
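As one possible instance of this idea, the sketch below turns the node degrees into a degree histogram that describes the whole graph. The choice of a histogram and all names are assumptions made for illustration; other aggregates (mean, maximum, and so on) would fit the same pattern.

```python
from collections import Counter


def degree_histogram(adjacency):
    """Return a list h where h[k] is the number of nodes with degree k."""
    degrees = [sum(row) for row in adjacency]
    if not degrees:
        return []
    counts = Counter(degrees)
    return [counts.get(k, 0) for k in range(max(degrees) + 1)]


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degree_histogram(A))  # [0, 1, 2, 1]
```

The histogram does not depend on how the nodes are numbered, which is what makes it usable as a graph-level feature.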

Weisfeiler-Lehman Kernel

The Weisfeiler-Lehman kernel is an iterative integration of neighborhood information.

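We initialize the label of each node with its own node degree. At each step, we collect the labels of each node's neighbors (in the first step, the neighboring node degrees) to form a multiset, which, together with the node's current label, becomes the node's new label. At step $K$, we have such a multiset-based label for each node. These labels can be processed, e.g., by counting how often each one occurs, to form a representation of the graph, which is in turn used to calculate statistics of the graph.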

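This iteration can be used to test whether two graphs are isomorphic: if the resulting labels differ, the graphs are not isomorphic, while identical labels do not guarantee isomorphism.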
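The iteration can be sketched in a few lines. This is a minimal version under several assumptions: graphs are undirected adjacency lists, a node's new label is a string built from its old label and the sorted labels of its neighbors (a real implementation would usually hash or compress these strings), and the graph representation is the count of every label seen across all iterations. The function name and the example graphs are illustrative.

```python
from collections import Counter


def wl_label_counts(adj, num_iters=3):
    """Count the Weisfeiler-Lehman labels produced over num_iters rounds."""
    # Initial label of each node: its degree.
    labels = {u: str(len(adj[u])) for u in adj}
    counts = Counter(labels.values())
    for _ in range(num_iters):
        new_labels = {}
        for u in adj:
            # Multiset of neighbor labels, sorted to make it order-independent.
            multiset = sorted(labels[v] for v in adj[u])
            new_labels[u] = "(" + labels[u] + "|" + ",".join(multiset) + ")"
        labels = new_labels
        counts.update(labels.values())
    return counts


# Hypothetical example graphs.
paw = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
paw_relabeled = {0: [3, 2, 1], 1: [0], 2: [3, 0], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(wl_label_counts(paw) == wl_label_counts(paw_relabeled))  # True
print(wl_label_counts(paw) == wl_label_counts(path))  # False
```

Equal counts only mean that the test cannot tell the two graphs apart; differing counts show that they are not isomorphic.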

Neighborhood Overlap

We can define many different similarity measures $\mathbf S$.
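One simple choice, used here purely as an illustration since the text does not fix a particular measure, is to count the neighbors two nodes have in common, $\mathbf S[u,v] = \lvert \mathcal N(u) \cap \mathcal N(v) \rvert$. The sketch assumes an undirected graph stored as a dictionary of neighbor sets with nodes numbered $0, \dots, n-1$; the function name is hypothetical.

```python
def common_neighbor_matrix(adj):
    """Return S with S[u][v] = number of neighbors shared by u and v."""
    n = len(adj)
    return [[len(adj[u] & adj[v]) for v in range(n)] for u in range(n)]


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
S = common_neighbor_matrix(adj)
for row in S:
    print(row)
```

With this choice the diagonal entry $\mathbf S[u,u]$ equals the degree of $u$, and nodes 0 and 3 have overlap 1 even though they are not directly connected, because both are neighbors of node 2.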

Planted: 2021-09-25 by L Ma;

References:

Hamilton2020 Hamilton WL. Graph Representation Learning. Morgan & Claypool Publishers; 2020. pp. 1–159. doi:10.2200/S01045ED1V01Y202009AIM046