Hilbert-Schmidt Independence Criterion (HSIC)

Given two kernel matrices computed from the feature representations, $K=k(x,x)$ and $L=l(y,y)$, HSIC is defined as¹²

$$ \operatorname{HSIC}(K, L) = \frac{1}{(n-1)^2} \operatorname{tr}( K H L H ), $$

where

- $x$, $y$ are the representations of features,
- $n$ is the number of examples (samples), so that $K$ and $L$ are $n \times n$ matrices,
- $H$ is the so-called [[centering matrix]], $H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, which centers a vector around its mean.
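The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `centering_matrix` and `hsic` and the random test data are hypothetical, and the sketch assumes $K$ and $L$ are $n \times n$ Gram matrices built from the same $n$ examples.

```python
import numpy as np


def centering_matrix(n):
    """Return H = I - (1/n) 1 1^T, which subtracts the mean when applied to a vector."""
    return np.eye(n) - np.ones((n, n)) / n


def hsic(K, L):
    """Empirical HSIC, tr(K H L H) / (n - 1)^2, for two n x n Gram matrices."""
    n = K.shape[0]
    H = centering_matrix(n)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Illustrative data (an assumption): 50 examples with 3 and 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = rng.normal(size=(50, 4))

# Linear-kernel Gram matrices; any other valid kernel could be used instead.
K = X @ X.T
L = Y @ Y.T

value = hsic(K, L)
print(value)
```

Forming $H$ explicitly costs $O(n^2)$ memory; for large $n$ one could instead center $K$ and $L$ by subtracting row and column means, which gives the same result.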

We can choose different kernel functions $k$ and $l$. For example, if $k$ and $l$ are linear kernels, we have $k(x, y) = l(x, y) = x \cdot y$. In this linear case, HSIC is simply $\lVert\operatorname{cov}(x^T,y^T) \rVert^2_{\text{Frobenius}}$, the squared Frobenius norm of the cross-covariance between the two representations.
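The linear-kernel identity can be checked numerically. The sketch below is only an illustration on synthetic data: the variable names and the data-generating process are assumptions, and the cross-covariance uses the $1/(n-1)$ normalization to match the HSIC prefactor.

```python
import numpy as np

# Synthetic data (an assumption): Y depends on the first two columns of X.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
Y = X[:, :2] + 0.1 * rng.normal(size=(n, 2))

# HSIC with linear kernels: tr(K H L H) / (n - 1)^2.
H = np.eye(n) - np.ones((n, n)) / n
K = X @ X.T
L = Y @ Y.T
hsic_linear = np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Squared Frobenius norm of the cross-covariance of the column-centered features.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
cross_cov = Xc.T @ Yc / (n - 1)
frob_sq = np.sum(cross_cov ** 2)

print(hsic_linear, frob_sq)
```

The two numbers agree up to floating-point error because $HX$ is the column-centered $X$ and $H$ is symmetric and idempotent, so $\operatorname{tr}(XX^T H YY^T H) = \lVert (HX)^T (HY) \rVert^2_{\text{Frobenius}}$.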

1. Gretton A, Bousquet O, Smola A, Schölkopf B. Measuring Statistical Dependence with Hilbert-Schmidt Norms. Algorithmic Learning Theory. Springer Berlin Heidelberg; 2005. pp. 63–77. doi:10.1007/11564089_7
2. Kornblith S, Norouzi M, Lee H, Hinton G. Similarity of Neural Network Representations Revisited. arXiv [cs.LG]. 2019. Available: http://arxiv.org/abs/1905.00414

Planted: 2021-11-08 by L Ma;


L Ma (2021). 'Hilbert-Schmidt Independence Criterion (HSIC)', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/cards/machine-learning/measurement/hilbert-schmidt-independence-criterion/.