Data

Transforms that capture local patterns

Some sugar for data science

Data pipelines fail. Just like rebooting a system, a first fix if we don’t spot any obvious …

Centered Kernel Alignment (CKA) is a similarity metric designed to measure the similarity between feature representations in neural networks.

Given two kernels of the feature representations $K=k(x,x)$ and $L=l(y,y)$, HSIC is defined as $$ …
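As a rough illustration of how HSIC and CKA fit together, here is a minimal sketch. It assumes linear kernels ($K = XX^\top$, $L = YY^\top$) and the commonly used biased empirical estimator $\mathrm{tr}(KHLH)/(n-1)^2$ with centering matrix $H$; the truncated definition above may use a different normalization. The function names `center_gram`, `hsic`, and `linear_cka` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def center_gram(K):
    """Center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def hsic(K, L):
    """Biased empirical HSIC estimate (assumed normalization: 1/(n-1)^2)."""
    n = K.shape[0]
    return np.trace(center_gram(K) @ center_gram(L)) / (n - 1) ** 2

def linear_cka(X, Y):
    """CKA with linear kernels; X and Y have one row per example."""
    K = X @ X.T
    L = Y @ Y.T
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))
```

With linear kernels, the score should be unchanged when one representation is rotated by an orthogonal matrix or rescaled by a constant, which is one reason CKA is used to compare layers of different networks.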

The Box-Cox transformation is a power transform that makes positive-valued data more closely Gaussian, which is especially useful in feature engineering, e.g., stabilizing the variance of a time series.
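A minimal sketch of the transform for a fixed parameter $\lambda$ might look like the following. The helper names `box_cox` and `inverse_box_cox` are hypothetical; in practice $\lambda$ is usually estimated from the data (for example by maximum likelihood) rather than chosen by hand.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda -> 0 limit of (x^lambda - 1) / lambda is log(x).
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def inverse_box_cox(transformed, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(t) for t in transformed]
    return [(lam * t + 1.0) ** (1.0 / lam) for t in transformed]
```

For instance, `box_cox([1, 4, 9], 0.5)` is a shifted and scaled square root, which compresses large values more than small ones.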

In a [[learning problem]] …

[[ERM]] …

The loss calculated on all the data points

The loss calculated on the whole population

Data storage is diverse. For data on smaller scales, we are mostly dealing with some data files. …

The Gini impurity is a measurement of the impurity of a set.
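As a small sketch, assuming the usual definition $1 - \sum_i p_i^2$ over the class proportions $p_i$ of the set (the function name `gini_impurity` is hypothetical):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty set
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A set with a single class has impurity 0, and the value grows as the classes become more evenly mixed.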

The information is a measurement of the entropy of the dataset.
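A minimal sketch, assuming "entropy" here means the Shannon entropy (in bits) of the label distribution of the dataset; the function name `shannon_entropy` is hypothetical:

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy -sum(p_i * log2(p_i)) of a collection of labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty dataset
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

Two equally likely classes give 1 bit, while a dataset with a single class gives 0.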

During feature engineering, we have to deal with missing values.
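One simple option, sketched below, is to fill missing entries of a numeric column with the median of the observed values. This is only one strategy among many, and it assumes missing entries are encoded as `None`; the function name `impute_median` is hypothetical.

```python
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in column]
```

The median is less sensitive to outliers than the mean, but any single-value imputation shrinks the variance of the feature, so it is worth checking how much of the column was missing.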