Learning Theories
PAC: Probably Approximately Correct
Published:
Category: { Machine Learning::Theories }
Tags:
Summary:
SRM: Structural Risk Minimization
Published:
Category: { Machine Learning::Theories }
Tags:
References:
- Structural risk minimization @ Wikipedia
- Murphy, K. P. (2012). Probabilistic Machine Learning: An Introduction.
Summary: [[ERM]] in a [[learning problem]] may lead to overfitting, since ERM only selects the model that fits the training data well.
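To make the idea concrete, here is a minimal sketch of structural risk minimization on a toy regression problem: nested polynomial classes of increasing degree play the role of the structure, and the selected model minimizes the empirical risk plus a capacity penalty. The data, the polynomial classes, and the particular penalty $\sqrt{d/n}$ are illustrative assumptions, not part of the card.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression data: y = sin(x) + noise.
x = rng.uniform(-3, 3, size=40)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

def empirical_risk(degree: int) -> float:
    """Mean squared error of a degree-`degree` polynomial fit (ERM within one class)."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

n = x.size
results = []
for degree in range(1, 10):
    # Structural risk = empirical risk + capacity penalty.
    # The penalty sqrt(degree / n) is a stand-in for a VC-type confidence term.
    risk = empirical_risk(degree) + np.sqrt(degree / n)
    results.append((risk, degree))

best_risk, best_degree = min(results)
print(f"selected degree {best_degree} with structural risk {best_risk:.3f}")
```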
ERM: Empirical Risk Minimization
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: In a [[learning problem]], the empirical risk $R$ is a measure of the goodness of fit based on empirical information. Empirical risk minimization minimizes the empirical risk to select a good model $\hat f$ out of all possible models $f$ in our hypothesis space for a dataset $\mathcal D$.
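A minimal sketch of ERM, assuming a toy dataset, a squared loss for $Q$, and a tiny hypothesis space of three hand-written candidate functions; none of these specifics come from the card itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset D = {(x_i, y_i)}: y is roughly 2x + 1 with noise.
x = rng.uniform(-1, 1, size=50)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# A small hypothesis space of candidate models f.
hypotheses = {
    "f1: 2x + 1": lambda x: 2 * x + 1,
    "f2: x":      lambda x: x,
    "f3: -x + 2": lambda x: -x + 2,
}

def empirical_risk(f) -> float:
    """Empirical risk: average squared loss Q over the sample."""
    return float(np.mean((f(x) - y) ** 2))

# ERM: pick the hypothesis with the smallest empirical risk.
best_name = min(hypotheses, key=lambda name: empirical_risk(hypotheses[name]))
print(best_name, empirical_risk(hypotheses[best_name]))
```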
The Learning Problem
Published:
Category: { Machine Learning::Theories }
Tags:
References:
- Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1
Summary: The learning problem proposed by Vapnik:
- Given a sample: $\{z_i\}$ in the probability space $Z$;
- Assuming a probability measure on the probability space $Z$;
- Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters;
- A risk functional to be minimized by tuning “the handles” $\alpha$, $R(\alpha)$.

The risk functional is
$$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$
A learning problem is the minimization of this risk.
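Since the distribution $F(z)$ is usually unknown, one way to see the risk functional in action is to estimate the integral by sampling. The sketch below assumes $z = (x, y)$ drawn from a made-up distribution, a squared loss for $Q(z, \alpha)$, and a one-parameter linear model; it then tunes the handle $\alpha$ by minimizing the estimated $R(\alpha)$.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_z(n: int):
    """Draw z = (x, y) from an assumed distribution F: y = 3x + noise."""
    x = rng.normal(size=n)
    y = 3 * x + rng.normal(scale=0.5, size=n)
    return x, y

def Q(x, y, alpha):
    """Loss function Q(z, alpha): squared error of the linear model y ~ alpha * x."""
    return (alpha * x - y) ** 2

def risk(alpha, n=100_000):
    """Monte Carlo estimate of R(alpha) = integral of Q(z, alpha) dF(z)."""
    x, y = sample_z(n)
    return float(np.mean(Q(x, y, alpha)))

# The learning problem: tune the "handle" alpha to minimize R(alpha).
alphas = np.linspace(0, 6, 61)
best = min(alphas, key=risk)
print(f"alpha minimizing the estimated risk: {best:.2f}")
```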
Cross Validation
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: Cross validation is a method to estimate the [[risk]].
To perform cross validation, we split the training dataset $\mathcal D$ into $k$ folds, with each fold denoted as $\mathcal D_k$.
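A minimal sketch of the $k$-fold procedure, assuming a one-parameter least-squares model and a toy dataset; the held-out loss averaged over the folds serves as the risk estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training dataset D.
X = rng.uniform(-2, 2, size=60)
Y = 1.5 * X + rng.normal(scale=0.2, size=X.size)

def fit(x, y):
    """Least-squares slope for the model y ~ a * x."""
    return float(np.dot(x, y) / np.dot(x, x))

def k_fold_risk(x, y, k=5):
    """Estimate the risk by averaging the held-out loss over k folds."""
    indices = rng.permutation(x.size)
    folds = np.array_split(indices, k)
    losses = []
    for i in range(k):
        val = folds[i]                                    # held-out fold D_i
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        a = fit(x[train], y[train])
        losses.append(np.mean((a * x[val] - y[val]) ** 2))
    return float(np.mean(losses))

print(f"5-fold estimate of the risk: {k_fold_risk(X, Y):.4f}")
```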
Noise Contrastive Estimation: NCE
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: The noise contrastive estimation (NCE) objective function is
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right], $$
where
- $x^{+}$ represents data similar to $x$,
- $x^{-}$ represents data dissimilar to $x$,
- $C(\cdot, \cdot)$ is a function that computes the similarity.

For example, we can use
$$ C(x, x^{+}) = e^{ f(x)^T f(x^{+}) }, $$
so that the objective function becomes
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ f(x)^T f(x^{+}) } }{ e^{ f(x)^T f(x^{+}) } + e^{ f(x)^T f(x^{-}) } } \right]. $$
The function $f(\cdot)$ can be an encoder.
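A minimal sketch of this objective on a single triple $(x, x^{+}, x^{-})$, assuming a random linear map as a stand-in for the encoder $f$; the similarity is $C(a, b) = e^{f(a)^T f(b)}$ as above, evaluated in log-sum-exp form for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(4)

# A random linear encoder f(x) = W x, standing in for a trained network.
W = rng.normal(size=(8, 16))

def f(x):
    return W @ x

def nce_loss(x, x_pos, x_neg):
    """-ln C(x, x+) / (C(x, x+) + C(x, x-)) with C(a, b) = exp(f(a)^T f(b))."""
    s_pos = f(x) @ f(x_pos)           # similarity to the positive sample
    s_neg = f(x) @ f(x_neg)           # similarity to the negative sample
    m = max(s_pos, s_neg)             # log-sum-exp shift for stability
    return -(s_pos - (m + np.log(np.exp(s_pos - m) + np.exp(s_neg - m))))

# One toy triple (x, x+, x-): the positive is a noisy copy of x, the negative is unrelated.
x = rng.normal(size=16)
x_pos = x + 0.1 * rng.normal(size=16)
x_neg = rng.normal(size=16)
print(f"NCE loss on one triple: {nce_loss(x, x_pos, x_neg):.4f}")
```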
Shatter
Published:
Category: { Machine Learning::Theories }
Tags:
References:
- Shattered Set @ Wikipedia
Summary: Given a set $\mathcal S$ and a class (collection of sets) $\mathcal H$, we say $\mathcal H$ shatters $\mathcal S$ if, for any subset of $\mathcal S$, denoted as $\mathcal s$, we can find an element of the class $\mathcal H$, denoted as $\mathcal h$, such that
$$ \mathcal h \cap \mathcal S = \mathcal s. $$
Since the power set of $\mathcal S$, $P(\mathcal S)$, contains all the possible subsets of $\mathcal S$, we can also rephrase the concept using the power set: if we can recover the power set $P(\mathcal S)$ by looking into the intersections of $\mathcal S$ with elements $\mathcal h$ of $\mathcal H$ ($\mathcal h \in \mathcal H$), then we say $\mathcal H$ shatters $\mathcal S$.
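A brute-force sketch of this definition: enumerate every subset $\mathcal s$ of $\mathcal S$ and check whether some $\mathcal h \in \mathcal H$ gives $\mathcal h \cap \mathcal S = \mathcal s$. The sets $\mathcal S$ and the class $\mathcal H$ used here are made up for illustration.

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

def shatters(H, S):
    """True iff every subset of S equals h & S for some h in the class H."""
    intersections = {frozenset(h & S) for h in H}
    return power_set(S) <= intersections

S = {1, 2}
H = [set(), {1}, {2}, {1, 2, 3}]      # a class (collection of sets)
print(shatters(H, S))                  # True: all 4 subsets of S are realized

S_big = {1, 2, 3}
print(shatters(H, S_big))              # False: e.g. {1, 3} is never realized as h & S_big
```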
Induction, Deduction, and Transduction
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: