# Learning Theories

### Introduction: My Knowledge Cards

# PAC: Probably Approximately Correct

Published: 2020-01-16

Category: { Machine Learning::Theories }

Tags:

Summary:

Pages: 8

# SRM: Structural Risk Minimization

Published: 2021-02-18

Category: { Machine Learning::Theories }

Tags:

References:
- Structural risk minimization @ Wikipedia
- Murphy, K. P. (2012). Probabilistic Machine Learning: An Introduction.

Summary: [[ERM]] ERM: Empirical Risk Minimization In a [[learning problem]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tunning “the handles” $\alpha$, $R(\alpha)$. The risk functional is $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$ A learning problem is … may lead to overfitting since ERM only selects the model to fit the train data well.

Pages: 8

# ERM: Empirical Risk Minimization

Published: 2021-02-18

Category: { Machine Learning::Theories }

Tags:

Summary: In a [[learning problem]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tunning “the handles” $\alpha$, $R(\alpha)$. The risk functional is $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$ A learning problem is the minimization of this risk. Vapnik2000 … , empirical risk $R$ is a measurement the goodness of fit based on empirical information. Empirical risk minimization minimizes the empirical risk to select a good model $\hat f$ out of all possible models $f$ in our hypothesis space for a dataset $\mathcal D$,

Pages: 8

# The Learning Problem

Published: 2021-05-06

Category: { Machine Learning::Theories }

Tags:

References:
- Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1

Summary: The learning problem posed by Vapnik:1
Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tunning “the handles” $\alpha$, $R(\alpha)$. The risk functional is
$$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$
A learning problem is the minimization of this risk.
Vapnik2000 Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1 ↩︎

Pages: 8

# Cross Validation

Published: 2021-05-06

Category: { Machine Learning::Theories }

Tags:

Summary: Cross validation is a method to estimate the [[risk]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tunning “the handles” $\alpha$, $R(\alpha)$. The risk functional is $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$ A learning problem is the minimization of this risk. Vapnik2000 … .
To perform cross validation, we split the train dataset $\mathcal D$ into $k$ folds, with each fold denoted as $\mathcal D_k$.

Pages: 8

# Noise Contrastive Estimation: NCE

Published: 2021-08-13

Category: { Machine Learning::Theories }

Tags:

Summary: Noise contrastive estimation (NCE) objective function is1
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right], $$
where
$x^{+}$ represents data similar to $x$, $x^{-}$ represents data dissimilar to $x$, $C(\cdot, \cdot)$ is a function to compute the similarities. For example, we can use
$$ C(x, x^{+}) = e^{ f(x)^T f(x^{+}) }, $$
so that the objective function becomes
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ f(x)^T f(x^{+}) } }{ e^{ f(x)^T f(x^{+}) } + e^{ f(x)^T f(x^{-}) } } \right]. $$
The function $f(\cdot)$ can be an encoder.

Pages: 8

# Shatter

Published: 2021-10-27

Category: { Machine Learning::Theories }

Tags:

References:
- Shattered Set @ Wikipedia

Summary: Given a set $\mathcal S$, and a class (collection of sets) $\mathcal H$.
For any subset of $\mathcal S$, denoted as $\mathcal s$, if we have an element of class $\mathcal H$, denoted as $\mathcal h$, that leads to1
$$ \mathcal h \cap \mathcal S = \mathcal s. $$
Since the power set of $\mathcal S$ ($P(\mathcal S)$) contains all the possible subsets of $\mathcal S$, we can also rephrase the concept using power set. If we can find the power set $P(\mathcal S)$ by looking into intersections of elements $\mathcal h$ of $\mathcal H$ ($\mathcal h\in \mathcal H$), then we say $\mathcal H$ shatters $\mathcal S$ 1.

Pages: 8

# Induction, Deduction, and Transduction

Published: 2022-04-19

Category: { Machine Learning::Theories }

Tags:

Summary:

Pages: 8