Learning Theories
PAC: Probably Approximately Correct
Published:
Category: { Machine Learning::Theories }
Tags:
Summary:
SRM: Structural Risk Minimization
Published:
Category: { Machine Learning::Theories }
Tags:
References:
- Structural risk minimization @ Wikipedia
- Murphy, K. P. (2012). Probabilistic Machine Learning: An Introduction.
Summary: [[ERM]] in a [[learning problem]] may lead to overfitting, since ERM only selects the model that fits the training data well.
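To make the idea concrete, here is a minimal sketch of structural risk minimization on a toy regression problem: nested polynomial classes of increasing degree play the role of the structure, and the selected model minimizes the empirical risk plus a capacity penalty. The data, the polynomial classes, and the particular penalty $\sqrt{d/n}$ are illustrative assumptions, not part of the card.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression data: y = sin(x) + noise.
x = rng.uniform(-3, 3, size=40)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

def empirical_risk(degree: int) -> float:
    """Mean squared error of a degree-`degree` polynomial fit (ERM within one class)."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

n = x.size
results = []
for degree in range(1, 10):
    # Structural risk = empirical risk + capacity penalty.
    # The penalty sqrt(degree / n) is a stand-in for a VC-type confidence term.
    risk = empirical_risk(degree) + np.sqrt(degree / n)
    results.append((risk, degree))

best_risk, best_degree = min(results)
print(f"selected degree {best_degree} with structural risk {best_risk:.3f}")
```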
ERM: Empirical Risk Minimization
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: In a [[learning problem]], the empirical risk $R$ is a measure of the goodness of fit based on empirical information. Empirical risk minimization minimizes the empirical risk to select a good model $\hat f$ out of all possible models $f$ in our hypothesis space for a dataset $\mathcal D$.
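A minimal sketch of ERM, assuming a toy dataset, a squared loss for $Q$, and a tiny hypothesis space of three hand-written candidate functions; none of these specifics come from the card itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset D = {(x_i, y_i)}: y is roughly 2x + 1 with noise.
x = rng.uniform(-1, 1, size=50)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# A small hypothesis space of candidate models f.
hypotheses = {
    "f1: 2x + 1": lambda x: 2 * x + 1,
    "f2: x":      lambda x: x,
    "f3: -x + 2": lambda x: -x + 2,
}

def empirical_risk(f) -> float:
    """Empirical risk: average squared loss Q over the sample."""
    return float(np.mean((f(x) - y) ** 2))

# ERM: pick the hypothesis with the smallest empirical risk.
best_name = min(hypotheses, key=lambda name: empirical_risk(hypotheses[name]))
print(best_name, empirical_risk(hypotheses[best_name]))
```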
The Learning Problem
Published:
Category: { Machine Learning::Theories }
Tags:
References:
- Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1
Summary: The learning problem proposed by Vapnik:
- Given a sample: $\{z_i\}$ in the probability space $Z$;
- Assuming a probability measure on the probability space $Z$;
- Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters;
- A risk functional to be minimized by tuning “the handles” $\alpha$, $R(\alpha)$.

The risk functional is
$$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$
A learning problem is the minimization of this risk.
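Since the distribution $F(z)$ is usually unknown, one way to see the risk functional in action is to estimate the integral by sampling. The sketch below assumes $z = (x, y)$ drawn from a made-up distribution, a squared loss for $Q(z, \alpha)$, and a one-parameter linear model; it then tunes the handle $\alpha$ by minimizing the estimated $R(\alpha)$.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_z(n: int):
    """Draw z = (x, y) from an assumed distribution F: y = 3x + noise."""
    x = rng.normal(size=n)
    y = 3 * x + rng.normal(scale=0.5, size=n)
    return x, y

def Q(x, y, alpha):
    """Loss function Q(z, alpha): squared error of the linear model y ~ alpha * x."""
    return (alpha * x - y) ** 2

def risk(alpha, n=100_000):
    """Monte Carlo estimate of R(alpha) = integral of Q(z, alpha) dF(z)."""
    x, y = sample_z(n)
    return float(np.mean(Q(x, y, alpha)))

# The learning problem: tune the "handle" alpha to minimize R(alpha).
alphas = np.linspace(0, 6, 61)
best = min(alphas, key=risk)
print(f"alpha minimizing the estimated risk: {best:.2f}")
```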
Cross Validation
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: Cross validation is a method to estimate the [[risk]].
To perform cross validation, we split the training dataset $\mathcal D$ into $k$ folds, with each fold denoted as $\mathcal D_k$.
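A minimal sketch of the $k$-fold procedure, assuming a one-parameter least-squares model and a toy dataset; the held-out loss averaged over the folds serves as the risk estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training dataset D.
X = rng.uniform(-2, 2, size=60)
Y = 1.5 * X + rng.normal(scale=0.2, size=X.size)

def fit(x, y):
    """Least-squares slope for the model y ~ a * x."""
    return float(np.dot(x, y) / np.dot(x, x))

def k_fold_risk(x, y, k=5):
    """Estimate the risk by averaging the held-out loss over k folds."""
    indices = rng.permutation(x.size)
    folds = np.array_split(indices, k)
    losses = []
    for i in range(k):
        val = folds[i]                                    # held-out fold D_i
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        a = fit(x[train], y[train])
        losses.append(np.mean((a * x[val] - y[val]) ** 2))
    return float(np.mean(losses))

print(f"5-fold estimate of the risk: {k_fold_risk(X, Y):.4f}")
```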
Noise Contrastive Estimation: NCE
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: The noise contrastive estimation (NCE) objective function is
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right], $$
where
- $x^{+}$ represents data similar to $x$,
- $x^{-}$ represents data dissimilar to $x$,
- $C(\cdot, \cdot)$ is a function that computes the similarity.

For example, we can use
$$ C(x, x^{+}) = e^{ f(x)^T f(x^{+}) }, $$
so that the objective function becomes
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ f(x)^T f(x^{+}) } }{ e^{ f(x)^T f(x^{+}) } + e^{ f(x)^T f(x^{-}) } } \right]. $$
The function $f(\cdot)$ can be an encoder.
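A minimal sketch of this objective on a single triple $(x, x^{+}, x^{-})$, assuming a random linear map as a stand-in for the encoder $f$; the similarity is $C(a, b) = e^{f(a)^T f(b)}$ as above, evaluated in log-sum-exp form for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(4)

# A random linear encoder f(x) = W x, standing in for a trained network.
W = rng.normal(size=(8, 16))

def f(x):
    return W @ x

def nce_loss(x, x_pos, x_neg):
    """-ln C(x, x+) / (C(x, x+) + C(x, x-)) with C(a, b) = exp(f(a)^T f(b))."""
    s_pos = f(x) @ f(x_pos)           # similarity to the positive sample
    s_neg = f(x) @ f(x_neg)           # similarity to the negative sample
    m = max(s_pos, s_neg)             # log-sum-exp shift for stability
    return -(s_pos - (m + np.log(np.exp(s_pos - m) + np.exp(s_neg - m))))

# One toy triple (x, x+, x-): the positive is a noisy copy of x, the negative is unrelated.
x = rng.normal(size=16)
x_pos = x + 0.1 * rng.normal(size=16)
x_neg = rng.normal(size=16)
print(f"NCE loss on one triple: {nce_loss(x, x_pos, x_neg):.4f}")
```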
Shatter
Published:
Category: { Machine Learning::Theories }
Tags:
References:
- Shattered Set @ Wikipedia
Summary: Given a set $\mathcal S$ and a class (collection of sets) $\mathcal H$, we say $\mathcal H$ shatters $\mathcal S$ if, for any subset of $\mathcal S$, denoted as $\mathcal s$, we can find an element of the class $\mathcal H$, denoted as $\mathcal h$, such that
$$ \mathcal h \cap \mathcal S = \mathcal s. $$
Since the power set of $\mathcal S$, $P(\mathcal S)$, contains all the possible subsets of $\mathcal S$, we can also rephrase the concept using the power set: if we can recover the power set $P(\mathcal S)$ by looking into the intersections of $\mathcal S$ with elements $\mathcal h$ of $\mathcal H$ ($\mathcal h \in \mathcal H$), then we say $\mathcal H$ shatters $\mathcal S$.
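A brute-force sketch of this definition: enumerate every subset $\mathcal s$ of $\mathcal S$ and check whether some $\mathcal h \in \mathcal H$ gives $\mathcal h \cap \mathcal S = \mathcal s$. The sets $\mathcal S$ and the class $\mathcal H$ used here are made up for illustration.

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

def shatters(H, S):
    """True iff every subset of S equals h & S for some h in the class H."""
    intersections = {frozenset(h & S) for h in H}
    return power_set(S) <= intersections

S = {1, 2}
H = [set(), {1}, {2}, {1, 2, 3}]      # a class (collection of sets)
print(shatters(H, S))                  # True: all 4 subsets of S are realized

S_big = {1, 2, 3}
print(shatters(H, S_big))              # False: e.g. {1, 3} is never realized as h & S_big
```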
Induction, Deduction, and Transduction
Published:
Category: { Machine Learning::Theories }
Tags:
Summary: