# PAC: Probably Approximately Correct

Published:
Category: { Machine Learning::Theories }
Tags:
Summary:
Pages: 8

# SRM: Structural Risk Minimization

Published:
Category: { Machine Learning::Theories }
Tags:
Summary: [[ERM]] may lead to overfitting since ERM only selects the model that fits the training data well.
Pages: 8

# ERM: Empirical Risk Minimization

Published:
Category: { Machine Learning::Theories }
Tags:
Summary: In a [[learning problem]], the empirical risk $R$ measures the goodness of fit based on empirical information. Empirical risk minimization minimizes the empirical risk to select a good model $\hat f$ out of all possible models $f$ in our hypothesis space for a dataset $\mathcal D$, …
Pages: 8
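
As a rough illustration of the ERM summary above, the sketch below picks the model with the smallest empirical risk from a small, finite hypothesis space. It assumes the empirical risk is the mean loss over the dataset; the names `empirical_risk`, `erm`, `squared_loss`, the toy data, and the grid of slopes are all hypothetical choices made for this example.

```python
def empirical_risk(model, data, loss):
    """Mean loss of `model` over the dataset (assumed form of the empirical risk)."""
    return sum(loss(model(x), y) for x, y in data) / len(data)


def erm(hypotheses, data, loss):
    """Return the hypothesis with the smallest empirical risk on `data`."""
    return min(hypotheses, key=lambda f: empirical_risk(f, data, loss))


def squared_loss(prediction, target):
    return (prediction - target) ** 2


# Toy dataset D of (x, y) pairs, roughly following y = 2x (illustrative values).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Hypothetical hypothesis space: lines through the origin with slopes 0.0, 0.5, ..., 4.0.
slopes = [0.5 * k for k in range(9)]
hypotheses = [(lambda x, a=a: a * x) for a in slopes]

f_hat = erm(hypotheses, data, squared_loss)
print("selected slope:", f_hat(1.0))
print("empirical risk:", empirical_risk(f_hat, data, squared_loss))
```

Because the selection only looks at the fit on `data`, a richer hypothesis space can drive the empirical risk down without improving the risk on unseen data.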

# The Learning Problem

Published:
Category: { Machine Learning::Theories }
Summary: The learning problem posed by Vapnik (Vapnik2000): given a sample $\{z_i\}$ in the probability space $Z$; assuming a probability measure on the probability space $Z$; assuming a set of functions $Q(z, \alpha)$ (e.g., loss functions), where $\alpha$ is a set of parameters; a risk functional $R(\alpha)$ is to be minimized by tuning “the handles” $\alpha$. The risk functional is $$R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z).$$ A learning problem is the minimization of this risk. Reference: Vapnik2000, Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1
Pages: 8

# Cross Validation

Published:
Category: { Machine Learning::Theories }
Summary: Cross validation is a method to estimate the [[risk]]. To perform cross validation, we split the training dataset $\mathcal D$ into $k$ folds, with each fold denoted as $\mathcal D_k$.
Pages: 8
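
The fold splitting described in the cross validation summary can be sketched as follows. The summary only states the split into $k$ folds; the rest of the procedure here (fit on the other folds, evaluate on the held-out fold, average the results) is the usual k-fold recipe and is an assumption of this sketch, as are the helper names `k_fold_split`, `cross_validate`, and `fit_mean`.

```python
def k_fold_split(data, k):
    """Split `data` into k folds by striding; assumes 1 <= k <= len(data)."""
    return [data[i::k] for i in range(k)]


def cross_validate(data, k, fit, loss):
    """Average held-out loss over k folds as an estimate of the risk."""
    folds = k_fold_split(data, k)
    fold_risks = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the i-th one.
        train = [z for j, fold in enumerate(folds) if j != i for z in fold]
        model = fit(train)
        fold_risks.append(
            sum(loss(model(x), y) for x, y in held_out) / len(held_out)
        )
    return sum(fold_risks) / k


def fit_mean(train):
    """Hypothetical learner: always predict the mean target of the training folds."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def squared_loss(prediction, target):
    return (prediction - target) ** 2


dataset = [(float(i), 3.0) for i in range(10)]  # constant targets, illustrative only
folds = k_fold_split(dataset, 5)
cv_risk = cross_validate(dataset, 5, fit_mean, squared_loss)
print(len(folds), cv_risk)
```

In practice the data is usually shuffled before splitting; the striding split above keeps the example deterministic.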

# Noise Contrastive Estimation: NCE

Published:
Category: { Machine Learning::Theories }
Summary: The noise contrastive estimation (NCE) objective function is [1] $$\mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right],$$ where $x^{+}$ represents data similar to $x$, $x^{-}$ represents data dissimilar to $x$, and $C(\cdot, \cdot)$ is a function that computes the similarity. For example, we can use $$C(x, x^{+}) = e^{ f(x)^T f(x^{+}) },$$ so that the objective function becomes $$\mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ f(x)^T f(x^{+}) } }{ e^{ f(x)^T f(x^{+}) } + e^{ f(x)^T f(x^{-}) } } \right].$$ The function $f(\cdot)$ can be an encoder.
Pages: 8
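
A minimal sketch of the objective in the NCE summary above, with the expectation replaced by an average over a finite list of $(x, x^{+}, x^{-})$ triples. The encoder is a placeholder identity map and the vectors are made up for illustration; `nce_loss`, `dot`, and `identity_encoder` are hypothetical names. The code uses the identity $-\ln \frac{e^{a}}{e^{a} + e^{b}} = \ln\left(1 + e^{b - a}\right)$ to avoid forming the two exponentials separately.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nce_loss(triples, f):
    """Average of -ln(C(x, x+) / (C(x, x+) + C(x, x-))) with C(x, y) = exp(f(x)^T f(y))."""
    total = 0.0
    for x, x_pos, x_neg in triples:
        s_pos = dot(f(x), f(x_pos))  # similarity score with the similar sample
        s_neg = dot(f(x), f(x_neg))  # similarity score with the dissimilar sample
        # Rewritten form of the per-sample loss: ln(1 + exp(s_neg - s_pos)).
        # Note: math.exp overflows for very large s_neg - s_pos; fine for a sketch.
        total += math.log1p(math.exp(s_neg - s_pos))
    return total / len(triples)


def identity_encoder(x):
    """Placeholder for the encoder f; a real encoder would be a learned model."""
    return x


uninformative = [([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])]  # x+ and x- equally similar
separated = [([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])]      # x- orthogonal to x

loss_uninformative = nce_loss(uninformative, identity_encoder)
loss_separated = nce_loss(separated, identity_encoder)
print(loss_uninformative, loss_separated)
```

When the positive and negative samples score the same, the loss equals $\ln 2$; it decreases as the encoder scores $x^{+}$ higher than $x^{-}$.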

# Shatter

Published:
Category: { Machine Learning::Theories }
Tags:
References:
Summary: Consider a set $\mathcal S$ and a class (collection of sets) $\mathcal H$. If for every subset of $\mathcal S$, denoted as $\mathcal s$, there is an element of the class $\mathcal H$, denoted as $\mathcal h$, such that [1] $$\mathcal h \cap \mathcal S = \mathcal s,$$ then we say $\mathcal H$ shatters $\mathcal S$. Since the power set of $\mathcal S$ ($P(\mathcal S)$) contains all the possible subsets of $\mathcal S$, we can also rephrase the concept using the power set: if the intersections $\mathcal h \cap \mathcal S$ over all elements $\mathcal h$ of $\mathcal H$ ($\mathcal h\in \mathcal H$) produce the whole power set $P(\mathcal S)$, then $\mathcal H$ shatters $\mathcal S$ [1].
Pages: 8
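
For finite sets, the power-set formulation in the Shatter summary above can be checked directly. The sketch below collects the intersections of each element of $\mathcal H$ with $\mathcal S$ and compares them with $P(\mathcal S)$. The function names and the example class (integer intervals on $\{0, \dots, 4\}$) are assumptions chosen for illustration.

```python
from itertools import combinations


def power_set(s):
    """All subsets of `s`, each as a frozenset."""
    items = list(s)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }


def shatters(H, S):
    """True if {h ∩ S : h in H} equals the power set of S."""
    S = frozenset(S)
    traces = {frozenset(h) & S for h in H}
    return traces == power_set(S)


# Hypothetical class H: all integer intervals {a, ..., b} with 0 <= a <= b <= 4.
H = [set(range(a, b + 1)) for a in range(5) for b in range(a, 5)]

two_points = shatters(H, {1, 2})       # every subset of {1, 2} is cut out by some interval
three_points = shatters(H, {1, 2, 3})  # {1, 3} cannot be cut out without also taking 2
print(two_points, three_points)
```

The check enumerates $2^{|\mathcal S|}$ subsets, so it is only practical for small $\mathcal S$.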

# Induction, Deduction, and Transduction

Published:
Category: { Machine Learning::Theories }
Summary:
Pages: 8