Machine Learning Overview
#Statistical Learning #Machine Learning #Basics
What is Machine Learning
AbuMostafa, MagdonIsmail, and Lin summarized machine learning problem using the following chart ^{1} ^{2}. Utilimately, we need to find an approximation $g$ of the true map $f$ from features $\mathcal X$ to targets $\mathcal Y$ on a specific probability distribution of features $P$. This process is done by using an algorithm to select some hypothesis that works.
In the core of machine learning models, we have three components^{3}:
 Representation: encode data and problem representation, i.e., propose a space to set a stage.
 Evaluation: an objective function to be evaluated that guides the model.
 Optimization: an algorithm to optimize the model so it learns what we want it to do.
Machine Learning Workflow
There are many objectives in machine learning. Two of the most applied objectives are classifications and regressions. In classifications and regression, the following four factors are relevant.
 Input:
 Domain knowledge $\tilde{\mathscr K_D}$.
 on features,
 on target values,
 on relation between features and target values.
 A dataset $\tilde{\mathscr D}(\tilde{\mathbf X}, \tilde{\mathbf Y})$ with $\tilde{\mathbf X}$ being the features and $\tilde{\mathbf Y}$ being the values to be predicted;
 features (domain set): $\tilde{\mathbf X}$,
 target values (label set): $\tilde{\mathbf Y}$.
 relations between features and target values: $f(\mathbf X) \to \mathbf Y$.
 Domain knowledge $\tilde{\mathscr K_D}$.
 A set of “encoders” $\mathscr T_i$ that maps the features $\tilde{\mathbf X}$ into machinereadable new features $\mathbf X$ and predicting values $\tilde{\mathbf y}$ into machine readable new values $\mathbf y$. The dimensions of $\tilde{\mathbf X}$ and $\mathbf X$ may not be the same. In summary, $\mathscr T(\tilde{\mathscr D}) \to \mathscr D$.
 A model (aka, prediction rule, predictor, hypothesis) $h(\mathbf X;\mathbf \theta)\to \bar{\mathbf Y}$ that maps $\mathbf X$ to the values with $\mathbf X$ being a set of input features. $h$ may also be a set of functions.
 A measurement of the model performance, $L_{f, \mathscr D}(h)$.
 Error of model: $L_{f, \mathscr D}(h) = \mathscr L(h(\mathbf X), f(\mathbf X))$, where $\mathscr L$ is distance operator.

AbuMostafa2012 AbuMostafa, Yaser S and MagdonIsmail, Malik and Lin, HsuanTien. Learning from Data. 2012. Available: https://www.semanticscholar.org/paper/LearningFromDataAbuMostafaMagdonIsmail/1c0ed9ed3201ef381cc392fc3ca91cae6ecfc698 ↩︎

Deckert2017 Deckert DA. Advanced Topics in Machine Learning. In: Advanced Topics in Machine Learning [Internet]. Apr 2017 [cited 17 Oct 2021]. Available: https://www.mathematik.unimuenchen.de/~deckert/teaching/SS17/ATML/ ↩︎

Domingos2012 Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55 (10), 78–87. ↩︎
L Ma (2018). 'Machine Learning Overview', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/wiki/machinelearning/overview/.
Table of Contents
References:
 Mehta, P., Bukov, M., Wang, C. H., Day, A. G. R., Richardson, C., Fisher, C. K., & Schwab, D. J. (2019). A highbias, lowvariance introduction to Machine Learning for physicists. Physics Reports, 810, 1–124.
 ShalevShwartz, S., & BenDavid, S. (2013). Understanding machine learning: From theory to algorithms. Understanding Machine Learning: From Theory to Algorithms
 Domingos2012 Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55 (10), 78–87.
 AbuMostafa2012 AbuMostafa, Yaser S and MagdonIsmail, Malik and Lin, HsuanTien. Learning from Data. 2012. Available: https://www.semanticscholar.org/paper/LearningFromDataAbuMostafaMagdonIsmail/1c0ed9ed3201ef381cc392fc3ca91cae6ecfc698
 Deckert2017 Deckert DA. Advanced Topics in Machine Learning. In: Advanced Topics in Machine Learning [Internet]. Apr 2017 [cited 17 Oct 2021]. Available: https://www.mathematik.unimuenchen.de/~deckert/teaching/SS17/ATML/
Current Ref:

wiki/machinelearning/overview.md