Why is learning from data even possible? To discuss this question, we need a framework for learning. Operationally, we can think of learning in terms of the following framework[^1].

## Naive View

Naively speaking, a model should have two key properties:

1. enough capacity to hold the necessary information embedded in the data, and
2. a method to find the combination of parameters so that the model can generate/complete new data.

Most neural networks have enough capacity to hold the necessary information in the data[^2]. The problem is that the capacity is so large that many different parameter combinations can fit the same training data. Why does backprop even work? How does backprop find a suitable set of parameters that can generalize?
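To make the capacity concern concrete, here is a minimal sketch, not taken from the original note. It assumes a toy linear model with far more parameters than data points as a stand-in for a neural network, and it uses NumPy least squares instead of backprop to find the parameters. The data, sizes, and variable names are all hypothetical. The point is only that a model with enough capacity can fit labels that carry no signal at all, so fitting the training data by itself says little about generalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: far more parameters than data points.
n_samples, n_features = 20, 200

# Hypothetical data: random inputs paired with labels that carry no signal.
X = rng.normal(size=(n_samples, n_features))
y_random = rng.choice([-1.0, 1.0], size=n_samples)

# Least squares on an underdetermined system returns the minimum-norm
# solution, which interpolates the training labels.
w, *_ = np.linalg.lstsq(X, y_random, rcond=None)
train_error = np.max(np.abs(X @ w - y_random))

# Fresh inputs with equally random labels: there is nothing to generalize to.
X_new = rng.normal(size=(n_samples, n_features))
y_new = rng.choice([-1.0, 1.0], size=n_samples)
test_accuracy = np.mean(np.sign(X_new @ w) == y_new)

print(f"max training residual: {train_error:.2e}")
print(f"accuracy on new random labels: {test_accuracy:.2f}")
```

The training residual should be numerically zero, while the accuracy on the new labels is expected to hover around chance. Which of the many interpolating solutions a learning method picks is exactly what the questions above are about.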


L Ma (2021). 'Feasibility of Learning', Datumorphism, 10 April. Available at: https://datumorphism.leima.is/wiki/learning-theory/feasibility-of-learning/.