Learning Rate

Finding a suitable learning rate is crucial for model training.

A safe but time-consuming option is a grid search over candidate learning rates. However, there are smarter approaches.

Karpathy’s Constant

An empirical learning rate of $3 \times 10^{-4}$ for Adam, also known as Karpathy's constant, originated as a tweet by Andrej Karpathy.
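As a minimal sketch of using this default, assuming a PyTorch model (the `torch.nn.Linear` below is a hypothetical stand-in for your own network):

```python
import torch

# Hypothetical toy model; replace with your own network.
model = torch.nn.Linear(10, 1)

# Karpathy's constant: 3e-4 as an empirical default learning rate for Adam.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```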

Smarter Method

A smarter method is to start with a small learning rate and increase it on each mini-batch, then plot the loss against the learning rate (one point per mini-batch). A suitable learning rate is one near where the loss declines most steeply [1][2][3]. A minimal sketch of this range test follows.
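The sketch below assumes a PyTorch `model`, a `train_loader`, and a loss function `criterion` are defined elsewhere (all hypothetical names, not from the original text):

```python
import torch

def lr_range_test(model, train_loader, criterion,
                  min_lr=1e-7, max_lr=10.0, num_steps=100):
    """Increase the learning rate geometrically on each mini-batch
    and record (lr, loss) pairs for plotting."""
    optimizer = torch.optim.SGD(model.parameters(), lr=min_lr)
    # Multiplicative factor so that lr reaches max_lr after num_steps.
    gamma = (max_lr / min_lr) ** (1.0 / num_steps)
    lrs, losses = [], []
    data_iter = iter(train_loader)
    lr = min_lr
    for _ in range(num_steps):
        try:
            inputs, targets = next(data_iter)
        except StopIteration:
            data_iter = iter(train_loader)
            inputs, targets = next(data_iter)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        lrs.append(lr)
        losses.append(loss.item())
        # Stop early once the loss explodes.
        if loss.item() > 4 * min(losses):
            break
        # Increase the learning rate for the next mini-batch.
        lr *= gamma
        for group in optimizer.param_groups:
            group["lr"] = lr
    return lrs, losses
```

Plotting `losses` against `lrs` on a logarithmic x-axis, pick a value slightly before the minimum of the curve, where the loss is still falling fastest.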

Figure: loss plotted against learning rate during the range test (taken from the fastai docs).

The code for this method can be found in the fastai repository on GitHub.
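For illustration, fastai's high-level API wraps this range test in a single call. A sketch, assuming a fastai `DataLoaders` object `dls` built elsewhere (a hypothetical name):

```python
from fastai.vision.all import *

# Build a learner; in older fastai versions this is cnn_learner instead.
learn = vision_learner(dls, resnet18, metrics=accuracy)

# Runs the learning rate range test and plots loss vs learning rate.
learn.lr_find()
```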

See Also

Differential Learning Rates in PyTorch
Using different learning rates for different layers of a neural network; a minimal sketch follows.
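As a sketch of the idea, assuming a hypothetical model with a pretrained `backbone` and a freshly initialized `head`, PyTorch optimizers accept per-parameter-group learning rates:

```python
import torch
import torch.nn as nn

class TwoPartNet(nn.Module):
    """Hypothetical two-part model for illustration."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(10, 10)  # stands in for a pretrained encoder
        self.head = nn.Linear(10, 2)       # freshly initialized classifier

    def forward(self, x):
        return self.head(torch.relu(self.backbone(x)))

model = TwoPartNet()

# Lower learning rate for the pretrained backbone,
# higher learning rate for the new head.
optimizer = torch.optim.Adam([
    {"params": model.backbone.parameters(), "lr": 1e-5},
    {"params": model.head.parameters(), "lr": 3e-4},
])
```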

