Learning Rate
Finding a suitable learning rate is crucial for model training.
A safe but time-consuming option is to search over a grid of candidate values. However, there are smarter approaches.
Smarter Method
A smarter method is to start with a very small learning rate and increase it after each mini-batch, then plot the loss against the learning rate (one point per mini-batch). The learning rate at which the loss decreases most steeply is a suitable choice [1][2][3].
An implementation of this method is available in fastai as `Learner.lr_find` [3].
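As a rough illustration, here is a minimal sketch of such a range test in plain PyTorch, using a hypothetical toy model and synthetic data just to make it self-contained. The real fastai finder is more careful, e.g. it smooths the recorded losses and restores the model weights afterwards.

```python
import math

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy setup: a tiny regression model and synthetic data,
# only to make the sketch runnable end to end.
torch.manual_seed(0)
x = torch.randn(1024, 10)
y = x @ torch.randn(10, 1) + 0.1 * torch.randn(1024, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

lr_min, lr_max, n_steps = 1e-7, 10.0, len(loader)
# Multiply the learning rate by a fixed factor after each mini-batch
# so it grows exponentially from lr_min to lr_max over one epoch.
gamma = (lr_max / lr_min) ** (1 / max(n_steps - 1, 1))

optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
lrs, losses = [], []

for xb, yb in loader:
    loss = loss_fn(model(xb), yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    lrs.append(optimizer.param_groups[0]["lr"])
    losses.append(loss.item())

    # Stop early once the loss explodes; the learning rates beyond
    # this point are uninformative.
    if math.isnan(loss.item()) or loss.item() > 4 * min(losses):
        break

    for group in optimizer.param_groups:
        group["lr"] *= gamma

# A crude stand-in for reading the plot by eye: pick the learning
# rate at which the loss dropped most between consecutive batches.
drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
suggested_lr = lrs[drops.index(max(drops))]
print(f"suggested learning rate: {suggested_lr:.2e}")
```

In practice one usually smooths the recorded losses and inspects the loss-vs-learning-rate plot by eye rather than trusting a single-step difference as this sketch does.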
See Also
1. Pointer I. Programming PyTorch for Deep Learning: Creating and Deploying Deep Learning Applications. Sebastopol, CA: O'Reilly Media; 2019.
2. Howard J, Gugger S. Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. O'Reilly Media; 2020. Available: https://www.oreilly.com/library/view/deep-learning-for/9781492045519/
3. Howard J, Thomas R. Hyperparam schedule. In: fastai [Internet]. [cited 30 Nov 2021]. Available: https://docs.fast.ai/callback.schedule.html#Learner.lr_find