Finding a suitable learning rate for our model training is crucial.

A safe but time-consuming option is a grid search over the hyperparameters. However, there are smarter approaches.
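As a baseline, a grid search over learning rates can be sketched as below. This is only a toy illustration: the one-weight model, the data, and the helper name `final_loss` are all made up for the example, and a real search would train the actual model once per grid point, which is where the cost comes from.

```python
import math


def final_loss(lr, steps=50):
    """Fit y = 2x with plain gradient descent and return the final mean squared error."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2.0 * x for x in xs]
    w = 0.0  # single weight of the toy model y = w * x
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        if not math.isfinite(w):
            return float("inf")  # treat a diverged run as infinitely bad
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Log-spaced grid: 1e-5, 1e-4, ..., 1 (one full training run per candidate).
grid = [10.0 ** exponent for exponent in range(-5, 1)]
losses_by_lr = {lr: final_loss(lr) for lr in grid}
best_lr = min(losses_by_lr, key=losses_by_lr.get)
print(f"best learning rate on the grid: {best_lr:g}")
```

The number of training runs grows with the size of the grid, and with several hyperparameters it grows multiplicatively, which is why the methods below are attractive.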

Karpathy’s Constant

An empirical learning rate of $3 \times 10^{-4}$ for Adam, also known as Karpathy's constant, originated in a tweet by Andrej Karpathy.

Smarter Method

A smarter method is to start with a small learning rate and increase it on each mini-batch, then plot the loss against the learning rate (equivalently, against the mini-batch index, since the rate changes on every mini-batch). The learning rate at which the loss declines most steeply is a suitable learning rate [1, 2, 3].
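The procedure can be sketched in a few lines of plain Python. This is a minimal illustration on a made-up one-weight regression problem, not fastai's implementation: the function names `lr_range_test` and `steepest_lr`, the geometric schedule, the loss smoothing, and the "stop when the loss exceeds four times its best value" rule are all assumptions chosen for the example.

```python
import math
import random


def lr_range_test(xs, ys, lr_min=1e-5, lr_max=1.0, num_steps=100,
                  batch_size=8, beta=0.9, seed=0):
    """Raise the learning rate geometrically, one step per mini-batch,
    and record the smoothed loss observed at each learning rate."""
    rng = random.Random(seed)
    factor = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    w, lr = 0.0, lr_min  # single weight of the toy model y = w * x
    avg, best = 0.0, float("inf")
    lrs, losses = [], []
    for step in range(num_steps):
        batch = [rng.randrange(len(xs)) for _ in range(batch_size)]
        errors = [w * xs[i] - ys[i] for i in batch]
        loss = sum(e * e for e in errors) / batch_size
        grad = sum(2.0 * e * xs[i] for e, i in zip(errors, batch)) / batch_size

        # Exponentially smoothed loss with bias correction.
        avg = beta * avg + (1.0 - beta) * loss
        smoothed = avg / (1.0 - beta ** (step + 1))
        lrs.append(lr)
        losses.append(smoothed)

        # Stop once the loss has clearly blown up (also catches NaN).
        if not smoothed <= 4.0 * best:
            break
        best = min(best, smoothed)

        w -= lr * grad  # plain SGD update at the current learning rate
        lr *= factor    # larger learning rate for the next mini-batch
    return lrs, losses


def steepest_lr(lrs, losses):
    """Return the learning rate where the loss falls fastest per unit of log10(lr)."""
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[steepest]


# Toy data: y = 3x, so the ideal weight is 3.
xs = [i / 10.0 for i in range(1, 41)]
ys = [3.0 * x for x in xs]
lrs, losses = lr_range_test(xs, ys)
suggested = steepest_lr(lrs, losses)
print(f"suggested learning rate: {suggested:.2e}")
```

The slope is taken with respect to the logarithm of the learning rate because the rate grows geometrically, which matches how the loss curve is usually plotted. On noisy mini-batch losses the steepest point can be sensitive to the smoothing factor, so it is worth checking the plot by eye rather than trusting the number alone.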

Figure taken from fastai docs

The code for this method can be found in the fastai GitHub repository.

