Differential Learning Rates in PyTorch

Using different learning rates for different layers of a neural network.

PyTorch optimizers accept per-parameter-group options, which lets us assign different learning rates to different layers.

Why Do We Do This?

In some models, we need to treat the layers differently. For example, in transfer learning, we may want to fine-tune the pretrained layers with a tiny learning rate while training the newly added layers with a larger one.

In the PyTorch documentation, we find that optimizer options can be set per parameter group. The example from the documentation is

from torch import optim

optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
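To see the mechanism end to end, here is a minimal self-contained sketch. The model with `base` and `classifier` submodules is a hypothetical stand-in (the documentation example assumes such a model exists); each dict passed to the optimizer becomes a parameter group with its own hyperparameters, falling back to the top-level defaults where none are given.

```python
import torch
from torch import nn, optim

# Hypothetical two-part model: a "base" we fine-tune gently
# and a fresh "classifier" head that gets its own learning rate.
model = nn.Module()
model.base = nn.Linear(8, 4)
model.classifier = nn.Linear(4, 2)

optimizer = optim.SGD(
    [
        {"params": model.base.parameters()},  # no 'lr' here: uses the default lr=1e-2
        {"params": model.classifier.parameters(), "lr": 1e-3},
    ],
    lr=1e-2,       # default learning rate for groups that do not set their own
    momentum=0.9,  # applies to all groups unless overridden per group
)

# Each group carries its own 'lr', which optimizer.step() uses independently.
print([group["lr"] for group in optimizer.param_groups])  # → [0.01, 0.001]
```

Because the groups live in `optimizer.param_groups`, the same handle can later be used to adjust learning rates on the fly, e.g. when unfreezing the base during fine-tuning.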


L Ma (2021). 'Differential Learning Rates in PyTorch', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-differential-learning-rates/.