Differential Learning Rates in PyTorch

Using different learning rates for different layers of a neural network.

PyTorch optimizers accept per-parameter-group options, which lets us assign different learning rates to different layers.

Why Do We Do This?

In some models, we need to treat the layers differently. For example, in transfer learning, we may want to fine-tune the pretrained layers with a tiny learning rate while training the newly added layers with a larger one.

In the PyTorch documentation, we find that optimizer options can be set per parameter group. The example from the documentation is

from torch import optim

optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
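To see the mechanism end to end, here is a minimal self-contained sketch. The model with `base` and `classifier` submodules is a hypothetical stand-in (the documentation example assumes such a model exists); each dict passed to the optimizer becomes a parameter group with its own hyperparameters, falling back to the top-level defaults where none are given.

```python
import torch
from torch import nn, optim

# Hypothetical two-part model: a "base" we fine-tune gently
# and a fresh "classifier" head that gets its own learning rate.
model = nn.Module()
model.base = nn.Linear(8, 4)
model.classifier = nn.Linear(4, 2)

optimizer = optim.SGD(
    [
        {"params": model.base.parameters()},  # no 'lr' here: uses the default lr=1e-2
        {"params": model.classifier.parameters(), "lr": 1e-3},
    ],
    lr=1e-2,       # default learning rate for groups that do not set their own
    momentum=0.9,  # applies to all groups unless overridden per group
)

# Each group carries its own 'lr', which optimizer.step() uses independently.
print([group["lr"] for group in optimizer.param_groups])  # → [0.01, 0.001]
```

Because the groups live in `optimizer.param_groups`, the same handle can later be used to adjust learning rates on the fly, e.g. when unfreezing the base during fine-tuning.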


L Ma (2021). 'Differential Learning Rates in PyTorch', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-differential-learning-rates/.