Differential Learning Rates in PyTorch

Using different learning rates for different layers of a neural network.

PyTorch optimizers accept per-parameter-group options, which lets us assign different learning rates to different layers.

Why Do We Do This?

In some models, we need to treat the layers differently. For example, in transfer learning, we can fine-tune the pretrained layers with a tiny learning rate while training the newly added layers with a larger one.

In the PyTorch documentation, we find that optimizer options can be set per parameter group 1. The example from the documentation is

from torch import optim

optim.SGD([
    {'params': model.base.parameters()},  # uses the default lr of 1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3}  # overrides the default
], lr=1e-2, momentum=0.9)
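To see this in action, here is a minimal runnable sketch. The two linear modules `base` and `classifier` are hypothetical stand-ins for a pretrained backbone and a fresh classification head; the group that omits `'lr'` falls back to the optimizer's default.

```python
import torch
from torch import nn, optim

# Hypothetical stand-ins for model.base (pretrained) and model.classifier (new head)
base = nn.Linear(10, 5)
classifier = nn.Linear(5, 2)

optimizer = optim.SGD([
    {'params': base.parameters(), 'lr': 1e-4},  # tiny lr for pretrained layers
    {'params': classifier.parameters()},        # no 'lr' key: uses the default below
], lr=1e-2, momentum=0.9)

# Each dict becomes one param group with its own hyperparameters
print([group['lr'] for group in optimizer.param_groups])  # → [0.0001, 0.01]
```

Because the learning rate lives in `optimizer.param_groups`, learning-rate schedulers and manual adjustments can also target each group independently.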

  1. PyTorch Docs. torch.optim — PyTorch 1.10.0 documentation. [cited 30 Nov 2021]. Available: https://pytorch.org/docs/stable/optim.html  ↩︎



L Ma (2021). 'Differential Learning Rates in PyTorch', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-differential-learning-rates/.