Differential Learning Rates in PyTorch
Using different learning rates for different layers of a neural network.
PyTorch optimizers can be configured with different learning rates for different layers.
The PyTorch documentation explains that optimizer options can be set on a per-parameter-group basis, so each group of layers can use its own learning rate. The example from the documentation is
from torch import optim

optim.SGD([
    {'params': model.base.parameters()},                    # falls back to the default lr of 1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3}   # this group overrides the lr
], lr=1e-2, momentum=0.9)
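To see the whole mechanism end to end, here is a minimal, self-contained sketch. The two-part model and the names TwoPartNet, base, and classifier are hypothetical, chosen only to mirror the structure in the documentation example; they are not part of the PyTorch API.

import torch
from torch import nn, optim

# A small hypothetical model with a "base" (feature extractor) and a "classifier" head.
class TwoPartNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
        self.classifier = nn.Linear(32, 2)

    def forward(self, x):
        return self.classifier(self.base(x))

model = TwoPartNet()

# Two parameter groups: the base group falls back to the global lr (1e-2),
# while the classifier group overrides it with lr=1e-3.
optimizer = optim.SGD(
    [
        {"params": model.base.parameters()},
        {"params": model.classifier.parameters(), "lr": 1e-3},
    ],
    lr=1e-2,
    momentum=0.9,
)

# Each group's learning rate is exposed on optimizer.param_groups.
for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: lr = {group['lr']}")
# group 0: lr = 0.01
# group 1: lr = 0.001

A common use of this setup is fine-tuning: a pretrained backbone gets a smaller learning rate than a freshly initialized classification head.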