This module contains the basics for building and training models.
%load_ext autoreload
%autoreload 2
%matplotlib inline
# Used in notebook but not needed in package.
import numpy as np
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

from htools import assert_raises

Optimizers

Optimizers like Adam or RMSProp can contain multiple "parameter groups", each with a different learning rate. (Other hyperparameters can vary across groups as well, but we ignore that for now.) The functions below let us get a new optimizer or update an existing one. They make it easy to use differential learning rates, though that is not required: the same LR can also be used for every parameter group.
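For context, this is how plain PyTorch expresses parameter groups with different learning rates; the helpers documented below build this structure for us. The toy model and LR values here are illustrative only.

import torch
import torch.nn as nn

# Toy model split into two "groups": a body and a head.
model = nn.Sequential(
    nn.Linear(10, 16),
    nn.ReLU(),
    nn.Linear(16, 2)
)

# Plain PyTorch: one dict per parameter group, each with its own lr.
optim = torch.optim.Adam([
    {'params': model[0].parameters(), 'lr': 1e-4},  # body: smaller LR
    {'params': model[2].parameters(), 'lr': 1e-3},  # head: larger LR
])

for group in optim.param_groups:
    print(group['lr'])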

variable_lr_optimizer[source]

variable_lr_optimizer(model, lr=0.003, lr_mult=1.0, optimizer='Adam', eps=0.001, **kwargs)

Get an optimizer that uses different learning rates for different layer
groups. Additional keyword arguments can be used to alter momentum and/or
weight decay, for example, but for the sake of simplicity these values
will be the same across layer groups.

Parameters
-----------
model: nn.Module
    A model object. If you intend to use differential learning rates,
    the model must have an attribute `groups` containing a ModuleList of
    layer groups in the form of Sequential objects. The number of layer
    groups must match the number of learning rates passed in.
lr: float, Iterable[float]
    A number or list of numbers containing the learning rates to use for
    each layer group. There should generally be one LR for each layer group
    in the model. If fewer LR's are provided, lr_mult will be used to
    compute additional LRs. See `update_optimizer` for details.
lr_mult: float
    If you pass in fewer LRs than layer groups, `lr_mult` will be used to
    compute additional learning rates from the one that was passed in.
optimizer: torch optimizer
    The Torch optimizer to be created (Adam by default).
eps: float
    Hyperparameter used by optimizer. The default of 1e-8 can lead to
    exploding gradients, so we typically override this.

Examples
---------
optim = variable_lr_optimizer(model, lr=[3e-3, 3e-2, 1e-1])
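As a fuller illustration, the sketch below builds a toy model exposing the `groups` attribute the docstring requires and passes one LR per layer group. The model definition is hypothetical; only `variable_lr_optimizer` and its documented arguments come from this module.

import torch.nn as nn

class ToyNet(nn.Module):
    """Hypothetical model exposing the `groups` ModuleList of Sequential
    objects that variable_lr_optimizer expects for differential LRs."""
    def __init__(self):
        super().__init__()
        body = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
        head = nn.Sequential(nn.Linear(16, 2))
        self.groups = nn.ModuleList([body, head])

    def forward(self, x):
        for group in self.groups:
            x = group(x)
        return x

net = ToyNet()
# One LR per layer group: smaller for the body, larger for the head.
optim = variable_lr_optimizer(net, lr=[1e-4, 1e-3])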

update_optimizer[source]

update_optimizer(optim, lrs, lr_mult=1.0)

Pass in 1 or more learning rates, 1 for each layer group, and update the
optimizer accordingly. The optimizer is updated in place so nothing is
returned.

Parameters
----------
optim: torch.optim
    Optimizer object.
lrs: float, Iterable[float]
    One or more learning rates. If using multiple values, usually the
    earlier values will be smaller and later values will be larger. This
    can be achieved by passing in a list of LRs that is the same length as
    the number of layer groups in the optimizer, or by passing in a single
    LR and a value for lr_mult.
lr_mult: float
    If you pass in fewer LRs than layer groups, `lr_mult` will be used to
    compute additional learning rates from the one that was passed in.

Returns
-------
None

Examples
--------
If optim has 3 layer groups, this will result in LRs of [3e-5, 3e-4, 3e-3]
in that order:
update_optimizer(optim, lrs=3e-3, lr_mult=0.1)

Again, optim has 3 layer groups. We leave the default lr_mult of 1.0 so
each LR will be 3e-3.
update_optimizer(optim, lrs=3e-3)

Again, optim has 3 layer groups. 3 LRs are passed in so lr_mult is unused.
update_optimizer(optim, lrs=[1e-3, 1e-3, 3e-3])
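To make the lr_mult behavior concrete, the snippet below reproduces the arithmetic from the first example: a single LR of 3e-3 with lr_mult=0.1 across 3 layer groups expands to [3e-5, 3e-4, 3e-3]. This is only a sketch of the expansion logic, not the library's implementation.

n_groups = 3
lr, lr_mult = 3e-3, 0.1

# Work backwards from the final (largest) LR, multiplying by lr_mult
# for each earlier group, then reverse so earlier groups come first.
lrs = [lr * lr_mult**i for i in range(n_groups)][::-1]
print(lrs)  # -> [3e-05, 3e-04, 3e-03] (up to float rounding)

# Applying this to an optimizer would then amount to something like:
# for group, group_lr in zip(optim.param_groups, lrs):
#     group['lr'] = group_lr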