This page covers how to configure the Trainer with an Optimizer and with one or more Scheduler classes. By the end you should have a cursory understanding of how to use the Optimizer class and Scheduler class in conjunction with the Trainer class.
An Optimizer implements an optimization algorithm to control how model parameters are updated. Various hyperparameters such as lr, momentum, and weight_decay can be passed to the Optimizer to give further control. A Scheduler is used in conjunction with an Optimizer to adjust the value of these hyperparameters over the course of a run. Currently, schedulers for lr and weight_decay are supported.
The Trainer takes in an optimizer argument. An optimizer is used to optimize model weights during training and is required for any run that does any training. optimizer can be passed as an Optimizer class. For details on all available optimizers, see the CSTorch optimizer class.
The Trainer also accepts a schedulers argument. Schedulers are used to adjust hyperparameters during training. Typically this adjustment is a decay following some algorithm. The CSTorch API supports schedulers that adjust either the learning rate or the weight decay. For a full list of available schedulers, see the CSTorch scheduler class.
In the example below, you create an SGD optimizer with a single SequentialLR Scheduler that is a LinearLR Scheduler for the first 500 steps, then a CosineDecayLR Scheduler for the next 500 steps.
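The following YAML sketch illustrates such a configuration. The trainer/init layout and field names such as initial_learning_rate, end_learning_rate, total_iters, and milestones are assumptions made for illustration and may differ from the exact ModelZoo schema; consult the CSTorch optimizer and scheduler class references for the authoritative parameter names.

```yaml
trainer:
  init:
    optimizer:
      SGD:
        lr: 0.01
        momentum: 0.9
    schedulers:
      - SequentialLR:
          # Assumed field: switch from the first sub-scheduler to the
          # second after 500 steps.
          milestones: [500]
          schedulers:
            - LinearLR:
                initial_learning_rate: 0.01
                end_learning_rate: 0.001
                total_iters: 500
            - CosineDecayLR:
                initial_learning_rate: 0.001
                end_learning_rate: 0.0001
                total_iters: 500
```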
When constructing the Trainer in Python, optimizer can also be passed as a callable, assumed to be a function that takes in a torch.nn.Module and returns an Optimizer. It can likewise be passed as an Optimizer directly, provided the model is already defined. Similarly, schedulers can be passed as a list of callables, where each element is assumed to be a function that takes in an Optimizer and returns a Scheduler. Each element can also be passed as a Scheduler directly, provided the Optimizer is already defined. Using callables allows us to pass in objects without having to predefine the inputs to those objects.

ModelZoo also supports adjusting hyperparameters for specific subsets of parameters. This is handled on the optimizer-side by tagging param_groups based on glob-like patterns and on the scheduler-side by specifying which tagged groups to update.
Optimizers contain param_groups, which is a list of dictionaries containing all parameters. For more information, see the PyTorch documentation.
ModelZoo has the ability to tag optimizer param_groups based on glob-like pattern matching of parameter names. These tagged param_groups can then be used to selectively adjust specific parameters.
Parameters are partitioned and tagged via YAML. For example:
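In the sketch below, the params/tag layout under the optimizer is an assumption made for illustration; see configure_param_groups for the exact schema.

```yaml
trainer:
  init:
    optimizer:
      SGD:
        lr: 0.01
        weight_decay: 0.01
        # Assumed layout: each entry pairs a glob-like filter with the
        # tag applied to every parameter whose name matches it.
        params:
          - params: "*bias"
            tag: "bias_params"
```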
"bias"
into one group with the tag "bias_params"
. All remaining parameters would be in another group with no tags.
For cases where multiple filters are specified and target overlapping subsets, param_groups will be partitioned into all unique combinations of tags: a parameter whose name matches several filters is placed in a group that carries all of the corresponding tags, as illustrated in the sketch below. This may affect the length of param_groups; however, the placement of "tags" will still be correctly preserved. See configure_param_groups for more details.
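As a purely hypothetical illustration (the filter patterns and tag names below are invented for this sketch), consider two overlapping filters in the optimizer's params section:

```yaml
params:
  - params: "*bias"      # matches every parameter whose name ends in "bias"
    tag: "bias_params"
  - params: "decoder*"   # matches every parameter whose name starts with "decoder"
    tag: "decoder_params"
```

A decoder bias parameter matches both filters, so it would land in a group tagged with both bias_params and decoder_params; a decoder weight would be in a group tagged only decoder_params; a bias outside the decoder would be in a group tagged only bias_params; and all remaining parameters would be in an untagged group.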
Through the param_group_tags argument, individual schedulers can be configured to only target specific optimizer param_groups. For example:
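The sketch below illustrates this; aside from param_group_tags itself, the scheduler choice and its other field names are assumptions made for illustration.

```yaml
schedulers:
  - LinearLR:
      initial_learning_rate: 0.01
      end_learning_rate: 0.001
      total_iters: 1000
      # Only param_groups carrying the "tag1" tag are updated.
      param_group_tags: ["tag1"]
```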
Here, the scheduler would only adjust param_groups that have the "tag1" tag.
These tags can be added to param_groups manually, but the most common use case is in conjunction with optimizer tagging. For example:
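The sketch below combines optimizer tagging with a weight decay scheduler. Aside from the CosineDecayWD scheduler and the param_group_tags argument named on this page, the field names and layout are assumptions made for illustration.

```yaml
trainer:
  init:
    optimizer:
      SGD:
        lr: 0.01
        weight_decay: 0.01
        # Assumed layout: tag every parameter whose name ends in "bias".
        params:
          - params: "*bias"
            tag: "bias_params"
    schedulers:
      - CosineDecayWD:
          # Assumed field names for the weight decay schedule.
          initial_val: 0.01
          end_val: 0.0
          total_iters: 1000
          # Restrict the schedule to the tagged group.
          param_group_tags: ["bias_params"]
```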
In this example, the CosineDecayWD scheduler would only adjust the weight decay of parameters whose names end in "bias".
That covers the basics of using the Optimizer and the Scheduler in conjunction with the Trainer. By this point, you should have a cursory understanding of how to construct and configure an Optimizer and Scheduler inside a Trainer instance.
To learn about how checkpointing works with the Trainer, see Model Zoo Trainer - Checkpoint.
To learn more about how you can configure a Trainer instance using a YAML configuration file, you can check out:
To learn more about how you can use the Trainer in some core workflows, you can check out:
To learn more about how you can extend the capabilities of the Trainer class, you can check out: