This section covers LoopCallback subclasses and how to use one of them to configure the training/validation loop of the Trainer.
The loop argument allows you to manage the training and/or validation loop. The Trainer takes a LoopCallback subclass that configures loop options such as the number of steps/epochs to run and how often to run validation. A LoopCallback cannot be instantiated directly; use TrainingLoop or ValidationLoop instead.
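In Python, a "cannot be instantiated directly" rule is commonly expressed with an abstract base class. The minimal sketch below shows that general pattern only. LoopCallbackBase, TrainingLoopStub, ValidationLoopStub and the task method are hypothetical names invented for illustration, and nothing here claims the library implements LoopCallback this way.

```python
from abc import ABC, abstractmethod


class LoopCallbackBase(ABC):
    """Hypothetical abstract base class; only its subclasses can be instantiated."""

    @abstractmethod
    def task(self) -> str:
        """Name of the Trainer task this loop configures."""


class TrainingLoopStub(LoopCallbackBase):
    def task(self) -> str:
        return "fit"


class ValidationLoopStub(LoopCallbackBase):
    def task(self) -> str:
        return "validate"


try:
    LoopCallbackBase()  # abstract method left unimplemented -> TypeError
except TypeError:
    print("LoopCallbackBase cannot be instantiated directly")

print(TrainingLoopStub().task())    # fit
print(ValidationLoopStub().task())  # validate
```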
TrainingLoop
The TrainingLoop callback is used to configure the Trainer to run a fit task. Most loop arguments are expressed in steps, where a step is simply one batch of training/validation data.
Arguments
num_steps: The total number of steps to train for.
max_steps: The maximum number of global steps to train for. num_steps supersedes this.
num_epochs: The number of epochs to train for. Mutually exclusive with num_steps.
steps_per_epoch: The number of steps to train for in each epoch.
eval_frequency: The frequency at which validation is performed. See LoopCallback for more details on options.
eval_steps: The number of validation steps to perform.
grad_accum_steps: The number of steps over which to accumulate gradients before performing an optimizer step. Only relevant for "CPU" and "GPU" runs.
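grad_accum_steps refers to gradient accumulation: gradients from several batches are combined before a single optimizer step, which emulates a larger effective batch. The pure-Python sketch below shows the general technique on a one-parameter least-squares model. The grad and train helpers, the plain SGD update, and the choice to discard an incomplete trailing group are illustrative assumptions, not the library's implementation; how the library counts steps while accumulating is not specified here. Averaging the gradients of equal-sized micro-batches gives the same update as one step over the combined batch.

```python
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]  # (x, y) pairs


def grad(w: float, batch: Batch) -> float:
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(batches: List[Batch], grad_accum_steps: int,
          lr: float = 0.01, w: float = 0.0) -> Tuple[float, int]:
    """Plain SGD with gradient accumulation; returns (weight, optimizer steps)."""
    accumulated = 0.0
    optimizer_steps = 0
    for i, batch in enumerate(batches, start=1):
        accumulated += grad(w, batch)  # accumulate only, weight unchanged
        if i % grad_accum_steps == 0:
            # Average the accumulated gradients, then take one optimizer step.
            w -= lr * accumulated / grad_accum_steps
            accumulated = 0.0
            optimizer_steps += 1
    # Trailing micro-batches that do not fill a complete group are discarded.
    return w, optimizer_steps


# Four micro-batches of two samples each, drawn from y = 3x.
micro_batches = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(1.0, 3.0), (3.0, 9.0)],
    [(2.0, 6.0), (4.0, 12.0)],
]
w, steps = train(micro_batches, grad_accum_steps=2)
print(steps)  # 2 optimizer steps from 4 micro-batches
```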
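If you plan on running training (calling fit), you must use a TrainingLoop. If you plan on running only validation, you may use a ValidationLoop.
For example, to configure the Trainer to run for 1000 steps and run validation for 50 steps every 100 training steps, pass a TrainingLoop with num_steps=1000, eval_frequency=100 and eval_steps=50 as the loop argument.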
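The configuration of 1000 training steps with 50 validation steps every 100 training steps can be simulated with a small, self-contained sketch. TrainingLoopSketch and ValidationLoopSketch are hypothetical classes that mirror the documented argument names; they are not the library's TrainingLoop or ValidationLoop. The sketch encodes the documented rules (num_epochs is mutually exclusive with num_steps, and num_steps supersedes max_steps) and, as TrainingLoop does on initialization, creates a validation loop of its own. Converting epochs to steps via steps_per_epoch, counting max_steps from global step 0, and running validation at every multiple of eval_frequency including the final step are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ValidationLoopSketch:
    """Hypothetical validation settings owned by the training loop."""
    eval_steps: Optional[int] = None
    hook: str = "validate"


@dataclass
class TrainingLoopSketch:
    """Hypothetical mirror of the documented TrainingLoop arguments."""
    num_steps: Optional[int] = None
    max_steps: Optional[int] = None
    num_epochs: Optional[int] = None
    steps_per_epoch: Optional[int] = None
    eval_frequency: Optional[int] = None
    eval_steps: Optional[int] = None
    val_loop: ValidationLoopSketch = field(init=False)

    def __post_init__(self) -> None:
        # Documented rule: num_epochs is mutually exclusive with num_steps.
        if self.num_steps is not None and self.num_epochs is not None:
            raise ValueError("num_steps and num_epochs are mutually exclusive")
        # Like TrainingLoop, create a validation loop on initialization.
        self.val_loop = ValidationLoopSketch(eval_steps=self.eval_steps)

    @property
    def total_steps(self) -> int:
        if self.num_steps is not None:
            return self.num_steps  # documented: num_steps supersedes max_steps
        if self.num_epochs is not None and self.steps_per_epoch is not None:
            return self.num_epochs * self.steps_per_epoch  # assumption
        if self.max_steps is not None:
            return self.max_steps  # assumption: training starts at global step 0
        raise ValueError("no step budget configured")

    def schedule(self) -> List[Tuple[int, Optional[int]]]:
        """(training step, validation steps) for every validation run."""
        if not self.eval_frequency:
            return []
        freq = self.eval_frequency
        return [(step, self.val_loop.eval_steps)
                for step in range(freq, self.total_steps + 1, freq)]


loop = TrainingLoopSketch(num_steps=1000, eval_frequency=100, eval_steps=50)
runs = loop.schedule()
print(len(runs))          # 10
print(runs[0], runs[-1])  # (100, 50) (1000, 50)
```

Under these assumptions, the configuration trains for 1000 steps and performs 10 validation runs of 50 steps each, or 500 validation steps in total.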
ValidationLoop
The ValidationLoop callback is used to configure the Trainer to run a validate or validate_all task.
Arguments
eval_steps: The number of validation steps to perform.
hook: The base name of the validation hooks to run. Used to extend validation functionality by implementing custom validation callbacks. See EleutherEvalHarnessLoop for an example. Defaults to "validate".
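To make the idea of a hook base name concrete, the sketch below shows one hypothetical way a base name could be turned into callback method names, so that a custom validation callback responds only to its own hooks. The on_<hook>_start, on_<hook>_batch_end and on_<hook>_end naming, the run_validation function, and both callback classes are assumptions for illustration; they do not describe the library's actual hook names or how EleutherEvalHarnessLoop works.

```python
from itertools import islice
from typing import Any, Iterable, List


def run_validation(batches: Iterable[Any], eval_steps: int,
                   callbacks: List[Any], hook: str = "validate") -> int:
    """Run up to eval_steps batches, firing on_<hook>_* methods on callbacks."""

    def fire(event: str, *args: Any) -> None:
        # Build the method name from the base hook name, e.g. on_validate_start.
        for cb in callbacks:
            method = getattr(cb, f"on_{hook}_{event}", None)
            if callable(method):
                method(*args)

    fire("start")
    steps_run = 0
    for step, batch in enumerate(islice(batches, eval_steps), start=1):
        fire("batch_end", step, batch)
        steps_run = step
    fire("end")
    return steps_run


class StepLogger:
    """Responds only to the default 'validate' hooks."""

    def __init__(self) -> None:
        self.events: List[Any] = []

    def on_validate_start(self) -> None:
        self.events.append("start")

    def on_validate_batch_end(self, step: int, batch: Any) -> None:
        self.events.append(step)

    def on_validate_end(self) -> None:
        self.events.append("end")


class CustomEval:
    """Responds only to a custom base hook name, 'my_eval'."""

    def __init__(self) -> None:
        self.finished = False

    def on_my_eval_end(self) -> None:
        self.finished = True


logger, custom = StepLogger(), CustomEval()
run_validation(range(10), eval_steps=3, callbacks=[logger, custom])
print(logger.events, custom.finished)  # ['start', 1, 2, 3, 'end'] False
run_validation(range(10), eval_steps=3, callbacks=[logger, custom], hook="my_eval")
print(custom.finished)  # True
```

With the default base name only StepLogger reacts; switching the base name to "my_eval" triggers CustomEval instead and leaves StepLogger untouched.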
A ValidationLoop can only be used if you plan on running only validation tasks (calling validate or validate_all). Otherwise, use TrainingLoop.
For example, to configure the Trainer to run validation for 100 steps, pass a ValidationLoop with eval_steps=100 as the loop argument. We do not need to set any training-related options such as num_steps or eval_frequency, since we are only running validation.
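A validation-only configuration needs nothing beyond the validation settings. The hypothetical ValidationOnlyLoop dataclass and validate function below sketch this under the assumption that validation simply consumes the first eval_steps batches of a dataloader and averages a per-batch loss; they are not the library's ValidationLoop or its validate implementation.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ValidationOnlyLoop:
    """Hypothetical validation-only settings: no num_steps or eval_frequency."""
    eval_steps: int
    hook: str = "validate"


def validate(loop: ValidationOnlyLoop, dataloader: Iterable[T],
             loss_fn: Callable[[T], float]) -> float:
    """Average loss over the first loop.eval_steps batches of dataloader."""
    losses = [loss_fn(batch) for batch in islice(dataloader, loop.eval_steps)]
    if not losses:
        raise ValueError("dataloader produced no batches")
    return sum(losses) / len(losses)


loop = ValidationOnlyLoop(eval_steps=100)
print(validate(loop, range(1000), lambda batch: float(batch)))  # 49.5
```

Even though the stand-in dataloader yields 1000 batches, only the first 100 are evaluated, so the result is the mean of 0 through 99.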
TrainingLoop supports both training and validation because it instantiates a ValidationLoop on initialization.
That covers how to configure the Trainer for training and/or validation. You should now understand how to use a LoopCallback subclass to configure loop parameters such as the number of steps and the validation frequency.