Learn how to configure the training and validation loops of the Trainer using two LoopCallback subclasses.
The Trainer's loop argument lets you manage the training and/or validation loop.
The Trainer takes a LoopCallback subclass that configures loop options such as the number of steps or epochs to run and how often to run validation.
A LoopCallback cannot be instantiated directly; use TrainingLoop or ValidationLoop instead.
The TrainingLoop callback is used to configure the Trainer to run a fit task. Most loop arguments are expressed in steps, where a step is simply one batch of training or validation data.
Arguments
num_steps: The total number of steps to train for.
max_steps: The maximum number of global steps to train for. num_steps supersedes this.
num_epochs: The number of epochs to train for. Mutually exclusive with num_steps.
steps_per_epoch: The number of steps to train for in each epoch.
eval_frequency: The frequency at which validation is performed. See LoopCallback for more details on options.
eval_steps: The number of validation steps to perform.
grad_accum_steps: The number of steps to accumulate gradients before performing an optimizer step. Only relevant for "CPU" and "GPU" runs.
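The way the step-count arguments interact can be sketched in plain Python. This is a toy model, not the library's implementation: resolve_train_steps and its global_step parameter are hypothetical, and the sketch assumes that steps_per_epoch is given explicitly whenever num_epochs is used and that max_steps counts steps already completed in earlier runs.

```python
from typing import Optional


def resolve_train_steps(
    num_steps: Optional[int] = None,
    max_steps: Optional[int] = None,
    num_epochs: Optional[int] = None,
    steps_per_epoch: Optional[int] = None,
    global_step: int = 0,
) -> int:
    """Return how many training steps to run under the assumed rules."""
    if num_steps is not None and num_epochs is not None:
        raise ValueError("num_steps and num_epochs are mutually exclusive")
    if num_steps is not None:
        # num_steps supersedes max_steps, so max_steps is ignored here.
        return num_steps
    if num_epochs is not None:
        if steps_per_epoch is None:
            # Assumption of this sketch: epochs can only be converted
            # to steps when steps_per_epoch is known.
            raise ValueError("steps_per_epoch is required with num_epochs")
        return num_epochs * steps_per_epoch
    if max_steps is not None:
        # Assumption: max_steps is a cap on the global step counter,
        # so only the remaining steps are run.
        return max(max_steps - global_step, 0)
    raise ValueError("one of num_steps, num_epochs, or max_steps is required")


print(resolve_train_steps(num_steps=1000, max_steps=500))
print(resolve_train_steps(num_epochs=2, steps_per_epoch=50))
print(resolve_train_steps(max_steps=1000, global_step=400))
```

Under these assumptions the three calls resolve to 1000, 100, and 600 steps respectively.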
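If you plan on running training (calling fit), you must use a TrainingLoop. If you plan on running only validation, you may use a ValidationLoop.
For example, setting num_steps to 1000, eval_frequency to 100, and eval_steps to 50 configures the Trainer to run for 1000 steps and run validation for 50 steps every 100 training steps.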
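To make that schedule concrete, the sketch below works out when validation would run for 1000 training steps with validation for 50 steps every 100 training steps. ToyLoopSettings and validation_points are hypothetical names that only mirror the num_steps, eval_frequency, and eval_steps arguments. The sketch assumes an integer eval_frequency and that validation triggers after every eval_frequency-th training step; the library may support other options.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyLoopSettings:
    """Hypothetical container mirroring three TrainingLoop arguments."""

    num_steps: int
    eval_frequency: int
    eval_steps: int


def validation_points(settings: ToyLoopSettings) -> List[int]:
    """Training steps after which validation runs (assumed schedule)."""
    return [
        step
        for step in range(1, settings.num_steps + 1)
        if step % settings.eval_frequency == 0
    ]


settings = ToyLoopSettings(num_steps=1000, eval_frequency=100, eval_steps=50)
points = validation_points(settings)
total_eval_steps = len(points) * settings.eval_steps

print(points[:3], points[-1])        # [100, 200, 300] 1000
print(len(points), total_eval_steps)  # 10 500
```

Under these assumptions, validation runs 10 times over the course of training, for 500 validation steps in total.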
The ValidationLoop callback is used to configure the Trainer to run a validate or validate_all task.
Arguments
eval_steps: The number of validation steps to perform.
hook: The base name of the validation hooks to run. Used to extend validation functionality by implementing custom validation callbacks. See EleutherEvalHarnessLoop for an example. Defaults to "validate".
ValidationLoop can only be used if you plan on running only validation tasks (calling validate or validate_all). Otherwise, use TrainingLoop.
For example, setting eval_steps to 100 configures the Trainer to run validation for 100 steps. There is no need to set any training-related options such as num_steps or eval_frequency, since only validation is run.
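Since a step is one batch of data, limiting validation to eval_steps amounts to consuming at most that many batches. The sketch below illustrates the idea with hypothetical helpers (toy_batches, run_validation); it is not how the library implements validation.

```python
from itertools import islice
from typing import Iterable, Iterator


def toy_batches() -> Iterator[int]:
    """Hypothetical endless stream of validation batches (plain integers)."""
    batch_id = 0
    while True:
        yield batch_id
        batch_id += 1


def run_validation(batches: Iterable[int], eval_steps: int) -> int:
    """Consume at most eval_steps batches and return how many steps ran."""
    steps_run = 0
    for _batch in islice(batches, eval_steps):
        # A real loop would run the model's forward pass on the batch here.
        steps_run += 1
    return steps_run


print(run_validation(toy_batches(), eval_steps=100))  # 100
```

In this sketch, a data source with fewer than eval_steps batches simply ends validation early.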
TrainingLoop supports both training and validation because it instantiates a ValidationLoop on initialization.
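This composition pattern can be sketched as follows. ToyTrainingLoop and ToyValidationLoop are hypothetical stand-ins, not the library's classes; they only show how an object that creates a validation loop in its initializer can delegate validation to it during training.

```python
from typing import List, Tuple


class ToyValidationLoop:
    """Hypothetical stand-in that only knows how to validate."""

    def __init__(self, eval_steps: int):
        self.eval_steps = eval_steps

    def validate(self) -> str:
        return f"validated for {self.eval_steps} steps"


class ToyTrainingLoop:
    """Hypothetical stand-in that trains and delegates validation."""

    def __init__(self, num_steps: int, eval_frequency: int, eval_steps: int):
        self.num_steps = num_steps
        self.eval_frequency = eval_frequency
        # The inner validation loop is created during initialization,
        # so the training loop can always run validation as well.
        self.val_loop = ToyValidationLoop(eval_steps)

    def fit(self) -> List[Tuple[int, str]]:
        """Run the toy training loop and record each validation pass."""
        log = []
        for step in range(1, self.num_steps + 1):
            # The training work for this step would happen here.
            if step % self.eval_frequency == 0:
                log.append((step, self.val_loop.validate()))
        return log


loop = ToyTrainingLoop(num_steps=300, eval_frequency=100, eval_steps=50)
log = loop.fit()
print(log)
```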