Learn how to set up and customize the Trainer using a YAML configuration file.
This page walks you through how to create and customize a YAML configuration for the `Trainer` class. By the end, you'll understand the key parameters and be ready to write your own configuration files from scratch.
All configuration lives under a single top-level `trainer` key. The `trainer` key accepts the following subkeys:
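As a sketch, the overall shape of a configuration file looks like this (the contents of each subkey are elided here and described in the sections that follow):

```yaml
trainer:
  init:
    # arguments to the Trainer's constructor
  fit:
    # arguments to the Trainer's fit method
  validate:
    # arguments to the Trainer's validate method
  validate_all:
    # arguments to the Trainer's validate_all method
```

A given run typically specifies only the subkeys it needs; for example, a training-only run may omit `validate` and `validate_all`.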
init: The `init` key is used to specify the arguments to the `Trainer`'s constructor as key-value pairs, where the key is the argument name and the value is the argument value.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
"CSX"
, "CPU"
, or "GPU"
.
cerebras.pytorch.backend
instance.
backend
argument is mutually exclusive with device
. The functionality it provides is a strict superset of the functionality provided by device
. To use a certain backend with all default parameters, you may specify device
. To configure anything about the backend, you must specify those parameters via the backend
key.To learn more about the backend argument, you can check out Trainer Backend.model_dir
include (but are not limited to):
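For example, a CSX run with an explicitly configured backend might look like the following sketch. `backend_type` and `cluster_config.num_csx` are shown as illustrative backend arguments; consult the backend documentation for the full set:

```yaml
trainer:
  init:
    backend:
      backend_type: CSX
      cluster_config:
        num_csx: 1  # illustrative cluster argument
    model_dir: ./model_dir
```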
model: The `torch.nn.Module` to train/validate using the constructed `Trainer`. All subkeys are passed as arguments to the model class.
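For example, assuming a hypothetical model class that accepts `hidden_size` and `num_layers` constructor arguments:

```yaml
trainer:
  init:
    model:
      hidden_size: 768  # hypothetical model argument
      num_layers: 12    # hypothetical model argument
```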
optimizer: The Cerebras-compliant `Optimizer` to use to optimize the model's weights during training. The value at this key is expected to be a dictionary containing a single key that specifies the name of the Cerebras optimizer to construct. That is to say, it must be the name of a subclass of `Optimizer` (including the subclasses that come packaged in `cerebras.pytorch`). The value of the optimizer name key is expected to be a dictionary of key-value pairs that correspond to the arguments of the optimizer subclass.
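For instance, a sketch that constructs an SGD optimizer (assuming `SGD` names an available optimizer subclass taking `lr` and `momentum` arguments, mirroring the familiar PyTorch signature):

```yaml
trainer:
  init:
    optimizer:
      SGD:
        lr: 0.01
        momentum: 0.9
```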
schedulers: The `Scheduler` instances to use during the training run. The value at this key is expected to be a dictionary or a list of dictionaries. Each dictionary is expected to have a single key specifying the name of the `Scheduler` subclass to use. The corresponding value of the scheduler name key is expected to be a mapping of key-value pairs that are passed as keyword arguments to the `Scheduler`'s constructor.
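For example, assuming a `LinearLR` scheduler subclass with the constructor arguments shown:

```yaml
trainer:
  init:
    schedulers:
    - LinearLR:
        initial_learning_rate: 0.01  # assumed argument names
        end_learning_rate: 0.001
        total_iters: 1000
```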
precision: The `Precision` instance to apply during training. Today, the only supported `Precision` type is `MixedPrecision`, so the value of the `precision` key is expected to be a dictionary corresponding to the arguments of `MixedPrecision`.
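A minimal sketch, assuming `MixedPrecision` accepts `fp16_type` and `loss_scaling_factor` arguments:

```yaml
trainer:
  init:
    precision:
      fp16_type: bfloat16        # assumed argument
      loss_scaling_factor: dynamic  # assumed argument
```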
sparsity: The `SparsityAlgorithm` to use to sparsify the model's weights and optimizer state. The value at this key is expected to be a dictionary. At a minimum, this dictionary is expected to contain an `algorithm` key that specifies the name of the sparsity algorithm to apply, as well as a `sparsity` key that specifies the level of sparsity to apply.
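For example, a sketch applying 30% sparsity (assuming an algorithm named `static` is available):

```yaml
trainer:
  init:
    sparsity:
      algorithm: static  # assumed algorithm name
      sparsity: 0.3      # 30% of weights sparsified
```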
loop: The `TrainingLoop` instance that specifies how many steps to train and validate for.
checkpoint: The `Checkpoint` instance that specifies how frequently the trainer should save checkpoints during training.
logging: The `Logging` instance that configures the Python logger and specifies how frequently the trainer should write logs.
callbacks: A list of dictionaries, each configuring a `Callback` subclass. Each dictionary is expected to have a single key specifying the name of the `Callback` subclass. The values at this key are passed in as keyword arguments to the subclass's constructor.
loggers: A list of dictionaries, each configuring a `Logger` subclass. Each dictionary is expected to have a single key specifying the name of the `Logger` subclass. The values at this key are passed in as keyword arguments to the subclass's constructor.
seed: The initial seed, passed to `torch.manual_seed`, for reproducibility.
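Putting the remaining `init` subkeys together, a combined sketch might look like this. The argument names (`num_steps`, `eval_frequency`, `steps`, `log_steps`) and the `ComputeNorm`/`TensorBoardLogger` names are assumptions drawn from typical Trainer setups; substitute the callbacks and loggers available in your installation:

```yaml
trainer:
  init:
    loop:
      num_steps: 1000       # total training steps (assumed name)
      eval_frequency: 100   # validate every 100 steps (assumed name)
    checkpoint:
      steps: 100            # save a checkpoint every 100 steps (assumed name)
    logging:
      log_steps: 10         # write logs every 10 steps (assumed name)
    callbacks:
    - ComputeNorm: {}       # assumed callback name
    loggers:
    - TensorBoardLogger: {} # assumed logger name
    seed: 2024
```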
fit: The `fit` key is used to specify the arguments to the `Trainer`'s `fit` method.
The arguments are passed as key-value pairs, where the key is the argument name and the value is the argument value.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
train_dataloader: The dataloader to use for training. The value at this key is expected to be a dictionary that contains, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
val_dataloader: The dataloader(s) to use for validation. The trainer runs validation for `eval_steps` once every `eval_frequency` training steps. The value at this key is expected to be a dictionary or a list of dictionaries. Each dictionary is expected to contain, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
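For example (the data processor names and the `batch_size` argument are placeholders):

```yaml
trainer:
  fit:
    train_dataloader:
      data_processor: MyDataProcessor      # placeholder name
      batch_size: 64
    val_dataloader:
      data_processor: MyEvalDataProcessor  # placeholder name
      batch_size: 64
```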
validate: The `validate` key is used to specify the arguments to the `Trainer`'s `validate` method.
The arguments are passed as key-value pairs, where the key is the argument name and the value is the argument value.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
val_dataloader: The dataloader to run validation on. The value at this key is expected to be a dictionary that contains, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
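For example (the data processor name and `batch_size` argument are placeholders):

```yaml
trainer:
  validate:
    val_dataloader:
      data_processor: MyEvalDataProcessor  # placeholder name
      batch_size: 64
```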
validate_all: The `validate_all` key is used to specify the arguments to the `Trainer`'s `validate_all` method.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
val_dataloaders: The dataloader(s) to run validation on. The value at this key is expected to be a dictionary or a list of dictionaries. Each dictionary is expected to contain, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
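A sketch, assuming the dataloaders are supplied under a `val_dataloaders` key (the data processor name is a placeholder):

```yaml
trainer:
  validate_all:
    val_dataloaders:
    - data_processor: MyEvalDataProcessor  # placeholder name
      batch_size: 64
```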
To get a better understanding of what `validate_all` does, refer to the `Trainer` API documentation.
Legacy YAML configurations predate the `Trainer` class. The training scripts in Model Zoo will detect if you've passed in a legacy configuration and will automatically invoke a converter tool before constructing and using the trainer.
However, if you’d like to manually convert a legacy YAML or learn how to specify a specific parameter in the Trainer YAML, learn more in Convert Legacy to Trainer YAML.
To see the `Trainer` in action in some core workflows, you can check out Pretraining with Upstream Validation.
To learn more about how you can extend the capabilities of the `Trainer` class, you can check out Trainer Components.