Learn how to set up and customize the Trainer using a YAML configuration file.
This page walks you through how to create and customize a YAML configuration for the `Trainer` class. By the end, you'll understand the key parameters and be ready to write your own configuration files from scratch.
All configuration lives under a single top-level `trainer` key. The `trainer` key accepts the following subkeys:
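As a sketch, the overall shape of a configuration file looks like this (the contents of each subkey are elided here and described in the sections that follow):

```yaml
trainer:
  init:
    # arguments to the Trainer's constructor
  fit:
    # arguments to the Trainer's fit method
  validate:
    # arguments to the Trainer's validate method
  validate_all:
    # arguments to the Trainer's validate_all method
```

A given run typically specifies only the subkeys it needs; for example, a training-only run may omit `validate` and `validate_all`.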
init: The `init` key is used to specify the arguments to the `Trainer`'s constructor as key-value pairs, where the key is the argument name and the value is the argument value.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
"CSX"
, "CPU"
, or "GPU"
.
cerebras.pytorch.backend
instance.
backend
argument is mutually exclusive with device
. The functionality it provides is a strict superset of the functionality provided by device
. To use a certain backend with all default parameters, you may specify device
. To configure anything about the backend, you must specify those parameters via the backend
key.To learn more about the backend argument, you can check out Trainer Backend.model_dir
include (but are not limited to):
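For example, a CSX run with an explicitly configured backend might look like the following sketch. `backend_type` and `cluster_config.num_csx` are shown as illustrative backend arguments; consult the backend documentation for the full set:

```yaml
trainer:
  init:
    backend:
      backend_type: CSX
      cluster_config:
        num_csx: 1  # illustrative cluster argument
    model_dir: ./model_dir
```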
model: The `torch.nn.Module` to train/validate using the constructed `Trainer`. All subkeys are passed as arguments to the model class.
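For example, assuming a hypothetical model class that accepts `hidden_size` and `num_layers` constructor arguments:

```yaml
trainer:
  init:
    model:
      hidden_size: 768  # hypothetical model argument
      num_layers: 12    # hypothetical model argument
```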
optimizer: The Cerebras-compliant `Optimizer` to use to optimize the model's weights during training. The value at this key is expected to be a dictionary containing a single key that specifies the name of the Cerebras optimizer to construct. That is to say, it must be the name of a subclass of `Optimizer` (including the subclasses that come packaged in `cerebras.pytorch`). The value of the optimizer name key is expected to be a dictionary of key-value pairs that correspond to the arguments of the optimizer subclass.
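For instance, a sketch that constructs an SGD optimizer (assuming `SGD` names an available optimizer subclass taking `lr` and `momentum` arguments, mirroring the familiar PyTorch signature):

```yaml
trainer:
  init:
    optimizer:
      SGD:
        lr: 0.01
        momentum: 0.9
```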
schedulers: The `Scheduler` instances to use during the training run. The value at this key is expected to be a dictionary or a list of dictionaries. Each dictionary is expected to have a single key specifying the name of the `Scheduler` subclass to use. The corresponding value of the scheduler name key is expected to be a mapping of key-value pairs that are passed as keyword arguments to the `Scheduler`'s constructor.
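For example, assuming a `LinearLR` scheduler subclass with the constructor arguments shown:

```yaml
trainer:
  init:
    schedulers:
    - LinearLR:
        initial_learning_rate: 0.01  # assumed argument names
        end_learning_rate: 0.001
        total_iters: 1000
```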
precision: The `Precision` instance to apply during training. Today, the only supported `Precision` type is `MixedPrecision`, so the value of the `precision` key is expected to be a dictionary corresponding to the arguments of `MixedPrecision`.
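A minimal sketch, assuming `MixedPrecision` accepts `fp16_type` and `loss_scaling_factor` arguments:

```yaml
trainer:
  init:
    precision:
      fp16_type: bfloat16        # assumed argument
      loss_scaling_factor: dynamic  # assumed argument
```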
sparsity: The `SparsityAlgorithm` to use to sparsify the model's weights and optimizer state. The value at this key is expected to be a dictionary. At a minimum, this dictionary is expected to contain an `algorithm` key that specifies the name of the sparsity algorithm to apply, as well as a `sparsity` key that specifies the level of sparsity to apply.
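For example, a sketch applying 30% sparsity (assuming an algorithm named `static` is available):

```yaml
trainer:
  init:
    sparsity:
      algorithm: static  # assumed algorithm name
      sparsity: 0.3      # 30% of weights sparsified
```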
loop: The `TrainingLoop` instance that specifies how many steps to train and validate for.
checkpoint: The `Checkpoint` instance that specifies how frequently the trainer should save checkpoints during training.
logging: The `Logging` instance that configures the Python logger and specifies how frequently the trainer should write logs.
callbacks: A list of dictionaries, each configuring a `Callback` subclass. Each dictionary is expected to have a single key specifying the name of the `Callback` subclass. The values at this key are passed in as keyword arguments to the subclass's constructor.
loggers: A list of dictionaries, each configuring a `Logger` subclass. Each dictionary is expected to have a single key specifying the name of the `Logger` subclass. The values at this key are passed in as keyword arguments to the subclass's constructor.
seed: The initial seed, passed to `torch.manual_seed`, for reproducibility.
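Putting the remaining `init` subkeys together, a combined sketch might look like this. The argument names (`num_steps`, `eval_frequency`, `steps`, `log_steps`) and the `ComputeNorm`/`TensorBoardLogger` names are assumptions drawn from typical Trainer setups; substitute the callbacks and loggers available in your installation:

```yaml
trainer:
  init:
    loop:
      num_steps: 1000       # total training steps (assumed name)
      eval_frequency: 100   # validate every 100 steps (assumed name)
    checkpoint:
      steps: 100            # save a checkpoint every 100 steps (assumed name)
    logging:
      log_steps: 10         # write logs every 10 steps (assumed name)
    callbacks:
    - ComputeNorm: {}       # assumed callback name
    loggers:
    - TensorBoardLogger: {} # assumed logger name
    seed: 2024
```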
fit: The `fit` key is used to specify the arguments to the `Trainer`'s `fit` method.
The arguments are passed as key-value pairs, where the key is the argument name and the value is the argument value.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
train_dataloader: The dataloader to use for training. The value at this key is expected to be a dictionary that contains, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
val_dataloader: The dataloader(s) to use for validation. The trainer runs validation for `eval_steps` once every `eval_frequency` training steps. The value at this key is expected to be a dictionary or a list of dictionaries. Each dictionary is expected to contain, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
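For example (the data processor names and the `batch_size` argument are placeholders):

```yaml
trainer:
  fit:
    train_dataloader:
      data_processor: MyDataProcessor      # placeholder name
      batch_size: 64
    val_dataloader:
      data_processor: MyEvalDataProcessor  # placeholder name
      batch_size: 64
```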
validate: The `validate` key is used to specify the arguments to the `Trainer`'s `validate` method.
The arguments are passed as key-value pairs, where the key is the argument name and the value is the argument value.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
val_dataloader: The dataloader to run validation on. The value at this key is expected to be a dictionary that contains, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
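For example (the data processor name and `batch_size` argument are placeholders):

```yaml
trainer:
  validate:
    val_dataloader:
      data_processor: MyEvalDataProcessor  # placeholder name
      batch_size: 64
```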
validate_all: The `validate_all` key is used to specify the arguments to the `Trainer`'s `validate_all` method.
Below are all of the accepted keys alongside YAML examples and their equivalent Python counterparts:
val_dataloaders: The dataloader(s) to run validation on. The value at this key is expected to be a dictionary or a list of dictionaries. Each dictionary is expected to contain, at a minimum, the `data_processor` key, which specifies the name of the data processor to use. All other key-value pairs in the dictionary are passed as arguments to `cerebras.pytorch.utils.data.DataLoader`.
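A sketch, assuming the dataloaders are supplied under a `val_dataloaders` key (the data processor name is a placeholder):

```yaml
trainer:
  validate_all:
    val_dataloaders:
    - data_processor: MyEvalDataProcessor  # placeholder name
      batch_size: 64
```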
To get a better understanding of what `validate_all` does, refer to the `Trainer` API documentation.
Legacy YAML configurations predate the `Trainer` class. The training scripts in Model Zoo will detect if you've passed in a legacy configuration and will automatically invoke a converter tool before constructing and using the trainer.
However, if you’d like to manually convert a legacy YAML or learn how to specify a specific parameter in the Trainer YAML, learn more in Convert Legacy to Trainer YAML.
To see the `Trainer` in action in some core workflows, you can check out Pretraining with Upstream Validation.
To learn more about how you can extend the capabilities of the `Trainer` class, you can check out Trainer Components.