The model is the main Module on which all training and validation are run. It’s required by all Trainer instances.

Prerequisites

Read the Trainer Overview and Trainer Configuration Overview for an introduction to running Model Zoo models.

Configure the Model

Use the model argument to set the model you’d like to train or validate.

When using YAML, all model subkeys are passed as arguments to the model class. Your run script’s model_fn determines the model class. For example:

trainer:
  init:
    ...
    model:
      vocab_size: 1024
      max_position_embeddings: 1024
      ...
    ...
  ...

In Python, you can specify the model in two ways:

  • As a callable that takes no arguments and returns a Module
  • As a Module that the system uses directly
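The distinction between the two Python forms can be sketched with a generic dispatch pattern. This is an illustrative stand-in, not the actual Trainer internals: `TinyModel` and `resolve_model` are hypothetical names, and the real Trainer works with PyTorch Modules rather than plain classes.

```python
class TinyModel:
    """Toy stand-in for a Module, with an illustrative hyperparameter."""
    def __init__(self):
        self.vocab_size = 1024

def resolve_model(model):
    # If given a callable, invoke it to construct the model instance;
    # otherwise assume it is already a constructed model and use it directly.
    return model() if callable(model) else model

m1 = resolve_model(TinyModel)    # callable form: constructed on demand
m2 = resolve_model(TinyModel())  # instance form: used as-is
```

The callable form lets the framework defer construction until an appropriate device context is active, which is why it is often preferred for large models.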

When passing a Module directly, initialize the model inside the Cerebras device context for optimal performance:

import cerebras.pytorch as cstorch
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.models.nlp.gpt2.model import Gpt2Model

# Initialize the Cerebras backend for efficient processing.
backend = cstorch.backend("CSX")

# Use the backend's device context manager for initializing the model.
with backend.device:
    model = Gpt2Model(
        vocab_size=1024,
        max_position_embeddings=1024,
        ...,
    )

# Construct the Trainer, passing the backend and the pre-initialized model.
trainer = Trainer(
    ...,
    backend=backend,
    model=model,
    ...,
)
...

This approach automatically moves model parameters to the Cerebras device, optimizing memory usage and improving initialization speed. For more information, see Efficient weight initialization.