The model is the main Module on which all training and validation are run. It’s required by all Trainer instances.

Prerequisites

Read the Trainer Overview and Trainer Configuration Overview for an introduction to running Model Zoo models.

Configure the Model

Use the model argument to set the model you’d like to train or validate.

When using YAML, all model subkeys are passed as arguments to the model class. Your run script’s model_fn determines the model class. For example:

trainer:
  init:
    ...
    model:
      vocab_size: 1024
      max_position_embeddings: 1024
      ...
    ...
  ...

In Python, you can specify the model in two ways:

  • As a callable that takes no arguments and returns a Module
  • As a Module that the system uses directly
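The distinction between the two Python forms can be sketched with a generic dispatch pattern. This is an illustrative stand-in, not the actual Trainer internals: `TinyModel` and `resolve_model` are hypothetical names, and the real Trainer works with PyTorch Modules rather than plain classes.

```python
class TinyModel:
    """Toy stand-in for a Module, with an illustrative hyperparameter."""
    def __init__(self):
        self.vocab_size = 1024

def resolve_model(model):
    # If given a callable, invoke it to construct the model instance;
    # otherwise assume it is already a constructed model and use it directly.
    return model() if callable(model) else model

m1 = resolve_model(TinyModel)    # callable form: constructed on demand
m2 = resolve_model(TinyModel())  # instance form: used as-is
```

The callable form lets the framework defer construction until an appropriate device context is active, which is why it is often preferred for large models.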

When passing a Module directly, initialize the model inside the Cerebras device context for optimal performance:

import cerebras.pytorch as cstorch
from cerebras.modelzoo import Trainer
from cerebras.modelzoo.models.nlp.gpt2.model import Gpt2Model

# Initialize the Cerebras backend for efficient processing.
backend = cstorch.backend("CSX")

# Use the backend's device context manager for initializing the model.
with backend.device:
    model = Gpt2Model(
        vocab_size=1024,
        max_position_embeddings=1024,
        ...,
    )

# Construct the Trainer, passing the backend and the pre-initialized model.
trainer = Trainer(
    ...,
    backend=backend,
    model=model,
    ...,
)
...

This approach automatically moves model parameters to the Cerebras device, optimizing memory usage and improving initialization speed. For more information, see Efficient weight initialization.