Set Up the Cerebras Environment

To set up your Cerebras environment for the Cerebras Model Zoo, follow the instructions on the Setup and Installation page (../Getting-started/Setup-installation).

Running an Existing Model

To execute an existing model from the Cerebras Model Zoo, follow these steps:

1. Use the CLI to query the registry and display the supported models:

cszoo model list

2. Find the model implementation. For example, to locate the path for GPT-2:

cszoo model describe gpt2 --field path

To display the full description of GPT-2, including its path, configs, and data processors:

cszoo model describe gpt2

╒════════════════╤═════════════════════════════════════════════════════════════════════════════════════════╕
│ Name           │ gpt2                                                                                    │
├────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Path           │ <modelzoo path>/modelzoo/models/nlp/gpt2                                                │
├────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Configs        │ gpt2_medium_reference                                                                   │
│                │ gpt2_small                                                                              │
│                │ gpt2_tiny                                                                               │
│                │ gpt2_medium_lora_a10                                                                    │
│                │ gpt2_small_reference                                                                    │
│                │ gpt2_large_lora                                                                         │
│                │ gpt2_tiny_synthetic                                                                     │
│                │ gpt2_small_bs1024                                                                       │
│                │ gpt2_medium_lora                                                                        │
│                │ gpt2_large_reference                                                                    │
├────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Dataprocessors │ Gpt2SyntheticDataProcessor                                                              │
│                │ GptTextDataProcessor                                                                    │
│                │ DummyDataProcessor                                                                      │
│                │ DummyIterableDataProcessor                                                              │
│                │ GptHDF5DataProcessor                                                                    │
│                │ GptHDF5MapDataProcessor                                                                 │
│                │ HuggingFaceDataProcessorEli5                                                            │
│                │ HuggingFaceIterableDataProcessorEli5                                                    │
╘════════════════╧═════════════════════════════════════════════════════════════════════════════════════════╛

3. Determine which YAML file to use for the model’s parameters. The YAML configurations are located in the configs directory within the model’s folder. For example, GPT-2’s YAML files can be found at:

<modelzoo path>/modelzoo/models/nlp/gpt2/configs/

4. Execute the run.py script, supplying the appropriate YAML file as an argument.

python /path/to/run.py --params /path/to/config.yaml 
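Steps 3 and 4 can be scripted. The following is a minimal sketch, not part of the Model Zoo itself: the helper names list_configs and build_run_command are hypothetical, and the only command-line flag used is the --params flag shown above. The location of run.py inside the model folder is an assumption made for illustration; substitute the real path on your system.

```python
import shlex
import sys
from pathlib import Path


def list_configs(model_dir):
    """Return the sorted YAML file names found in <model_dir>/configs."""
    configs_dir = Path(model_dir) / "configs"
    if not configs_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in configs_dir.iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )


def build_run_command(run_py, config_path):
    """Assemble the run.py invocation from step 4 as an argument list."""
    return [sys.executable, str(run_py), "--params", str(config_path)]


if __name__ == "__main__":
    # Placeholder path; replace with your own Model Zoo checkout.
    model_dir = Path("/path/to/modelzoo/models/nlp/gpt2")
    for name in list_configs(model_dir):
        # Assumption: run.py sits next to the configs directory.
        cmd = build_run_command(model_dir / "run.py", model_dir / "configs" / name)
        print(shlex.join(cmd))
```

The sketch only prints the commands. To launch one, pass the argument list to subprocess.run.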

Editing Configurations

To modify how a run is configured for a specific Model Zoo model, first clone the Model Zoo repository so that you have write access to the YAML files. All reference configuration files in the Model Zoo are located in the configs/ directory of each model.

Querying Additional Components

To see which losses and dataloaders are provided in the reference examples, you can query the registry using:

python <modelzoo path>/modelzoo/common/registry_cli.py --list_losses
python <modelzoo path>/modelzoo/common/registry_cli.py --list_datasetprocessor

Using Config Classes with an Existing Model

Every model in the Model Zoo has a corresponding Config class. When a Config class is associated with a model, the configuration is validated automatically in the backend, and no additional action is required from you. For more detail, see Model Zoo Config Classes.

Creating and Registering a New Dataloader

To create a new dataloader, follow the dataloader creation instructions in the Cerebras documentation. They help you ensure compatibility with the Cerebras system. Next, we’ll cover how to register your new dataloader.

Registering Your New Dataloader

For the system to recognize and utilize your new dataloader, you need to register it within the Model Zoo’s registry framework.

Follow these steps to register your dataloader:

1. Create an __init__.py file

Ensure that there is an __init__.py file in the directory where your dataloader’s code resides. This file makes the directory a Python package, allowing its contents to be imported elsewhere.

2. Add registration code to __init__.py

Within the __init__.py file, include the following code to register your dataloader:

import os
from cerebras.modelzoo.common.registry import registry

# Obtain the current directory path where __init__.py is located.
current_path = os.path.dirname(os.path.realpath(__file__))

# Register the current path as a dataloader path.
registry.register_paths("dataloader_path", current_path)

This code snippet accomplishes the following:

  • It imports the necessary modules, including the registry from the Model Zoo’s common utilities.

  • It determines the path of the current directory (current_path) where the dataloader is located.

  • It registers this path with the registry under the “dataloader_path” category, allowing the system to detect and utilize your new dataloader.

Once these steps are complete, the Model Zoo registry can discover your new dataloader, and it is ready to be used in your workflows.
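The two registration steps can also be automated. The following is a minimal sketch under stated assumptions: the helper ensure_registration is hypothetical and not part of the Model Zoo. It only writes the registration snippet shown above into __init__.py as text, so it does not import or call the registry itself.

```python
from pathlib import Path

# Text of the registration snippet from step 2, with the category left open.
REGISTRATION_TEMPLATE = '''import os
from cerebras.modelzoo.common.registry import registry

# Obtain the current directory path where __init__.py is located.
current_path = os.path.dirname(os.path.realpath(__file__))

# Register the current path under the given category.
registry.register_paths("{category}", current_path)
'''


def ensure_registration(package_dir, category="dataloader_path"):
    """Create or extend __init__.py so that it registers package_dir.

    Returns True if the file was changed, or False if a registration
    call for this category was already present.
    """
    init_file = Path(package_dir) / "__init__.py"
    marker = f'registry.register_paths("{category}"'
    existing = init_file.read_text() if init_file.exists() else ""
    if marker in existing:
        return False
    # Keep any existing content and make sure it ends with a newline.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    init_file.write_text(existing + REGISTRATION_TEMPLATE.format(category=category))
    return True
```

Calling ensure_registration("/path/to/my_dataloaders") a second time leaves the file untouched, so the helper is safe to rerun.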

Adding and Registering a New Loss Function

To enhance the Model Zoo with your custom loss function, follow these steps to integrate and register it within the framework:

1. Implementation

Implement your new loss function and place the code in the <modelzoo path>/modelzoo/losses directory. This location is where the Model Zoo looks for loss function implementations.

2. Registering the loss function

To make your new loss function recognizable and usable within the Model Zoo, you need to register it:

  • Create or modify an __init__.py file

Ensure that an __init__.py file exists in the directory where your loss function code is located. If it doesn’t exist, create one. If it does, you’ll be editing it.

  • Add registration code

In the __init__.py file, include the following code to register your loss function:

import os
from cerebras.modelzoo.common.registry import registry

# Obtain the current directory path where __init__.py is located.
current_path = os.path.dirname(os.path.realpath(__file__))

# Register the current path as a loss function path.
registry.register_paths("loss_path", current_path)

This code accomplishes the following tasks:

  • It imports the necessary modules, including the registry from the Model Zoo’s common utilities.

  • It determines the path of the current directory (current_path) where the loss function is located.

  • It registers this path with the registry under the “loss_path” category, enabling the system to detect and use your new loss function.

Once these steps are complete, the Model Zoo registry can discover your custom loss function, and it is ready to be used in training workflows.

Creating and Registering a New Model

To develop and integrate a new model with the Cerebras Model Zoo, see the documentation here.

Registering Your New Model

To ensure the Model Zoo recognizes your new model:

1. Create or update an __init__.py file

Ensure there’s an __init__.py file in the directory where your model.py is located. If the file isn’t there, create one.

2. Add registration code

In the __init__.py file, insert the following code to register your model:

import os
from cerebras.modelzoo.common.registry import registry

# Get the current directory path where __init__.py is located.
current_path = os.path.dirname(os.path.realpath(__file__))

# Register this path as a model path.
registry.register_paths("model_path", current_path)

Ensure the model’s name matches the directory name containing model.py. For example, if your model is named “tinybert”, its code should reside in a directory named “tinybert/”.
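The layout requirements above can be checked before you run anything. The following is a minimal sketch, assuming only what this section states: the directory name must equal the model name, and the directory must contain model.py and __init__.py. The function check_model_layout is hypothetical and not part of the Model Zoo.

```python
from pathlib import Path


def check_model_layout(model_name, model_dir):
    """Return a list of layout problems; the list is empty if none are found."""
    model_dir = Path(model_dir)
    problems = []
    if model_dir.name != model_name:
        problems.append(
            f"directory name {model_dir.name!r} does not match model name {model_name!r}"
        )
    if not (model_dir / "model.py").is_file():
        problems.append("model.py not found in the model directory")
    if not (model_dir / "__init__.py").is_file():
        problems.append("__init__.py not found in the model directory")
    return problems


if __name__ == "__main__":
    # Placeholder path for the "tinybert" example above.
    for problem in check_model_layout("tinybert", "/path/to/tinybert"):
        print("layout problem:", problem)
```

The check does not confirm that __init__.py contains the registration code; it only verifies the file and directory names.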

Evaluating Your Model

To effectively evaluate your model during and after training, follow these guides:

Evaluating During Training

For insights on assessing your model’s performance throughout the training process, visit the run-model/eval guide in the Cerebras Developer Documentation. This resource provides comprehensive information on the steps and settings required to evaluate your model during training on the Wafer-Scale Cluster (WSC).

Using EleutherAI’s Evaluation Harness

If you’re working with Large Language Models (LLMs) in the Model Zoo, you can use EleutherAI’s Evaluation Harness (EEH) for a more in-depth evaluation. Our guide on downstream validation using EEH explains how to prepare your data and set up the EEH for evaluating LLMs. The harness provides a structured way to assess model performance across a range of benchmarks and tasks.

By following these guidelines, you can gain valuable insights into your model’s effectiveness, helping you make informed decisions for further model refinement and deployment.

Adding a New Dataset

To incorporate a new dataset for your model within the Cerebras Model Zoo, you’ll need to ensure it’s specified correctly in the model’s configuration and, for certain models like language models, converted into the appropriate format.

Specifying the Dataset in the YAML Configuration

1. Locate the YAML file

Identify the YAML configuration file associated with your model.

2. Update data_dir

Within the YAML file, under the train_input or eval_input section (depending on whether the dataset is for training or evaluation), set the data_dir entry to the path of your dataset.
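The update in step 2 amounts to setting one key in a nested mapping. The following is a minimal sketch that operates on the parameters after they have been loaded into a Python dictionary. Loading and saving the YAML file, for example with a YAML library, is left out. The helper set_data_dir and the batch_size key in the usage example are hypothetical; only train_input, eval_input, and data_dir come from the steps above.

```python
import copy

# Sections named in step 2 that may hold a data_dir entry.
VALID_SECTIONS = ("train_input", "eval_input")


def set_data_dir(params, section, data_dir):
    """Return a copy of params with <section>.data_dir set to data_dir."""
    if section not in VALID_SECTIONS:
        raise ValueError(f"section must be one of {VALID_SECTIONS}, got {section!r}")
    updated = copy.deepcopy(params)  # leave the caller's dictionary untouched
    updated.setdefault(section, {})["data_dir"] = str(data_dir)
    return updated


if __name__ == "__main__":
    # Hypothetical parameters, as they might look after loading a config.
    params = {"train_input": {"batch_size": 8, "data_dir": "/old/path"}}
    print(set_data_dir(params, "train_input", "/path/to/my_dataset"))
```

Returning a copy keeps the reference configuration unchanged in memory, which makes it easy to compare the original and edited versions before writing anything back.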

Preparing Datasets for Language Models

Language models within the Cerebras ecosystem often require datasets in HDF5 format. If your dataset isn’t already in this format, follow these conversion steps:

1. PyTorch dataset to HDF5

If your dataset is in a PyTorch format, convert it to HDF5 by following the prepare-data/hdf5_preprocessing guide, which walks through the conversion step by step.

2. Raw data to HDF5

For raw data conversion to HDF5, particularly for GPT-style models, refer to the Chunk preprocessing guide. This resource outlines the necessary steps to preprocess and convert your data, ensuring it’s in the right format for model consumption.

By following these procedures, you can successfully add and utilize new datasets with your models in the Cerebras Model Zoo, enhancing the versatility and applicability of your machine learning projects.

Utilizing Checkpoints

The Cerebras Model Zoo includes a “Checkpoint and Config Converter” tool that converts model implementations between the Model Zoo and other frameworks or repositories. It is particularly useful for migrating models into the Model Zoo or exporting them for use elsewhere. It can also convert checkpoints created with previous Cerebras software releases into checkpoints compatible with newer releases.

Checkpoint conversion: Cerebras and HuggingFace, software version updates

To learn how to use this tool to convert checkpoints and model configurations, see the Checkpoint and Config Converter documentation, which provides detailed instructions.

Saving and Loading Checkpoints

Proper checkpoint management is crucial for efficiently training and evaluating models. For guidelines on saving and loading checkpoints within the Cerebras environment, consult the checkpointing documentation. This section offers comprehensive insights into checkpoint handling, including saving states during training and loading them for resuming training or evaluation.

With these resources, you can manage model checkpoints effectively in the Cerebras Model Zoo throughout model development and experimentation.