Overview

The ModelZoo CLI (cszoo) is a command-line interface that serves as a single entry point for ModelZoo tasks. It covers machine learning workflows from data preprocessing through model training, validation, and checkpoint conversion.

Commands

The examples below show each ModelZoo CLI command in use.
Example: cszoo fit params_model.yaml
Example: cszoo validate params_model.yaml
Example: cszoo validate_all params_model.yaml
Example: cszoo checkpoint convert --model gpt2 --src-fmt cs-auto --tgt-fmt hf --config workdir/params_gpt_tiny.yaml model_dir/checkpoint.mdl
Example: cszoo checkpoint convert-config --model gpt2 --src-fmt cs-auto --tgt-fmt hf workdir/params_gpt_tiny.yaml
Example: cszoo checkpoint list-converters
Example: cszoo checkpoint diff checkpoint_a.mdl checkpoint_b.mdl
Example: cszoo checkpoint info PATH
Example: cszoo checkpoint delete PATH
Example: cszoo checkpoint copy SRC_PATH DST_PATH
Example: cszoo checkpoint move SRC_PATH DST_PATH
Example: cszoo model list
Example: cszoo model info llama
Example: cszoo model describe llama
Example: cszoo model init_checkpoint <model_name>
Example: cszoo data_preprocess list
Example: cszoo data_preprocess pull summarization_preprocessing -o workdir
Example: cszoo data_preprocess run --config preprocessing.yaml
Example: cszoo data_processor list
Example: cszoo data_processor info GptHDF5DataProcessor
Example: cszoo data_processor describe GptHDF5DataProcessor
Example: cszoo data_processor benchmark params.yaml
Example: cszoo config pull llama2_7b -o workdir
Example: cszoo config validate params.yaml
Example: cszoo config convert_legacy old_config.yaml
Example: cszoo config stats params.yaml
Example: cszoo lm_eval workdir/params_gpt_tiny.yaml --tasks=winogrande --checkpoint_path=workdir/my_ckpt.mdl
Example: cszoo bigcode_eval workdir/params_gpt_tiny.yaml --tasks=mbpp --checkpoint_path=workdir/my_ckpt.mdl
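The commands above also compose in ordinary shell scripts. As a minimal sketch, the function below applies the `cszoo checkpoint convert` example to every `.mdl` file in a directory. The names `convert_all` and `CSZOO_BIN` are hypothetical helpers introduced here for illustration, not part of the CLI, and the sketch assumes the flags behave as in the single-checkpoint example.

```shell
# Sketch: convert every *.mdl checkpoint in a directory to HuggingFace format.
# convert_all and CSZOO_BIN are hypothetical names, not part of cszoo.
convert_all() {
  local model="$1" config="$2" ckpt_dir="$3"
  local ckpt
  for ckpt in "$ckpt_dir"/*.mdl; do
    # Skip the unexpanded glob when the directory holds no checkpoints.
    [ -e "$ckpt" ] || continue
    # Same flags as the single-checkpoint example; stop on the first failure.
    "${CSZOO_BIN:-cszoo}" checkpoint convert \
      --model "$model" \
      --src-fmt cs-auto \
      --tgt-fmt hf \
      --config "$config" \
      "$ckpt" || return 1
  done
}
```

For example, `convert_all gpt2 workdir/params_gpt_tiny.yaml model_dir` would run one conversion per checkpoint found in `model_dir`. Setting `CSZOO_BIN=echo` prints the commands instead of running them, which is a cheap way to review the batch first.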

Example Workflow: Pretraining a model using the ModelZoo CLI

This workflow walks through pretraining a model with the Cerebras ModelZoo CLI: creating a working directory, preprocessing data, running pretraining, and converting the resulting checkpoint to a HuggingFace-compatible format.
Prerequisite: Before proceeding with the steps below, complete the setup and installation guide.
1. Create model directory

Create a directory to store all the files for this pretraining workflow and copy the necessary configuration files.
mkdir pretraining_tutorial
cp modelzoo/src/cerebras/modelzoo/tutorials/pretraining/* pretraining_tutorial
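The later steps refer to three files in this directory: `train_data_config.yaml`, `valid_data_config.yaml`, and `model_config.yaml`. As an optional sanity check, the sketch below confirms that the copy produced them. The function name `check_tutorial_files` is a hypothetical helper, not part of the CLI.

```shell
# Sketch: confirm the configuration files used by the later steps exist.
# check_tutorial_files is a hypothetical helper, not part of cszoo.
check_tutorial_files() {
  local dir="${1:-pretraining_tutorial}"
  local missing=0 name
  for name in train_data_config.yaml valid_data_config.yaml model_config.yaml; do
    if [ ! -f "$dir/$name" ]; then
      # Report every missing file rather than stopping at the first one.
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_tutorial_files pretraining_tutorial` returns a non-zero status and lists any file that is absent.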
2. Preprocess the data

Preprocess the training and validation datasets using the provided configuration files.
cszoo data_preprocess run --config pretraining_tutorial/train_data_config.yaml
cszoo data_preprocess run --config pretraining_tutorial/valid_data_config.yaml
3. Run model

Run the pretraining process using the provided configuration.
cszoo fit pretraining_tutorial/model_config.yaml
4. Convert checkpoint to HuggingFace

Convert the trained model checkpoint into a HuggingFace-compatible format.
cszoo checkpoint convert \
  --model llama \
  --src-fmt cs-auto \
  --tgt-fmt hf \
  --config pretraining_tutorial/model_config.yaml \
  --output-dir pretraining_tutorial/to_hf \
  pretraining_tutorial/model/checkpoint_0.mdl
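The four steps above can be chained into one script. The sketch below is one way to do that, assuming the paths from this tutorial. The `run` helper, the `pretrain_workflow` function, and the `DRY_RUN` variable are hypothetical names introduced for illustration. With `DRY_RUN=1` each command is printed rather than executed.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
# run, pretrain_workflow, and DRY_RUN are hypothetical names, not part of cszoo.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# pretrain_workflow: the tutorial steps in order, stopping at the first failure.
pretrain_workflow() {
  local dir="pretraining_tutorial"
  # Step 1: create the directory and copy the tutorial configuration files.
  run mkdir "$dir" || return 1
  run cp modelzoo/src/cerebras/modelzoo/tutorials/pretraining/* "$dir" || return 1
  # Step 2: preprocess the training and validation datasets.
  run cszoo data_preprocess run --config "$dir/train_data_config.yaml" || return 1
  run cszoo data_preprocess run --config "$dir/valid_data_config.yaml" || return 1
  # Step 3: run pretraining.
  run cszoo fit "$dir/model_config.yaml" || return 1
  # Step 4: convert the checkpoint to a HuggingFace-compatible format.
  run cszoo checkpoint convert \
    --model llama \
    --src-fmt cs-auto \
    --tgt-fmt hf \
    --config "$dir/model_config.yaml" \
    --output-dir "$dir/to_hf" \
    "$dir/model/checkpoint_0.mdl"
}
```

Running `DRY_RUN=1 pretrain_workflow` lists the six commands for review, and `pretrain_workflow` on its own executes them.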

Getting Help

For detailed information about any command, use the --help flag:
cszoo --help
cszoo <command> --help
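To keep the help text for several commands on hand, the `--help` flag can be looped over a list of command names. This is a minimal sketch. `dump_help` and `CSZOO_BIN` are hypothetical names introduced here, and the command names passed in are the top-level commands shown in the examples on this page.

```shell
# Sketch: save the --help output of each named command to <out_dir>/<command>.txt.
# dump_help and CSZOO_BIN are hypothetical names, not part of cszoo.
dump_help() {
  local out_dir="$1"
  shift
  local cmd
  mkdir -p "$out_dir" || return 1
  for cmd in "$@"; do
    "${CSZOO_BIN:-cszoo}" "$cmd" --help > "$out_dir/$cmd.txt" || return 1
  done
}
```

For example, `dump_help cszoo_help fit validate checkpoint model config` writes one text file per command into `cszoo_help`.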

CSZoo Assistant

Need help? Our CSZoo Assistant is an LLM agent you can access from the command line with the assistant subcommand. Use it to:
  • Ask questions: cszoo assistant "what is the checkpoint converter?"
  • Perform actions: cszoo assistant "convert my checkpoint from huggingface to cerebras"
CSZoo Assistant will always ask your permission before running a command.
Access to the Cerebras Inference API is required. Provide your API key with the following command:
export CEREBRAS_API_KEY=<your api key>
Don’t have an API key? Follow these instructions.
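Because the assistant depends on `CEREBRAS_API_KEY`, a small wrapper can report a missing key before any call is made. The sketch below assumes only the `cszoo assistant "<question>"` form shown above. `ask_assistant` and `CSZOO_BIN` are hypothetical names introduced for illustration.

```shell
# Sketch: refuse to call the assistant unless CEREBRAS_API_KEY is set.
# ask_assistant and CSZOO_BIN are hypothetical names, not part of cszoo.
ask_assistant() {
  if [ -z "${CEREBRAS_API_KEY:-}" ]; then
    echo "CEREBRAS_API_KEY is not set; export it before using the assistant." >&2
    return 1
  fi
  "${CSZOO_BIN:-cszoo}" assistant "$1"
}
```

With the key exported, `ask_assistant "what is the checkpoint converter?"` forwards the question unchanged.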
CSZoo Assistant is a beta feature and it may make mistakes. Always double-check its reasoning and be aware of the following limitations:
  • CSZoo Assistant can currently only access the help manuals found with cszoo ... -h.
  • There are currently no advanced context length management mechanisms in place. The assistant will error out if it overflows the context length.