Follow this guide to pretrain your first model on a Cerebras system.
Create Model Directory & Copy Configs
We use cp here to copy configs specifically designed for this tutorial. For general use with Model Zoo models, we recommend using cszoo config pull. See the CLI command reference for details.
Inspect Configs
Model Config
Evaluation Config
Data Config
Preprocess Data
The preprocessed data is written to pretraining_tutorial/train_data/ and pretraining_tutorial/valid_data/ (see the output_dir parameter in your data configs).
KeyError: 'tags'
This issue occurs due to an outdated version of the huggingface_hub package. To resolve it, update the package by running:
pip install --upgrade huggingface_hub==0.26.1
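Before re-running preprocessing, it can help to confirm which huggingface_hub version is actually installed in the active environment. The following is a minimal sketch using only the Python standard library; the helper names installed_version and matches_pin are illustrative and not part of any Cerebras or Hugging Face tooling, and the pinned value is simply the version from the command above.

```python
from importlib import metadata
from typing import Optional

# Version installed by the pip command in this guide.
PINNED_VERSION = "0.26.1"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version: Optional[str], pinned: str = PINNED_VERSION) -> bool:
    """True only when a version was found and equals the pinned version."""
    return version is not None and version == pinned


if __name__ == "__main__":
    found = installed_version("huggingface_hub")
    if found is None:
        print("huggingface_hub is not installed in this environment")
    elif matches_pin(found):
        print(f"huggingface_hub {found} matches the pinned version")
    else:
        print(f"huggingface_hub {found} differs from {PINNED_VERSION}")
```

If the reported version differs from the pinned one, running the pip command above in the same environment and then repeating the check should show whether the upgrade took effect.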
Inspect Preprocessed Data
The terminal output includes a URL such as http://172.31.48.239:5000. Copy and paste it into your browser to launch TokenFlow, a tool for interactively visualizing whether loss and attention masks were applied correctly.
Train and Evaluate Model
Update train_dataloader.data_dir and val_dataloader.data_dir in your model config to use the absolute paths of your preprocessed data.
The outputs of this step are saved in the pretraining_tutorial/model folder (see the model_dir parameter in your model config).
Port Model to Hugging Face
The output of the conversion is written to pretraining_tutorial/to_hf.
Validate Checkpoint and Configs