Learn how to convert existing PyTorch models to run on Cerebras systems.
In `data.py`, we're going to define a function named `get_random_dataloader`:
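Below is a minimal sketch of what such a function could look like. It is illustrative only; the exact signature may differ in your project, and it assumes the parameter names (`num_examples`, `batch_size`, `seed`, `image_size`, `num_classes`) described next.

```python
# data.py -- illustrative sketch of a synthetic dataloader. Parameter names
# mirror the params.yaml fields described below; defaults are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_random_dataloader(input_params: dict) -> DataLoader:
    num_examples = input_params.get("num_examples", 1000)
    batch_size = input_params.get("batch_size", 32)
    seed = input_params.get("seed", 1)
    image_size = input_params.get("image_size", 784)
    num_classes = input_params.get("num_classes", 10)

    # Fixing the seed makes every run produce the same images and labels.
    generator = torch.Generator().manual_seed(seed)
    images = torch.randn(num_examples, image_size, generator=generator)
    labels = torch.randint(0, num_classes, (num_examples,), generator=generator)

    return DataLoader(TensorDataset(images, labels), batch_size=batch_size, shuffle=True)
```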
This synthetic data loader is configured through the `params.yaml` file.
In `params.yaml`, we can specify several important parameters that influence the behavior of our synthetic data loader:
- `num_examples`: how many random images and labels the data loader should generate.
- `batch_size`: how many examples are included in a single batch during training or evaluation of the model.
- `seed`: a seed for the random number generator, which ensures reproducibility by generating the same sequence of random images and labels for a given seed value.
- `image_size`: the dimensions of the generated random images.
- `num_classes`: how many different classes the labels can take, which is crucial for classification tasks.
By adjusting these parameters in the `params.yaml` file, we can tailor the behavior of the `get_random_dataloader` function to meet our specific experimental needs, allowing for a flexible and dynamic approach to evaluating the network's performance under different conditions.
In `model.py`, change the fixed number of classes to a parameter read from the `params.yaml` file:
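A sketch of that change, assuming the FC_MNIST model used later in this guide; the key name `num_classes` mirrors the dataloader parameter above:

```python
# model.py -- illustrative excerpt only. The fixed output dimension (10) is
# replaced by a num_classes value read from params.yaml.
import torch.nn as nn


class MNISTModel(nn.Module):
    def __init__(self, params):
        super().__init__()
        model_params = params["model"]
        depth = model_params["depth"]
        hidden_size = model_params["hidden_size"]
        # Previously hard-coded to 10; now configurable via params.yaml.
        num_classes = model_params["num_classes"]

        units = [784] + [hidden_size] * depth + [num_classes]
        self.layers = nn.ModuleList(
            nn.Linear(units[i], units[i + 1]) for i in range(len(units) - 1)
        )
```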
In `configs/params.yaml`, add the additional fields used in the dataloader and model definition.
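For instance, the new fields might look like the following; the section names (`train_input`, `model`) are assumptions based on the legacy layout and may differ in your configuration:

```yaml
train_input:
  num_examples: 60000
  batch_size: 128
  seed: 1
  image_size: 784
  num_classes: 10

model:
  num_classes: 10
```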
To port your model to Cerebras systems, we recommend using the Cerebras Model Zoo's `run` function. This approach allows you to utilize an established structure, streamlining the development process for your custom models and preprocessing routines.
All PyTorch models in the Cerebras Model Zoo share a standard framework that facilitates running them on the Cerebras Systems (CS) platform or other hardware types like CPUs/GPUs. This framework handles the modifications required to compile and execute a model on a Cerebras cluster. It offers a unified training and evaluation interface, allowing users to integrate their models and data preprocessing scripts seamlessly. With this setup, users don’t need to make detailed code adjustments for compatibility with Cerebras systems.
To use the `run` function, ensure that the Cerebras Model Zoo repository compatible with your target Cerebras cluster's release is installed. You can import the `run` function with the following code snippet:
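The exact module path depends on the Model Zoo release you have installed; in recent releases the import generally looks like this (older releases exposed it under `modelzoo.common.pytorch.run_utils`):

```python
# Import path used by recent Model Zoo releases; adjust for older releases.
from cerebras.modelzoo.common.run_utils import run
```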
The `run` function simplifies and organizes various aspects of your model's workflow, including its implementation, data loading processes, hyperparameter settings, and overall execution. To use the `run` function effectively, you need a model definition, data loader functions, and a params YAML file.
The `run` function requires a callable class or function that takes as input a dictionary of params and returns a `torch.nn.Module` whose `forward` implementation returns a loss tensor.
For example, let's implement FC_MNIST, parametrized by the depth and the hidden size of the network. Let's assume that the input size is 784 and the last output dimension is 10. We use `ReLU` as the nonlinearity and a negative log-likelihood loss.
In `model.py`:
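The following is a sketch consistent with this description, not the exact Model Zoo implementation; the `params["model"]` key names (`depth`, `hidden_size`) are illustrative.

```python
# model.py -- FC_MNIST sketch: a stack of fully connected layers with ReLU
# nonlinearities and a negative log-likelihood loss. forward() returns the loss.
import torch.nn as nn
import torch.nn.functional as F


class MNISTModel(nn.Module):
    def __init__(self, params):
        super().__init__()
        model_params = params["model"]
        depth = model_params["depth"]
        hidden_size = model_params["hidden_size"]

        # Input is a flattened 28x28 image (784 features); output has 10 classes.
        units = [784] + [hidden_size] * depth + [10]
        self.layers = nn.ModuleList(
            nn.Linear(units[i], units[i + 1]) for i in range(len(units) - 1)
        )

    def forward(self, batch):
        # The batch carries both inputs and labels; the model unpacks them.
        inputs, labels = batch
        x = inputs.flatten(start_dim=1)
        for layer in self.layers[:-1]:
            x = F.relu(layer(x))
        logits = self.layers[-1](x)
        # Negative log-likelihood loss over log-probabilities.
        return F.nll_loss(F.log_softmax(logits, dim=1), labels)
```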
Note that the batch provided to the `torch.nn.Module` object used by the `run` function includes both the inputs and the labels needed to compute the loss. It is up to the model to extract the inputs and labels from the batch before using them.
The data loader functions must each return a `torch.utils.data.DataLoader`. When running training, the `train_data_fn` must be provided. When running evaluation, the `eval_data_fn` must be provided.
For example, to implement FC_MNIST, we create two different functions for training and evaluation. We use `torchvision.datasets` to download the MNIST dataset. Each of these functions returns a `torch.utils.data.DataLoader`.
In `data.py`:
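A sketch of such functions follows, assuming `train_input`/`eval_input` sections in the params file with `data_dir` and `batch_size` keys; adapt the names to your configuration.

```python
# data.py -- illustrative train/eval dataloader functions built on
# torchvision.datasets.MNIST. Section and key names are assumptions.
import torch
from torchvision import datasets, transforms


def _mnist_dataloader(params: dict, train: bool) -> torch.utils.data.DataLoader:
    input_params = params["train_input"] if train else params["eval_input"]
    dataset = datasets.MNIST(
        root=input_params.get("data_dir", "./data"),
        train=train,
        download=True,
        transform=transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        ),
    )
    return torch.utils.data.DataLoader(
        dataset,
        batch_size=input_params["batch_size"],
        shuffle=train,
        drop_last=True,
    )


def get_train_dataloader(params: dict) -> torch.utils.data.DataLoader:
    return _mnist_dataloader(params, train=True)


def get_eval_dataloader(params: dict) -> torch.utils.data.DataLoader:
    return _mnist_dataloader(params, train=False)
```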
The inputs to the `run` function are callables that take a dictionary, called `params`, as input. `params` is a dictionary containing all of the model and data parameters specified by the model's params YAML file.
| Parameter | Type | Notes |
|---|---|---|
| `model_fn` | `Callable[[dict], torch.nn.Module]` | Required. A callable that takes in a dictionary of parameters and returns a `torch.nn.Module`. |
| `train_data_fn` | `Callable[[dict], torch.utils.data.DataLoader]` | Required during training runs. |
| `eval_data_fn` | `Callable[[dict], torch.utils.data.DataLoader]` | Required during evaluation runs. |
| `default_params_fn` | `Callable[[dict], Optional[dict]]` | Optional. A callable that takes in a dictionary of parameters and sets default parameters. |
In `run.py`:
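A sketch of how the callables could be wired together; the module names (`model`, `data`) and the import path for `run` are illustrative.

```python
# run.py -- pass the model and dataloader callables to the run function.
from cerebras.modelzoo.common.run_utils import run  # path varies by release

from data import get_eval_dataloader, get_train_dataloader
from model import MNISTModel


def main():
    # model_fn, train_data_fn, eval_data_fn (see the table above).
    run(MNISTModel, get_train_dataloader, get_eval_dataloader)


if __name__ == "__main__":
    main()
```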
The `run` function has an optional input parameter called `default_params_fn`. This callable modifies the dictionary loaded from the params YAML file, adding default values for unspecified params.

Setting up a `default_params_fn` can be beneficial if you plan multiple experiments in which only a small subset of the params YAML file changes. The `default_params_fn` sets the values shared across all of the experiments, so you can create different configuration YAML files that only address the changes between experiments.

The `default_params_fn` should be a callable that takes in the `params` dictionary and returns a new dictionary. If the `default_params_fn` is omitted, the `params` dictionary is used as is.
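As a sketch, a `default_params_fn` might fill in defaults shared by all experiments; the keys shown here are purely illustrative.

```python
def set_default_params(params: dict) -> dict:
    # Defaults shared across experiments; individual YAML files only need to
    # override the values that actually change.
    model_params = params.setdefault("model", {})
    model_params.setdefault("depth", 10)
    model_params.setdefault("hidden_size", 50)
    params.setdefault("train_input", {}).setdefault("batch_size", 128)
    return params
```

Such a callable would then be passed to `run` as the `default_params_fn` argument listed in the table above.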
The `run` function also requires a separate params YAML file. This file is specified during execution with the `--params` flag on the command line.
As an example, the FC_MNIST implementation comes with a `params.yaml` file; both the new YAML specification (introduced in release 2.3.0) and the legacy YAML specification are supported.
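The original file is not reproduced here; as a rough sketch, a legacy-style configuration might contain sections like the following (names and values are illustrative, and the release 2.3.0+ specification organizes these fields differently):

```yaml
# Illustrative legacy-style params.yaml sketch for FC_MNIST.
train_input:
  data_dir: "./data/mnist"
  batch_size: 128

eval_input:
  data_dir: "./data/mnist"
  batch_size: 128

model:
  depth: 10
  hidden_size: 50

optimizer:
  optimizer_type: "SGD"
  learning_rate: 0.001

runconfig:
  max_steps: 10000
  log_steps: 50
  checkpoint_steps: 2000
```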
All models in the Cerebras Model Zoo call the `run` function inside the script `run.py`. Therefore, once you have ported your model to use the `run` function, you can follow the steps in the Launch your job section to launch your training or evaluation job.
Evaluation metrics in the Cerebras Model Zoo are based on the class `CBMetric`. Metrics already defined in the Model Zoo git repository can be imported as:
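For example, an accuracy metric import might look like the following; the exact module path and class names vary between Model Zoo releases, so treat this as an assumption:

```python
# Import path used by older Model Zoo releases; newer releases may expose
# metrics elsewhere (e.g. under cerebras.pytorch.metrics).
from modelzoo.common.pytorch.metrics import AccuracyMetric
```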
Each metric must be registered with a `torch.nn.Module` class. This is done automatically when the `CBMetric` object is constructed. That is, to register a metric to a `torch.nn.Module` class, construct the metric object in the `torch.nn.Module` class's constructor.
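A minimal sketch of this pattern, with the metric class, import path, and constructor arguments as assumptions:

```python
import torch
from modelzoo.common.pytorch.metrics import AccuracyMetric  # path is an assumption


class MNISTModel(torch.nn.Module):
    def __init__(self, params):
        super().__init__()
        self.fc = torch.nn.Linear(784, 10)
        # Constructing the CBMetric-derived object inside the module's
        # constructor is what registers it with this torch.nn.Module.
        self.accuracy_metric = AccuracyMetric(name="eval/accuracy")
```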
The computed metric values are then logged with TensorBoard's `SummaryWriter`.