Backend

cerebras.pytorch.backend holds the configuration of the device and the other settings used during a run. The device is the hardware the workflow runs on.

Prerequisites

Make sure you have read Trainer Overview and Trainer Configuration Overview, which explain the basics of running Model Zoo models. This document uses the tools and configurations described in those pages.

Configure the Device

To configure the device used by the Trainer, set the device key to one of "CSX", "CPU", or "GPU":

trainer:
  init:
    device: "CSX"
    ...
  ...

Setting device still creates a cerebras.pytorch.backend instance, but with default settings. To configure any other backend settings, specify them under the backend key instead.
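
To illustrate the shorthand, the sketch below treats device as a backend configuration that sets only backend_type and leaves every other setting at its default. The helper device_to_backend_config and the VALID_DEVICES set are hypothetical, written for this page; they are not part of the cerebras.pytorch API.

```python
# Hypothetical helper for illustration only; not part of cerebras.pytorch.
VALID_DEVICES = {"CSX", "CPU", "GPU"}


def device_to_backend_config(device: str) -> dict:
    """Expand a device shorthand into a backend config dict.

    Only backend_type is set; every other backend setting is left
    to its default, mirroring the behavior described above.
    """
    if device not in VALID_DEVICES:
        raise ValueError(
            f"Unsupported device {device!r}; expected one of {sorted(VALID_DEVICES)}"
        )
    return {"backend_type": device}


if __name__ == "__main__":
    print(device_to_backend_config("CSX"))  # {'backend_type': 'CSX'}
```

In other words, device: "CSX" can be thought of as a backend block that contains nothing but backend_type: "CSX".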

Limitations

Once a device is set, every other Trainer instance must use the same device type; device types cannot be mixed. For example, a configuration like this:

# THIS CONFIGURATION IS INVALID
trainer:
- trainer:
    init:
      device: "CSX"
      ...
    ...
- trainer:
    init:
      device: "CPU"
      ...
    ...

will result in the following error:

RuntimeError: Cannot instantiate multiple backends. A backend with type CSX has already been instantiated.
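
A multi-trainer configuration can be checked for this problem before anything runs. The sketch below walks the list of trainer entries (written as Python data rather than YAML), reads each entry's device type from either device or backend.backend_type, and raises the same message the page shows. The functions device_type_of and check_single_device_type are hypothetical pre-flight helpers, not part of Model Zoo.

```python
from typing import List, Optional


# Hypothetical pre-flight check for illustration only; not part of Model Zoo.
def device_type_of(entry: dict) -> str:
    """Return the device type used by one "- trainer:" list entry."""
    init = entry["trainer"]["init"]
    if "backend" in init:
        return init["backend"]["backend_type"]
    return init["device"]


def check_single_device_type(entries: List[dict]) -> Optional[str]:
    """Raise if the entries mix device types; otherwise return the shared type."""
    first = None
    for entry in entries:
        current = device_type_of(entry)
        if first is None:
            first = current  # the first trainer fixes the device type
        elif current != first:
            raise RuntimeError(
                "Cannot instantiate multiple backends. "
                f"A backend with type {first} has already been instantiated."
            )
    return first


# The invalid configuration above, written as Python data.
invalid = [
    {"trainer": {"init": {"device": "CSX"}}},
    {"trainer": {"init": {"device": "CPU"}}},
]

try:
    check_single_device_type(invalid)
except RuntimeError as err:
    print(err)
```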

Configure the Backend

To configure the backend used by the Trainer, provide a cerebras.pytorch.backend instance. In a YAML configuration, the backend key is expected to be a dictionary whose keys are used to construct a cerebras.pytorch.backend instance, as shown below. In a Python script, construct a cerebras.pytorch.backend instance yourself and pass it to the Trainer's backend argument.

trainer:
  init:
    backend:
      backend_type: "CSX"
      cluster_config:
        num_csx: 4
        mount_dirs:
        - /path/to/dir1
        - /path/to/dir2
        ...
      ...
    ...
  ...
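
The sketch below shows one way a backend dictionary like the one above could be turned into constructor arguments: backend_type becomes the first argument, and the remaining keys (such as cluster_config) are passed through as keyword arguments. StandInBackend and build_backend are hypothetical placeholders that let the example run without Cerebras software installed; the real cerebras.pytorch.backend may convert these values further, so check its API reference for the exact signature.

```python
import copy


class StandInBackend:
    """Hypothetical stand-in for cerebras.pytorch.backend; it only records its arguments."""

    def __init__(self, backend_type, **kwargs):
        self.backend_type = backend_type
        self.settings = kwargs


def build_backend(config: dict, factory=StandInBackend):
    """Construct a backend from a dict shaped like the YAML backend key."""
    kwargs = copy.deepcopy(config)  # leave the caller's config untouched
    backend_type = kwargs.pop("backend_type")
    return factory(backend_type, **kwargs)


config = {
    "backend_type": "CSX",
    "cluster_config": {
        "num_csx": 4,
        "mount_dirs": ["/path/to/dir1", "/path/to/dir2"],
    },
}
backend = build_backend(config)
print(backend.backend_type, backend.settings["cluster_config"]["num_csx"])  # CSX 4
```

Passing the factory as a parameter keeps the conversion logic separate from whichever backend class is actually constructed.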

Limitations

Instantiating multiple backends with different device types is not supported. Attempting to do so raises this error:

RuntimeError: Cannot instantiate multiple backends. A backend with type CSX has already been instantiated.

This means that when you construct one or more Trainer instances, all of their backends must use a single device type. You can, however, change other backend parameters between Trainer instances. For example:

trainer:
- trainer:
    init:
      backend:
        backend_type: "CSX"
        cluster_config:
          num_csx: 4
          mount_dirs:
          - /path/to/dir1
          - /path/to/dir2
          ...
        ...
      ...
    ...
- trainer:
    init:
      backend:
        backend_type: "CSX"
        cluster_config:
          num_csx: 2
          num_workers_per_csx: 1
          mount_dirs:
          - /path/to/dir1
          - /path/to/dir2
          ...
        ...
      ...
    ...
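
To make the rule concrete, the sketch below models it as a process-wide registry: the first backend fixes the device type, later backends of the same type may use different parameters (here, a different num_csx and an added num_workers_per_csx, as in the configuration above), and a backend of another type raises the same RuntimeError. BackendRegistry is a hypothetical model of the documented behavior, not the actual cerebras.pytorch implementation.

```python
class BackendRegistry:
    """Hypothetical model of the single-device-type rule for backends."""

    def __init__(self):
        self.device_type = None  # fixed by the first backend instantiated
        self.instances = []

    def instantiate(self, backend_type: str, **params) -> dict:
        if self.device_type is None:
            self.device_type = backend_type
        elif backend_type != self.device_type:
            raise RuntimeError(
                "Cannot instantiate multiple backends. "
                f"A backend with type {self.device_type} has already been instantiated."
            )
        backend = {"backend_type": backend_type, **params}
        self.instances.append(backend)
        return backend


registry = BackendRegistry()
# Same device type, different parameters: allowed.
registry.instantiate("CSX", cluster_config={"num_csx": 4})
registry.instantiate("CSX", cluster_config={"num_csx": 2, "num_workers_per_csx": 1})
# Different device type: rejected.
try:
    registry.instantiate("CPU")
except RuntimeError as err:
    print(err)
```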

Mutual Exclusivity

The device and backend arguments are mutually exclusive: when initializing a Trainer, set one of them, not both. If both are set, you will see an error like this:

ValueError: backend and device are mutually exclusive arguments of Trainer. Please only provide one or the other
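
The check can be sketched as a guard that runs before any backend is created. resolve_backend_argument is a hypothetical function written for this page; the real Trainer performs its own validation, and this sketch does not model what happens when neither argument is given.

```python
from typing import Any, Optional


# Hypothetical guard for illustration only; not the Trainer's actual code.
def resolve_backend_argument(
    device: Optional[str] = None, backend: Optional[Any] = None
) -> Any:
    """Return whichever of device or backend was provided, rejecting both."""
    if device is not None and backend is not None:
        raise ValueError(
            "backend and device are mutually exclusive arguments of Trainer. "
            "Please only provide one or the other"
        )
    return backend if backend is not None else device


print(resolve_backend_argument(device="CSX"))  # CSX
try:
    resolve_backend_argument(device="CSX", backend={"backend_type": "CSX"})
except ValueError as err:
    print(err)
```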
