1. Set the Global Batch Size (GBS)

In your YAML or Python file, set the num_csx and batch_size parameters:

trainer:
  init:
    backend:
      backend_type: CSX
      cluster_config:
        num_csx: 2
    ...
    callbacks:
      - ScopedTrainFlags:
          csx.performance.micro_batch_size: 2
  fit:
    train_dataloader:
      batch_size: 12
      ...

Make sure batch_size is greater than or equal to num_csx. In this example, the global batch size of 12 is split across the two CS-X systems into a per-box batch size of 6, and each CS-X processes it in micro-batches of size 2.
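
As a quick sanity check on these numbers, here is the same configuration annotated with the implied arithmetic (a sketch; the comments are ours, not compiler output):

trainer:
  init:
    backend:
      backend_type: CSX
      cluster_config:
        num_csx: 2                              # two CS-X systems
    callbacks:
      - ScopedTrainFlags:
          csx.performance.micro_batch_size: 2   # micro-batch size per system
  fit:
    train_dataloader:
      batch_size: 12                            # global batch size
      # per-box batch = batch_size / num_csx = 12 / 2 = 6
      # micro-batches per step per system = 6 / 2 = 3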

2. Choose Your Training Mode

Decide which mode fits your goal:

micro_batch_size: auto
    Finds a reasonable MBS automatically. This compiles faster than explore but may select less optimal values. It is the default when micro_batch_size is not specified.

micro_batch_size: explore
    Searches exhaustively for the MBS that gives the best speed. This takes much longer to compile and works only in compile_only mode. Unlike auto, it evaluates all possible micro-batch sizes, whether or not they evenly divide batch_size / num_csx.

micro_batch_size: <positive_int>
    Recommended when you already know the optimal value (use auto or explore to determine it), as it substantially reduces compilation time. The compiler may slightly adjust the specified value to ensure an even workload distribution across CS-X systems, and will notify you if it does so.

micro_batch_size: none
    Disables microbatching and uses the global batch_size as the micro-batch size. The model may then be too large to fit into device memory, in which case compilation will fail; even if it fits, the chosen batch size may be suboptimal for performance.
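
For instance, once a good value is known, you can pin it explicitly. A minimal sketch, assuming a value of 121 (illustrative; taken from the GPT-3 6.7B row in the table at the end of this page):

trainer:
  init:
    ...
    callbacks:
      - GlobalFlags:
          csx.performance.micro_batch_size: 121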

To search exhaustively instead, set explore:

trainer:
  init:
    ...
    callbacks:
      - GlobalFlags:
          csx.performance.micro_batch_size: "explore"

If you are using explore and have a specific range of acceptable micro-batch sizes in mind, you can define a batch exploration range to limit the search space and get a set of recommended options more quickly. Specify the range by providing one or both of the bounds:

trainer:
  init:
    ...
    callbacks:
      - GlobalFlags:
          csx.performance.micro_batch_size:
            explore:
              min: $min
              max: $max
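
For instance, to restrict the search to micro-batch sizes between 64 and 256 (illustrative bounds; either one may be omitted):

trainer:
  init:
    ...
    callbacks:
      - GlobalFlags:
          csx.performance.micro_batch_size:
            explore:
              min: 64
              max: 256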

3. Launch a Job

Launch a compile_only run:

cszoo fit <params_model.yaml> --compile_only
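
Once compilation succeeds, the same entry point can be used to launch the full run (a sketch; this assumes your training run uses the same params file):

# Validate and compile the model first
cszoo fit <params_model.yaml> --compile_only

# Then launch the actual training run
cszoo fit <params_model.yaml>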

4. Set Optimal MBS

After your initial run (whether using auto or explore), you should:

  • Check what micro_batch_size the system selected (printed in logs).
  • Update your YAML to explicitly set that value for future runs.
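
For example, if the logs reported a selected micro-batch size of 6 (an illustrative value), the step 1 configuration would be updated as follows:

trainer:
  init:
    ...
    callbacks:
      - ScopedTrainFlags:
          csx.performance.micro_batch_size: 6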

The recommended micro-batch size is specific to the current model configuration and may need adjustment if any of the model's performance-affecting parameters change. For instance, switching the model to evaluation mode or modifying the hidden size can change the optimal value. In such cases, rerun explore or auto mode to ensure the micro-batch size is still optimal for the new configuration.

  • Model performance is a function of the micro-batch size used on a Cerebras system. For example, for a given model, a micro-batch size of 2 will perform equally well regardless of the values used for num_csx or the global batch_size, as long as batch_size / num_csx is a multiple of the micro-batch size (see the sketch after this list).
  • The microbatching feature auto-disables for models it does not support, even if micro_batch_size is set. This includes models that use batch normalization or other non-linear computation over the batch dimension.
  • Since the examples above scope the setting to training (for example, via ScopedTrainFlags), the micro-batch size is restored to its previous value after training completes.
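
To make the first bullet above concrete, the following configurations should all deliver comparable per-system performance, because each decomposes into micro-batches of size 2 (the numbers are illustrative):

# Config A: num_csx: 2, batch_size: 12  ->  12 / 2 = 6 per system = 3 micro-batches of 2
# Config B: num_csx: 4, batch_size: 24  ->  24 / 4 = 6 per system = 3 micro-batches of 2
# Config C: num_csx: 2, batch_size: 16  ->  16 / 2 = 8 per system = 4 micro-batches of 2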

Effective Microbatching Examples

Below is a suggested list of micro-batch sizes that have demonstrated good performance, primarily with GPT-3 models. These sizes can also serve as useful estimates for other similar-sized GPT-style models, such as BLOOM and LLaMA.

Model Family    Model Size (Params)    Micro Batch Size (MBS)
GPT-3           1.3B                   253
GPT-3           2.7B                   198
GPT-3           6.7B                   121
GPT-3           13B                    99
GPT-3           20B                    77
GPT-3           30B                    69
GPT-3           39B                    55
GPT-3           65B                    55
GPT-3           82B                    48
GPT-3           175B                   35
T5              3B                     256
T5              11B                    520