Learn how to optimize performance by configuring the Trainer with automatic microbatching.
These settings can also be applied through the GlobalFlags or ScopedTrainFlags callbacks. Learn more about these callbacks in Performance Flags.
We have two guides depending on your familiarity with microbatching. We recommend reading the rest of this guide before moving on to the beginner or advanced guides:
Microbatching is controlled through the `micro_batch_size` parameter in the YAML config file. A warning is issued if the Trainer's `batch_size` isn't divisible by `num_csx`. The relevant parameters are:

- `batch_size`: The global batch size, i.e., the effective batch size with which the model trains. It is split across `num_csx` systems or further into micro batches. This parameter must be larger than `num_csx`.
- per-system batch size: `⌈batch_size / num_csx⌉`, the size of the batch used on each Cerebras system. This is calculated internally by the tool and no action is required.
- `micro_batch_size`: Controls the microbatching behavior and accepts the following settings:
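As a quick check, the per-system batch size can be computed with ceiling division; the numbers below are illustrative, not taken from this guide:

```python
import math

# Illustrative values: global batch 100 split across 3 CS-X systems.
batch_size, num_csx = 100, 3

# per-system batch size = ⌈batch_size / num_csx⌉
per_system = math.ceil(batch_size / num_csx)
print(per_system)  # → 34
```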
| YAML Setting | Description |
|---|---|
| `auto` | Finds a reasonable MBS automatically. Compiles faster than `explore` but may select less optimal values. This is the default when `micro_batch_size` is not specified. |
| `explore` | Searches exhaustively for the best MBS for speed. This takes much longer to compile and works only in `compile_only` mode. Unlike `auto`, it evaluates all possible micro-batch sizes regardless of whether they evenly divide `batch_size / num_csx`. |
| `<positive_int>` | Recommended when you know the optimal value (use `auto` or `explore` above to determine it), as it substantially reduces compilation time. The compiler may slightly adjust your specified value to ensure even workload distribution across CS-X systems, and will notify you if adjustments are made. |
| `none` | Disables microbatching and uses the global `batch_size` as the micro-batch size. The model with the given batch size may be too large to fit into device memory, in which case compilation will fail. Even if it does fit, the chosen batch size may be suboptimal for performance. |
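For reference, a minimal sketch of how the setting might be applied through a Trainer callback. The flag name (`csx.performance.micro_batch_size`) and nesting shown here are assumptions, not confirmed by this guide; consult the Performance Flags documentation for the authoritative layout:

```yaml
trainer:
  init:
    callbacks:
      # Assumed flag name and nesting; see the Performance Flags guide.
      - GlobalFlags:
          csx.performance.micro_batch_size: auto  # or explore, a positive int, or none
```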
The number of micro batches per system, `NumMicroBatches`, is:

`NumMicroBatches = ⌈per-system batch size / micro_batch_size⌉`

This value helps determine which `micro_batch_size` settings are valid. Since the smallest allowed MBS is 1, the maximum number of micro batches equals the per-system batch size. So, the valid range for `NumMicroBatches` is:

`{1, 2, ..., per-system batch size}`
To find all valid `micro_batch_size` values, divide the per-system batch size by each number in this range and take the ceiling of the result. The resulting set of values are the supported MBS options. If your specified MBS is not in the supported set, the Cerebras software stack issues a warning and automatically overrides the given MBS with the closest supported value. For this reason, we recommend explicitly setting the `micro_batch_size` parameter instead of leaving it undefined.
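The derivation above can be sketched in a few lines of Python. This is an illustrative re-implementation of the arithmetic, not Cerebras' actual code, and the example numbers are assumptions:

```python
import math

def supported_micro_batch_sizes(batch_size: int, num_csx: int) -> set[int]:
    """All supported MBS values, per the ceiling-division rule above."""
    per_system = math.ceil(batch_size / num_csx)  # per-system batch size
    # NumMicroBatches ranges over {1, ..., per_system}; each count maps
    # to one supported micro-batch size via ceiling division.
    return {math.ceil(per_system / n) for n in range(1, per_system + 1)}

# Illustrative values: global batch 120 across 4 systems (per-system batch 30).
print(sorted(supported_micro_batch_sizes(120, 4)))
# → [1, 2, 3, 4, 5, 6, 8, 10, 15, 30]
```

Any MBS outside this set would be overridden with the closest supported value, so it is worth checking your intended setting against it before compiling.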
Setting `micro_batch_size` to `"explore"` initiates an exhaustive search that can extend over several hours, although the typical compile time for most GPT models is expected to be around one hour.