Beginner: Automatic Microbatching
Learn how to set the global batch size and choose a mode to find the optimal microbatch size.
Set the Global Batch Size (GBS)
In your YAML or Python file, set the `num_csx` and `batch_size` parameters:
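For example, here is a minimal sketch of the relevant fields, assuming a Model Zoo-style params file; the exact sections these keys live in (such as `train_input` and `runconfig`) may vary by release:

```yaml
train_input:
  batch_size: 12        # global batch size across all CS-X systems
  micro_batch_size: 2   # per-system microbatch size (see modes below)

runconfig:
  num_csx: 2            # number of CS-X systems
```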
Make sure `batch_size` is greater than or equal to `num_csx`. In this example, the global batch size of 12 is split between two CS-X systems into a per-system batch size of 6, and each CS-X processes it via microbatches of size 2.
Choose Your Training Mode
Decide which mode fits your goal:
| YAML Setting | Description |
|---|---|
| `micro_batch_size: auto` | Finds a reasonable MBS automatically. Compiles faster than `explore` but may select less optimal values. This is the default when `micro_batch_size` is not specified. |
| `micro_batch_size: explore` | Searches exhaustively for the best-performing MBS. This takes much longer to compile and works only in `compile_only` mode. Unlike `auto`, it evaluates all possible microbatch sizes, regardless of whether they evenly divide `batch_size / num_csx`. |
| `micro_batch_size: <positive_int>` | Recommended when you already know the optimal value (use `auto` or `explore` to determine it), as it substantially reduces compilation time. The compiler may slightly adjust your specified value to ensure an even workload distribution across CS-X systems, and will notify you if it does. |
| `micro_batch_size: none` | Disables microbatching and uses the global `batch_size` as the microbatch size. The model may then be too large to fit into device memory, in which case compilation will fail. Even if it fits, the chosen batch size may be suboptimal for performance. |
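For instance, to let the compiler pick the microbatch size in the earlier sketch, you would replace the explicit value with a mode keyword (same hedged layout as before):

```yaml
train_input:
  batch_size: 12
  micro_batch_size: auto   # or: explore, a positive integer, or none
```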
If using `explore` and you have a specific range in mind for acceptable microbatch sizes, you can define a batch exploration range to limit the search space and get a set of recommended options more quickly. You can specify this range by providing either one or both of the bounds as follows:
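The nesting below is an assumption based on the `explore` mode described above; check your release's configuration reference for the authoritative syntax:

```yaml
train_input:
  batch_size: 12
  micro_batch_size:
    explore:
      min: 2   # optional lower bound on the search space
      max: 6   # optional upper bound on the search space
```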
Launch a Job
Launch a `compile_only` run:

```bash
cszoo fit <params_model.yaml> --compile_only
```
Set Optimal MBS
After your initial run (whether using `auto` or `explore`), you should:

- Check which `micro_batch_size` the system selected (printed in the logs).
- Update your YAML to explicitly set that value for future runs.
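For example, suppose the logs reported a selected microbatch size of 3 for the configuration above (a hypothetical value); you could pin it like this:

```yaml
train_input:
  batch_size: 12
  micro_batch_size: 3   # hypothetical value reported by a previous auto/explore run
```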
The recommended batch size is specific to the current model configuration and may require adjustment if any of the model's performance-affecting parameters change. For instance, switching the model to evaluation mode or modifying the hidden size could impact performance. In such scenarios, it's advisable to rerun `explore` or `auto` mode to ensure the microbatch size is optimized for the new configuration.
- Model performance is a function of the microbatch size used on a Cerebras system. For example, for a given model, a microbatch size of 2 will perform equally well regardless of the values of `num_csx` or the global `batch_size` (as long as `batch_size / num_csx` is a multiple of the microbatch size).
- The microbatching feature auto-disables for models it does not support, even if `micro_batch_size` is set. This includes models using batch normalization or other kinds of non-linear computation over the batch dimension.
- Since the examples above are limited to training, the microbatch size will be restored to its previous value after training is completed.
Effective Microbatching Examples
Below is a suggested list of micro-batch sizes that have demonstrated good performance, primarily with GPT-3 models. These sizes can also serve as useful estimates for other similar-sized GPT-style models, such as BLOOM and LLaMA.
| Model Family | Model Size (Params) | Micro Batch Size (MBS) |
|---|---|---|
| GPT-3 | 1.3B | 253 |
| GPT-3 | 2.7B | 198 |
| GPT-3 | 6.7B | 121 |
| GPT-3 | 13B | 99 |
| GPT-3 | 20B | 77 |
| GPT-3 | 30B | 69 |
| GPT-3 | 39B | 55 |
| GPT-3 | 65B | 55 |
| GPT-3 | 82B | 48 |
| GPT-3 | 175B | 35 |
| T5 | 3B | 256 |
| T5 | 11B | 520 |