Learn how to set the global batch size and choose a mode to find the optimal microbatch size.
## Set the Global Batch Size (GBS)
Set the global batch size using the `num_csx` and `batch_size` parameters, where `batch_size` is greater than or equal to `num_csx`. In this example, the global batch size of 12 is split between two CS-X systems into a per-box batch size of 6, and each CS-X processes this via microbatches of size 2.

## Choose Your Training Mode
YAML Setting | Description |
---|---|
`micro_batch_size: auto` | Set this to find a reasonable MBS automatically. Compiles faster than `explore` but may select a less optimal value. This is the default when `micro_batch_size` is not specified. |
`micro_batch_size: explore` | Set this to search exhaustively for the MBS that gives the best speed. This takes much longer to compile and works only in `compile_only` mode. Unlike `auto`, it evaluates all possible micro-batch sizes, regardless of whether they evenly divide `batch_size / num_csx`. |
`micro_batch_size: <positive_int>` | Recommended when you know the optimal value (use `auto` or `explore` above to determine it), as it substantially reduces compilation time. The compiler may slightly adjust your specified value to ensure an even workload distribution across CS-X systems, and will notify you if it does. |
`micro_batch_size: none` | Disable microbatching and use the global `batch_size` parameter as the microbatch size. The model may then be too large to fit into device memory at the given batch size, in which case compilation will fail. Even if it fits, the chosen batch size may be suboptimal for performance. |
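Putting the parameters together, here is a minimal sketch of the relevant YAML fragment. The parameter names are the ones named on this page; their exact placement within the params file may vary by Model Zoo release, and the values are the example from above:

```yaml
# Hedged sketch: the example configuration described on this page.
num_csx: 2            # two CS-X systems
batch_size: 12        # global batch size; must be >= num_csx
micro_batch_size: 2   # or "auto", "explore", a positive integer, or "none"
```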
If you use `explore` and have a specific range in mind for acceptable microbatch sizes, you can define a batch exploration range to limit the search space and get a set of recommended options more quickly. You can specify this range by providing either one or both of the bounds.

## Launch a Job
To launch the job in `compile_only` mode, run:

```shell
cszoo fit <params_model.yaml> --compile_only
```
## Set Optimal MBS
Once you have found the optimal MBS (via `auto` or `explore`), you should:

- Note the `micro_batch_size` the system selected (printed in logs).
- Set `micro_batch_size` explicitly to that value in future runs to skip the search and reduce compilation time.
- If you change the model configuration, rerun `explore` or `auto` mode to ensure the batch size is optimized for the new configuration.
- Reuse a known-good MBS when changing only `num_csx` or the global `batch_size` (as long as `batch_size / num_csx` is a multiple of the micro-batch size).

Note that some models may produce numerically different results when `micro_batch_size` is set. This includes models using batch normalization, or other kinds of non-linear computation over the batch dimension.

Model Family | Model Size (Params) | Micro Batch Size (MBS) |
---|---|---|
GPT-3 | 1.3B | 253 |
GPT-3 | 2.7B | 198 |
GPT-3 | 6.7B | 121 |
GPT-3 | 13B | 99 |
GPT-3 | 20B | 77 |
GPT-3 | 30B | 69 |
GPT-3 | 39B | 55 |
GPT-3 | 65B | 55 |
GPT-3 | 82B | 48 |
GPT-3 | 175B | 35 |
T5 | 3B | 256 |
T5 | 11B | 520 |
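The sizing rules above can be checked with a short sketch. This is plain Python, independent of the Cerebras stack; the function name is illustrative:

```python
def microbatch_plan(batch_size: int, num_csx: int, micro_batch_size: int):
    """Validate a (batch_size, num_csx, micro_batch_size) combination and
    return the per-box batch size and the number of microbatch steps."""
    if batch_size < num_csx:
        raise ValueError("batch_size must be greater than or equal to num_csx")
    if batch_size % num_csx != 0:
        raise ValueError("batch_size must divide evenly across CS-X systems")
    per_box = batch_size // num_csx
    # A known-good MBS can be reused across num_csx / batch_size changes
    # only while batch_size / num_csx stays a multiple of the MBS.
    if per_box % micro_batch_size != 0:
        raise ValueError("batch_size / num_csx must be a multiple of micro_batch_size")
    return per_box, per_box // micro_batch_size

# The example from this page: a global batch size of 12 on two CS-X systems
# with MBS 2 gives a per-box batch of 6, processed in 3 microbatch steps.
print(microbatch_plan(batch_size=12, num_csx=2, micro_batch_size=2))  # (6, 3)
```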