If you set micro_batch_size explicitly, the compiler handles the value in one of two ways:

  • Valid (the value equals the per-system batch size divided by a whole number of micro-batches, rounded up): it is used as is.
  • Not valid (not a supported value): the compiler automatically overrides it with the nearest valid value and issues a warning.

The compiler ensures:

  • Approximately even distribution of the work across CS-X systems.
  • Automatic adjustments if micro_batch_size isn’t optimal or feasible.

Numeric Examples

The following examples demonstrate how the system determines whether a micro_batch_size (MBS) is valid and what happens if it isn’t. An invalid value is either automatically overwritten or results in an error.

Use Case 1 - MBS is Valid

If you provide:

  • batch_size = 133
  • num_csx = 1
  • micro_batch_size = 34

The system implicitly derives the following:

  • Per-system batch size = Ceil(133/1) = 133
  • Valid NumMicroBatches = {1, 2, 3, …, 133}
  • Supported MBS values = {133, 133/2, 133/3, 133/4, …, 133/133}
    • After dividing & taking Ceil = {133, 67, 45, 34, 27, 23, 19, 17, 15, 14, 13, 12, …, 1}
  • NumMicroBatches = Ceil(133/34) = Ceil(3.912) = 4
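
The derivation above can be reproduced with a short sketch. It assumes only the arithmetic shown in this use case; the helper names ceil_div and supported_mbs_values are hypothetical and are not part of any Cerebras API.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def supported_mbs_values(batch_size, num_csx):
    """Return the supported micro-batch sizes, largest first.

    Hypothetical helper: mirrors the arithmetic in the example,
    not the compiler's actual implementation.
    """
    per_system = ceil_div(batch_size, num_csx)
    # One candidate per possible number of micro-batches n = 1..per_system.
    values = {ceil_div(per_system, n) for n in range(1, per_system + 1)}
    return sorted(values, reverse=True)


values = supported_mbs_values(batch_size=133, num_csx=1)
print(values[:12])        # [133, 67, 45, 34, 27, 23, 19, 17, 15, 14, 13, 12]
print(34 in values)       # True, so micro_batch_size = 34 is used as is
print(ceil_div(133, 34))  # 4 micro-batches
```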

Use Case 2 - MBS is Overwritten

If you provide:

  • batch_size = 673
  • num_csx = 2
  • micro_batch_size = 168

The system implicitly derives the following:

  • Per-system batch size = Ceil(673/2) = 337
  • Valid NumMicroBatches = {1, 2, 3, …, 337}
  • Supported MBS values = {337, 337/2, 337/3, 337/4, …, 337/337}
    • After dividing & taking Ceil = {337, 169, 113, 85, 68, 57, 49, 43, 38, 34, 31, 29, 26, 25, 23, 22, 20, 19, 18, 17, …, 1}. 168 is not in this list.

Since the MBS here is invalid, it is overwritten with the nearest supported value, which is 169, and the following message is shown:

INFO: The micro batch size is changed to 169 to allow approximately even distribution across boxes and gradient accumulation iterations

  • NumMicroBatches = Ceil(337/169) = Ceil(1.994) = 2
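
The override in this use case can be sketched as follows. The function resolve_mbs is hypothetical, and the sketch assumes that "nearest" means the smallest absolute difference from the requested value, with ties going to the larger value. The tie-breaking rule is a guess, since the example above does not exercise it.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def resolve_mbs(batch_size, num_csx, requested):
    """Return the requested MBS if supported, else the nearest supported one.

    Hypothetical sketch: "nearest" is assumed to mean smallest absolute
    difference, and breaking ties toward the larger value is a guess.
    """
    per_system = ceil_div(batch_size, num_csx)
    supported = {ceil_div(per_system, n) for n in range(1, per_system + 1)}
    if requested in supported:
        return requested
    return min(supported, key=lambda v: (abs(v - requested), -v))


resolved = resolve_mbs(batch_size=673, num_csx=2, requested=168)
print(resolved)                 # 169
print(ceil_div(337, resolved))  # 2 micro-batches
```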

Use Case 3 - Invalid MBS Error

If you provide:

  • batch_size = 240
  • num_csx = 2
  • micro_batch_size = 121

The system implicitly derives the following:

  • Per-system batch size = Ceil(240/2) = 120
  • Valid NumMicroBatches = {1, 2, 3, …, 120}

In this case, the given MBS (121) is larger than the per-system batch size (120), which is the largest supported MBS value, so you will see the following error message:

ERROR: <unknown>:0:error: Minimum microbatch size 121 must be smaller or equal to the per-box batch size 120 where the number of CSX boxes is 2
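
The upper-bound condition behind this error can be checked before submitting a run. The sketch below is a hypothetical pre-check, not the compiler's own validation; the function name and the exception text are illustrative only.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def check_mbs_upper_bound(batch_size, num_csx, micro_batch_size):
    """Raise ValueError if micro_batch_size exceeds the per-system batch size.

    Hypothetical pre-check that mirrors the condition described above.
    Returns the per-system batch size when the check passes.
    """
    per_system = ceil_div(batch_size, num_csx)
    if micro_batch_size > per_system:
        raise ValueError(
            f"micro_batch_size {micro_batch_size} exceeds the per-system "
            f"batch size {per_system} (num_csx={num_csx})"
        )
    return per_system


try:
    check_mbs_upper_bound(batch_size=240, num_csx=2, micro_batch_size=121)
except ValueError as err:
    print(err)
```

Any micro_batch_size of 120 or less passes this check for the values above, although it may still be overwritten if it is not in the supported set.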