Model Is Too Large To Fit In Memory
Observed Error
Causes and Possible Solutions
The memory requirements of your model exceed what fits on the device. Potential workarounds include:
- On transformer models, compile again with the batch size set to 1 on a single CS-2 system to determine whether the specified maximum sequence length is feasible.
- Try a smaller batch size per device, or enable batch tiling (transformer models only) by setting the micro_batch_size parameter in the train_input or eval_input section of your model's yaml file (see working_with_microbatches and the sketch after this list).
- If you ran with batch tiling at a specific micro_batch_size value, try compiling with a decreased micro_batch_size. The Using "explore" to Search for a Near-Optimal Microbatch Size flow can recommend performant micro batch sizes that fit in memory.
- On CNN models, where batch tiling isn't supported, try manually decreasing the batch size and/or the image/volume size.
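For orientation, the sketch below shows where these parameters sit in a model's yaml file. It is a minimal, hypothetical example: the train_input and eval_input sections and the batch_size and micro_batch_size parameters are the ones named above, but the values and surrounding structure are placeholders, not recommendations.

```yaml
train_input:
  # Global batch size, divided across all CS-2 systems in use (see the Note below).
  batch_size: 240
  # Hypothetical value: enables batch tiling (transformer models only).
  # If compilation still runs out of memory, try a smaller value, or use the
  # "explore" flow referenced above to search for a performant value.
  micro_batch_size: 30

eval_input:
  batch_size: 120
  micro_batch_size: 30
```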
Note
For more information on working with batch tiling and selecting performant micro_batch_size values, visit working_with_microbatches.
Note
The batch_size parameter set in the yaml configuration is the global batch size. This means that the batch size per CS-2 system is computed as the global batch size divided by the number of CS-2 systems used.
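As a worked illustration of this division, with made-up numbers:

```yaml
train_input:
  batch_size: 240   # global batch size (hypothetical value)
# With 3 CS-2 systems, each system receives 240 / 3 = 80 samples per step.
# If the per-system batch still does not fit in memory, reduce batch_size
# or set micro_batch_size as described above.
```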