On this page, you will learn how to configure the numerical precision used by the Trainer and ways you can mitigate some of the adverse effects of using lower precision.
Precision Type

To control the precision type used by the model, construct a MixedPrecision instance and pass it into the Trainer's precision argument as follows.
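A minimal sketch of what this could look like is shown below. The import paths and the fp16_type keyword are assumptions about the Model Zoo API; check your release for the exact names. Other Trainer arguments are elided with ...

```python
from cerebras.modelzoo.trainer import Trainer  # assumed import path
from cerebras.modelzoo.trainer.callbacks import MixedPrecision  # assumed import path

trainer = Trainer(
    ...,  # model, optimizer, backend, and other arguments elided
    precision=MixedPrecision(
        # Assumed keyword for selecting the lower precision type.
        fp16_type="cbfloat16",
    ),
)
```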
In the example above, the precision type is set to cbfloat16. The supported lower precision values include:
float16
bfloat16
cbfloat16
About CB16 Half-Precision
CB16 is Cerebras' 16-bit half-precision format, also referred to as cbfloat16. It's a floating-point format with a 6-bit exponent and a 9-bit explicit mantissa, which allows for double the dynamic range of FP16. Note that the cbfloat16 data format is different from the bfloat16 Brain Floating Point format.

Loss Scaling

When training in lower precision, small gradient values can underflow to zero. Loss scaling mitigates this by scaling up the loss before the backward pass and scaling the gradients back down before the optimizer step. To configure loss scaling, pass the loss_scaling_factor argument to MixedPrecision as follows:
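For example, a sketch enabling dynamic loss scaling (using the same assumed keywords as above):

```python
trainer = Trainer(
    ...,
    precision=MixedPrecision(
        fp16_type="cbfloat16",
        # "dynamic" adjusts the loss scale automatically during training;
        # a float value (e.g. 1024.0) would apply a fixed, static scale.
        loss_scaling_factor="dynamic",
    ),
)
```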
loss_scaling_factor accepts either a float for static loss scaling, or the string "dynamic" to facilitate dynamic loss scaling (see Dynamic loss scaling for more details).
Gradient Clipping

Training in lower precision can also produce exploding gradients (i.e. inf or NaN gradients). To mitigate this, you can use gradient clipping.
To configure gradient clipping, you can pass in one of max_gradient_norm or max_gradient_value to MixedPrecision as follows:
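For example, a sketch that clips by global gradient norm (same assumed keywords as above):

```python
trainer = Trainer(
    ...,
    precision=MixedPrecision(
        fp16_type="cbfloat16",
        loss_scaling_factor="dynamic",
        # Clip gradients so that their global norm does not exceed 1.0.
        # Alternatively, pass max_gradient_value to clip by absolute value.
        max_gradient_norm=1.0,
    ),
)
```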
Note that max_gradient_norm and max_gradient_value are mutually exclusive, so only one may be passed in.

Precision Optimization Level

The precision optimization level controls the tradeoff between numerical precision and performance. To set it, pass the precision_opt_level argument to MixedPrecision as follows:
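For example, a sketch that keeps the default level (same assumed keywords as above):

```python
trainer = Trainer(
    ...,
    precision=MixedPrecision(
        fp16_type="cbfloat16",
        loss_scaling_factor="dynamic",
        max_gradient_norm=1.0,
        # Accepted values are 0, 1, or 2; 1 is the default.
        precision_opt_level=1,
    ),
)
```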
The accepted values are [0, 1, 2]. The precision optimization level is set to 1 by default.
You should now have an understanding of how to configure the numerical precision used by the Trainer!

To learn more about how you can use the Trainer class, you can check out: