The cerebras.pytorch.save function can be used in exactly the same way as torch.save.
Similarly, the cerebras.pytorch.load function can be used in exactly the same way as torch.load.
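Because cerebras.pytorch is only available inside a Cerebras environment, the save/load round trip is illustrated below with a minimal sketch: the `save` and `load` functions here are hypothetical stand-ins (backed by pickle) that mirror the `torch.save(obj, f)` and `torch.load(f)` call shapes the text describes.

```python
# Illustrative stand-ins only: on a Cerebras system you would call
# cerebras.pytorch.save / cerebras.pytorch.load, which share the call
# shapes of torch.save / torch.load. pickle lets this sketch run anywhere.
import os
import pickle
import tempfile

def save(obj, path):
    """Stand-in mirroring cerebras.pytorch.save(obj, path)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load(path):
    """Stand-in mirroring cerebras.pytorch.load(path)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip: save a state dict to a checkpoint file, then load it back.
state_dict = {"model": {"weight": [1.0, 2.0]}, "global_step": 100}
path = os.path.join(tempfile.mkdtemp(), "checkpoint_100.mdl")
save(state_dict, path)
assert load(path) == state_dict
```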
Weights may only be fetched for checkpointing at the checkpoint steps configured in the DataExecutor. This restriction exists to keep training performant.
For example, if the configuration specifies checkpoint_steps=100, you may only fetch the weights to take a checkpoint on every 100th step and on the final step of the run.
To aid this, you can use the checkpoint_closure decorator, a step closure that checks that the current step is a checkpoint step before calling the decorated function. Using this decorator also ensures that the weights are available to fetch from the server before they are saved to the checkpoint file.
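The gating behavior described above can be sketched in plain Python. This is a simplified, hypothetical stand-in for the real checkpoint_closure decorator (which also coordinates weight availability with the server); it assumes checkpoint_steps and the total step count are known up front.

```python
# Simplified sketch (not the real cerebras.pytorch implementation) of a
# step-gated checkpoint closure: the decorated function only runs on every
# checkpoint_steps-th step and on the final step of the run.
def checkpoint_closure(checkpoint_steps, total_steps):
    def decorator(fn):
        def wrapper(step, *args, **kwargs):
            is_checkpoint_step = step % checkpoint_steps == 0
            is_last_step = step == total_steps
            if is_checkpoint_step or is_last_step:
                return fn(step, *args, **kwargs)
            return None  # skipped: weights are not fetchable on this step
        return wrapper
    return decorator

saved_at = []

@checkpoint_closure(checkpoint_steps=100, total_steps=250)
def save_checkpoint(step):
    # In a real run this body would call cerebras.pytorch.save(...)
    saved_at.append(step)

for step in range(1, 251):
    save_checkpoint(step)

# Checkpoints land on steps 100 and 200, plus the final step 250.
assert saved_at == [100, 200, 250]
```

The decorated function can be called unconditionally on every step; the closure itself decides whether the call actually takes a checkpoint.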
Checkpoints saved this way are portable: the cerebras.pytorch.load function also works in CPU/GPU workflows.