Learn strategies for integrating sparsity into Cerebras models to optimize performance and computational efficiency across neural network architectures.
When you call `model.apply(sparsity)`, your model parameters are sparsified, enhancing training efficiency. Keep the following in mind:

1. Apply sparsity only after `cstorch.compile`, ensuring all parameters are on the Cerebras device.
2. To exclude certain parameters from sparsity, set `param.requires_dense = True`. If a parameter does not have this attribute, the algorithm assumes that it is `False`.
3. The sparsity pattern is automatically updated when `optimizer.step()` is executed.
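A minimal sketch of this flow is shown below. The import alias, the `cstorch.compile` usage, and the `cstorch.sparse.configure` config keys are assumptions for illustration; consult the API reference for the exact signatures.

```python
import torch
import cerebras.pytorch as cstorch  # assumed import alias for the Cerebras PyTorch API

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Hypothetical: keep the final classifier weight dense.
model[2].weight.requires_dense = True

compiled_model = cstorch.compile(model)  # assumed usage; compiles the model for the Cerebras device

# Assumed config schema: an algorithm name plus a target sparsity level.
sparsity = cstorch.sparse.configure({"algorithm": "static", "sparsity": 0.9})

model.apply(sparsity)  # sparsify all eligible parameters (called after compile)
```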
Executing `optimizer.apply(sparsity)` transforms your optimizer into a sparse optimizer. Optimizer state tensors associated with sparse parameters are sparsified as well, unless their `requires_dense` attribute is set to `True`; if the attribute is absent, it is assumed to be `False`. The sparsity pattern is then automatically updated on each `optimizer.step()` call. This automatic update feature can be deactivated if necessary.
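Continuing the sketch above, the optimizer half might look like the following (the `cstorch.optim.SGD` wrapper is an assumption; any optimizer created through the Cerebras API should behave the same way):

```python
optimizer = cstorch.optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer wrapper

# Sparsify the optimizer state and enable automatic mask updates.
optimizer.apply(sparsity)

# Later, inside the training loop:
#   loss.backward()
#   optimizer.step()  # the sparsity pattern is updated automatically here
```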
To apply different sparsity algorithms to different parameters, use the `Group` class. For example, you can apply one sparsity algorithm to the parameters that match the `fc1.*` glob pattern, while employing the SET sparsity algorithm for parameters that match the `fc2.*` glob pattern.
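A sketch of such a grouped configuration is shown below. The list-of-dicts form, the `param_filter` key, and the per-algorithm options are assumptions; dynamic algorithms such as SET typically need additional settings (update schedule, drop fraction) that are omitted here.

```python
# Hypothetical grouped configuration: a different algorithm per glob pattern.
sparsity = cstorch.sparse.configure(
    [
        {"param_filter": "fc1.*", "algorithm": "static", "sparsity": 0.5},
        {"param_filter": "fc2.*", "algorithm": "set", "sparsity": 0.9},
    ]
)

model.apply(sparsity)
optimizer.apply(sparsity)
```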
Custom sparsity algorithms inherit from `SparsityAlgorithm` and override its `update` method, which takes care of updating the sparsity patterns for all sparse parameters.
For algorithms that dynamically change the sparsity pattern, there is a convenient `DynamicSparsityAlgorithm` class you can inherit from that takes care of many of the implementation details required to facilitate dynamic sparsity. `DynamicSparsityAlgorithm` already implements `update`, but it exposes a new abstract method, `update_mask`, that must be overridden instead. `update_mask` takes in the existing sparsity pattern in the form of a mask tensor and must return the new sparsity pattern in the form of a mask tensor as well. See `GMP`, `SET`, and `RigL` for examples of how to implement `update_mask`.
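For illustration, here is a sketch of a toy dynamic algorithm that re-randomizes which weights are kept on every update. The `update_mask(self, p, mask, sparsity)` signature is an assumption based on the mask-in/mask-out contract described above; check the GMP, SET, and RigL implementations for the exact arguments.

```python
import torch
import cerebras.pytorch as cstorch  # assumed import alias


class RandomReallocation(cstorch.sparse.DynamicSparsityAlgorithm):
    """Toy dynamic algorithm: keeps as many non-zeros as the target sparsity
    dictates, but picks a fresh random subset of weights on every update."""

    def update_mask(self, p, mask, sparsity):
        # Number of weights to keep, derived from the target sparsity level.
        num_kept = max(1, int(round((1.0 - float(sparsity)) * p.numel())))

        # Score every weight with random noise and keep the top-scoring ones.
        scores = torch.rand_like(p)
        threshold = torch.topk(scores.flatten(), num_kept).values.min()

        # Return the new sparsity pattern as a mask tensor (same dtype as the old mask).
        return (scores >= threshold).to(mask.dtype)
```

Because `DynamicSparsityAlgorithm` already implements `update`, the subclass only has to describe how a single parameter's mask evolves.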
In addition, many building blocks are provided that can be used directly, inherited from, or composed to help build new `DynamicSparsityAlgorithm` subclasses. See Customizing Sparsity & Reference for more details.
Once you’ve written your custom sparsity algorithm, as long as it is available in the global scope, you can use it directly or through a call to `configure` by setting `algorithm` to the name of your custom sparsity algorithm class. By extension, this means you can use it in ModelZoo in a similar way by setting `algorithm` to the name of your custom class in your params YAML file (see sparsity_via_yaml for more details).
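For example, if the `RandomReallocation` class sketched earlier is available in the global scope, the following (using the same assumed config keys as above) would select it by name:

```python
# Hypothetical: select the custom algorithm by its class name.
sparsity = cstorch.sparse.configure(
    {"algorithm": "RandomReallocation", "sparsity": 0.75}
)
model.apply(sparsity)
optimizer.apply(sparsity)
```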
`torch.nn.Parameter` cannot directly accommodate a `torch.sparse.Tensor` without specific adjustments. The `torch.nn.utils.prune` utilities are convenient, but the asynchronous and precompiled nature of computation on the WSE requires a custom solution. Similar to how `torch.nn.utils.prune` handles its mask tensors, when the sparsity algorithm is applied to the model, every sparsified parameter has a mask tensor registered as a stateful buffer next to it in the module that owns the parameter. For example, take the following simple model:
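The sketch below illustrates this (the config keys are again assumptions; the `weight_mask` buffer name matches the convention discussed below):

```python
import torch
import cerebras.pytorch as cstorch  # assumed import alias


class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)


model = SimpleModel()

# (cstorch.compile step omitted for brevity; see the notes above.)
sparsity = cstorch.sparse.configure({"algorithm": "static", "sparsity": 0.5})
model.apply(sparsity)

# The owning module now holds both the parameter and its mask buffer:
#   model.fc.weight       -> the parameter tensor
#   model.fc.weight_mask  -> the stateful mask buffer registered next to it
print([name for name, _ in model.fc.named_buffers()])
```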
The `weight` and `weight_mask` tensors collectively represent the sparsified `weight`, showing how sparsity is represented within the model's architecture.