Kernel Autogeneration With Autogen
To optimize model performance on Cerebras hardware, we use a mix of pre-written and AutoGen (automatically generated) kernels.
Overview
While the core library of handwritten kernels addresses many standard operations, AutoGen kernels step in under certain conditions:
-
Missing Operations: If a model needs an operation that’s not in the handwritten library, AutoGen kernels are generated to support these operations.
-
Performance Optimization: AutoGen can create specialized, fused kernels for operations, even for those operations that already have existing handwritten implementations. This is especially useful for complex operations, such as loss functions, where AutoGen kernels can offer superior performance.
The Cerebras compiler automatically generates these custom kernels to enhance your model’s efficiency.
Autogenerating fused kernels for loss operations
AutoGen supports numerous PyTorch loss functions by creating fused kernels that replace or outperform handwritten versions. To use AutoGen for a PyTorch loss:
1. Import the PyTorch loss from the Model Zoo
2. Set use_autogen=True
. The default value of use_autogen
is True.
List of supported losses
-
BCELoss
-
CrossEntropyLoss
-
Loss.BCEWithLogitsLoss
-
Loss.GaussianNLLLoss
-
Loss.HingeEmbeddingLoss
-
Loss.HuberLoss
-
Loss.KLDivLoss
-
Loss.L1Loss
-
Loss.MarginRankingLoss
-
Loss.MSELoss
-
Loss.MultiLabelSoftMarginLoss
-
Loss.MultiMarginLoss
-
Loss.NLLLoss
-
Loss.PoissonNLLLoss
-
Loss.SmoothL1Loss
-
Loss.TripletMarginLoss
-
Loss.TripletMarginWithDistanceLoss Unsupported losses:
-
Loss.CosineEmbeddingLoss (Will compile to handwritten kernels)
-
Mish
Unsupported losses
All unsupported losses will still compile using a custom-written kernel.
- Loss.CosineEmbeddingLoss (Will compile to handwritten kernels)
Autogenerate kernels for customized losses
Creating custom losses may result in compilation failure. If this occurs, enable AutoGen for the custom loss by adding the @autogen_loss
decorator for the loss class.
Implementation notes
Release 2.2.0
In Release 2.2.0, the Autogen feature is temporarily unavailable.
Release 2.1.1
In Release 2.1.1, the automatic kernel generation feature (autogen) is temporarily unavailable. Make sure that in your runconfig file, you set autogen_policy:disabled
if you were previously using it.
Release 1.9.1
In Release 1.9.1, we enabled the following AutoGen capabilities
-
Support for PyTorch operations (e.g., nonlinearities)
-
Improving the performance of PyTorch losses through fused kernels
-
Support for user-defined loss functions
For any issues transitioning to a version with mandatory AutoGen, please contact Cerebras Support for guidance.