Current Release Highlights
Release 2.4.0
We are thrilled to announce Release 2.4.0, which delivers substantial performance improvements and expanded capabilities for the machine learning model development workflow. This release introduces support for new models, improved mixture-of-experts capabilities, faster performance, and workflow updates that simplify and streamline the process of training large models.
Improvements to Model Zoo
- New Model Support: The latest additions to Model Zoo include Llama 3.3 (70B), Llama 3.2 (1B and 3B), and Mistral NeMo (12B) models.
Note: Please see this guide for instructions on how to conduct data preprocessing for Llama 3.3 70B.
- Extended Max Sequence Length (MSL) Support: We now support MSL up to 128K tokens for training, fine-tuning, and evaluation tasks. 128K MSL is supported for the following models:
  - Llama 3.1 405B (Contact your account representative for access.)
- Model Zoo CLI: Release 2.4.0 introduces a new command-line interface that centralizes all modeling tasks into a single, intuitive tool, allowing users to easily access scripts, utilities, and configuration files for core workflows including data preprocessing, pretraining, fine-tuning, checkpoint conversion, and more.
- CSZoo Assistant: Introduced a command-line LLM agent that leverages Cerebras Inference to answer questions and execute actions in natural language. Users no longer need to memorize CLI commands or internal workflows: simply ask whether features are supported, ask how to accomplish specific tasks, or request automated command execution for a streamlined and intuitive experience.
- Config Classes for Streamlined Configuration Management: Introduced Pydantic-based config classes that provide structured, validated, and immutable schemas for model, data, and training parameters. This approach simplifies customization, ensures data integrity, and enables easier experimentation without requiring deep internal code changes.
- Streamlined Model Zoo Registry: Optimized the ModelZoo registry for faster loading, improving startup times and user experience. Users can now register their custom models seamlessly and use the CLI to manage them.
- Enhanced Evaluation Framework: Model Zoo now supports EleutherAI’s LM Evaluation Harness (v0.4.5) and the latest BigCode evaluation harness, enabling users to run multiple generative and non-generative evaluation tasks within a single callback.
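To illustrate what a validated, immutable config class looks like, here is a minimal sketch using Pydantic v2. The class and field names (`ModelConfig`, `hidden_size`, and so on) are hypothetical stand-ins, not the actual Model Zoo schema:

```python
# Hypothetical sketch of a Pydantic-based config class; field names and
# constraints are illustrative, not the actual Model Zoo schema.
from pydantic import BaseModel, Field, ValidationError


class ModelConfig(BaseModel, frozen=True):  # frozen=True makes instances immutable
    name: str
    hidden_size: int = Field(gt=0)                       # must be positive
    max_sequence_length: int = Field(default=2048, gt=0)


cfg = ModelConfig(name="llama-3.2-1b", hidden_size=2048, max_sequence_length=131072)
assert cfg.max_sequence_length == 131072

# Validation rejects bad values at construction time.
try:
    ModelConfig(name="bad", hidden_size=-1)
except ValidationError:
    print("rejected invalid hidden_size")

# Immutability: assigning to a frozen model raises an error.
try:
    cfg.name = "other"
except ValidationError:
    print("config is frozen")
```

Because validation happens once at construction and instances cannot be mutated afterward, a run's configuration is guaranteed to match what was validated.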
Improvements in Data Preprocessing
- Expanded Preprocessing Options: Introduced both inline and offline preprocessing modes with efficient full-data shuffling, along with multimodal inline preprocessing to support complex, mixed-media datasets.
- Improved Data Handling: Enhanced file-skipping logic and introduced a truncate option (instead of skipping entirely), ensuring users retain maximum usable data. Additionally, a list of skipped files can now be saved for easier review and troubleshooting.
- TokenFlow Enhancements: Introduced support for Masked Language Modeling (MLM), enhanced the handling of special characters, and refined the user interface to streamline text preprocessing workflows.
- Advanced Text Pretraining Features: Included semantic region support for text-only pretraining and integrated embedding training data (DPR) for more sophisticated training regimes.
- Data Preprocessing Performance Optimization: Data preprocessing now runs up to 95% faster thanks to optimized file handling and memory management, with processing-time reductions of 30-95% across common operations, data types, and dataset sizes. Users working with both text and multimodal data will see significantly shorter processing times.
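The skip-versus-truncate behavior described above can be sketched in plain Python. The function and parameter names here are illustrative, not the Model Zoo API:

```python
# Hypothetical sketch of the skip-vs-truncate idea from the release notes:
# oversized documents can be truncated to the max length instead of dropped,
# and skipped files are recorded for later review. Names are illustrative.
import json

MAX_TOKENS = 8  # stand-in for the configured max sequence length


def preprocess(docs, mode="truncate", skipped_log="skipped.jsonl"):
    """docs is a list of (filename, token_list) pairs."""
    kept, skipped = [], []
    for name, tokens in docs:
        if len(tokens) <= MAX_TOKENS:
            kept.append((name, tokens))
        elif mode == "truncate":
            kept.append((name, tokens[:MAX_TOKENS]))  # retain the usable prefix
        else:  # mode == "skip"
            skipped.append(name)
    # Persist the skipped-file list for troubleshooting.
    with open(skipped_log, "w") as f:
        for name in skipped:
            f.write(json.dumps({"file": name}) + "\n")
    return kept, skipped


docs = [("a.txt", list(range(5))), ("b.txt", list(range(20)))]
kept, skipped = preprocess(docs, mode="truncate")
print(len(kept), len(skipped))  # 2 0 -- truncation keeps both documents
```

With `mode="skip"` the oversized document would be dropped and its name written to the skipped-file log instead.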
Enhanced Mixture of Experts Capabilities
- Configurable Expert Selection and Weighting: Users can now choose both the routing algorithm (“hash” or “learned”) and the nonlinearity used for expert selection (Softmax, Sinkhorn, or Sigmoid). The nonlinearity for weighting expert outputs (Softmax or Sigmoid) is also independently configurable, providing greater flexibility and control over the expert routing process.
- Improved Router Regularization: The router regularization mode can now be toggled between off and load balancing, offering clearer control over the distribution of routing choices.
- Null Expert Bias: Introduced a `null_expert_bias` parameter that represents the model’s uncertainty, or “none of the above” option, when routing. Including a null expert probability in the weighting calculation improves gradient flow back to the router and leads to better loss, especially when only the top expert (`top_k=1`) is selected. Users can continue to choose between normalizing expert weights into a probability distribution or simply using the raw router scores as attention-like weights; the added null expert probability integrates seamlessly with both approaches.
- Shared Experts: Introduced the ability to designate certain experts as “shared experts” that are always selected for every token, independent of the routing logic. These always-active experts help capture common knowledge across different contexts. This concept is inspired by DeepSeekMoE.
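The routing options above (a configurable selection nonlinearity, a null-expert term, and always-on shared experts) can be sketched for a single token in plain Python. This is a minimal illustration of the concepts, not the actual Model Zoo implementation, and the parameter names are assumptions mirroring the prose:

```python
# Minimal single-token sketch of MoE routing with a configurable selection
# nonlinearity, a null-expert bias, and shared experts. Illustrative only;
# the real implementation and config keys may differ.
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def route(router_logits, top_k=1, selection="softmax",
          null_expert_bias=0.0, shared_experts=()):
    """Return (expert_index, weight) pairs for one token."""
    if selection == "softmax":
        # Append the null-expert bias as an extra logit: the probability mass
        # assigned to "none of the above" shrinks the real experts' weights,
        # keeping gradients flowing to the router even when top_k=1.
        probs = softmax(list(router_logits) + [null_expert_bias])[:-1]
    else:  # selection == "sigmoid": each expert scored independently
        probs = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    selected = [(i, probs[i]) for i in top]
    # Shared experts are always activated, independent of the router.
    selected += [(i, 1.0) for i in shared_experts if i not in top]
    return selected


picks = route([2.0, 0.5, -1.0], top_k=1, null_expert_bias=1.0, shared_experts=(2,))
print(picks)  # top-scoring expert 0, plus always-on shared expert 2
```

Note that with the null-expert term the selected expert's weight stays below 1 even at `top_k=1`, which is what restores a useful gradient signal to the router.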
And More!
- Expanded Cluster Management: Enhanced support for large-scale deployments enables clusters with hundreds of nodes, while new upgrade capabilities minimize downtime during maintenance, letting organizations maximize both the scale and availability of their compute resources.
- CS-3+ Performance Upgrade: Enhanced power capabilities on the WSE-3 chip deliver a 1.9x performance improvement over CS-2 systems with linear scaling. This boost benefits all supported models—ranging from 2.7B to 180B parameters—across diverse architectures, vocabularies, context lengths (2k to 128k), and specialized variants like MoE and vLLMs.
- Trillion-Parameter Model Support: Organizations can now train dense language models at the trillion-parameter scale, reliably running hundreds of iterations on larger clusters and dozens on smaller ones. This milestone maintains training stability and checkpoint functionality, unlocking research and development at unprecedented model sizes. For organizations aiming to train models at the trillion-parameter scale, contact your account representative to discuss the necessary requirements.
- New Docs Platform: With the release of 2.4.0, we have migrated to a new documentation platform with an improved user interface and an AI search feature. Users can now ask questions directly within the search bar and receive LLM-generated, documentation-grounded answers, delivering an intuitive and more interactive experience.