Gemma 2
Decoder-only language models by Google DeepMind that interleave local and global attention and use grouped-query attention for high-quality performance at practical scale.
Model Description
Gemma 2 is a family of decoder-only transformer models developed by Google DeepMind, ranging from 2B to 27B parameters. Architecturally, Gemma 2 builds on the Transformer backbone with several enhancements: it interleaves local sliding-window and global attention layers, adopts grouped-query attention (GQA), and uses GeGLU activations with RMSNorm. The models support a context length of 8K tokens and use a multilingual tokenizer with a 256K-entry vocabulary inherited from Gemini.
Gemma 2 models are well-suited for tasks involving instruction following, long-context understanding, multilingual reasoning, and coding.
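To make the attention design concrete, here is a minimal, self-contained sketch of grouped-query attention and of an alternating local/global layer schedule. It is illustrative only; the function names, tensor shapes, and even/odd schedule are assumptions, not the ModelZoo implementation.

```python
# Illustrative sketch of grouped-query attention; hypothetical names/shapes, not the ModelZoo code.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_query_groups):
    """q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim).

    Every group of `num_query_groups` query heads shares one key/value head,
    shrinking the KV cache relative to full multi-head attention.
    """
    k = k.repeat_interleave(num_query_groups, dim=1)  # expand KV heads to match query heads
    v = v.repeat_interleave(num_query_groups, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Toy shapes: 8 query heads sharing 4 KV heads (2 query heads per KV head).
q = torch.randn(1, 8, 16, 32)
k = torch.randn(1, 4, 16, 32)
v = torch.randn(1, 4, 16, 32)
out = grouped_query_attention(q, k, v, num_query_groups=2)

# Gemma 2 interleaves local sliding-window and global attention layers; a simple
# alternating schedule conveys the idea (the exact pattern here is an assumption).
layer_types = ["local_sliding" if i % 2 == 0 else "global" for i in range(8)]
```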
Code Structure
The code for this model is located in the /gemma2 directory within ModelZoo. Here’s how it’s organized:

Our implementation of Gemma 2 is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.
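Relative to the GPT-2 backbone, the main architectural deltas are the GeGLU feed-forward and RMSNorm mentioned above. Below is a minimal sketch of both, with hypothetical module names; it is not the ModelZoo code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias (unlike GPT-2's LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GeGLU(nn.Module):
    """Gated-GELU feed-forward: GELU(x W_gate) * (x W_up), projected back by W_down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.gelu(self.gate(x)) * self.up(x))
```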
Available Configurations
Configuration | Description |
---|---|
params_gemma2_9b_msl8k.yaml | 9B-parameter Gemma 2 model with an 8K maximum sequence length (MSL). |
params_gemma2_9b_msl8k_swa_4k_sink_512.yaml | Variant of the 9B model using 4K sliding-window attention and 512 sink tokens (see the sketch after this table). |
params_gemma2_27b_msl8k.yaml | 27B-parameter Gemma 2 model with an 8K MSL. |
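The sliding-window-with-sinks variant restricts each query to a recent window of keys while keeping a small prefix of "sink" positions visible to every query. The sketch below shows one way such a mask could be built; the boolean attend/ignore convention is an assumption, the window and sink sizes are taken from the config name, and this is not the ModelZoo implementation.

```python
import torch

def swa_with_sinks_mask(seq_len: int, window: int, num_sinks: int) -> torch.Tensor:
    """Boolean mask (True = attend) for causal sliding-window attention with sink tokens.

    Each query attends to the previous `window` positions (causally) plus the first
    `num_sinks` positions, which remain visible regardless of distance.
    """
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = k <= q                          # no attention to future positions
    in_window = (q - k) < window             # within the local sliding window
    is_sink = k < num_sinks                  # always-visible sink prefix
    return causal & (in_window | is_sink)

# Small demo; the listed config corresponds to window=4096, num_sinks=512 over an 8K MSL.
mask = swa_with_sinks_mask(seq_len=16, window=4, num_sinks=2)
```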
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Gemma Team. (2024). Gemma 2: Improving Open Language Models at a Practical Size
- Ainslie, J. et al. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- Hinton, G. et al. (2015). Distilling the Knowledge in a Neural Network