Gemma 2
Decoder-only language models by Google DeepMind that interleave local and global attention and use grouped-query attention for high-quality performance at practical scale.
Model Description
Gemma 2 is a family of decoder-only transformer models developed by Google DeepMind, ranging from 2B to 27B parameters. Architecturally, Gemma 2 builds on the Transformer backbone with several enhancements: it interleaves local sliding-window and global attention layers, adopts grouped-query attention (GQA), and uses GeGLU activations with RMSNorm. The models support a context length of 8K tokens and use a multilingual tokenizer with a 256K-entry vocabulary inherited from Gemini.
Gemma 2 models are well-suited for tasks involving instruction following, long-context understanding, multilingual reasoning, and coding.
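To make the attention design concrete, here is a minimal, self-contained sketch of grouped-query attention and of an alternating local/global layer schedule. It is illustrative only; the function names, tensor shapes, and even/odd schedule are assumptions, not the ModelZoo implementation.

```python
# Illustrative sketch of grouped-query attention; hypothetical names/shapes, not the ModelZoo code.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_query_groups):
    """q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim).

    Every group of `num_query_groups` query heads shares one key/value head,
    shrinking the KV cache relative to full multi-head attention.
    """
    k = k.repeat_interleave(num_query_groups, dim=1)  # expand KV heads to match query heads
    v = v.repeat_interleave(num_query_groups, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Toy shapes: 8 query heads sharing 4 KV heads (2 query heads per KV head).
q = torch.randn(1, 8, 16, 32)
k = torch.randn(1, 4, 16, 32)
v = torch.randn(1, 4, 16, 32)
out = grouped_query_attention(q, k, v, num_query_groups=2)

# Gemma 2 interleaves local sliding-window and global attention layers; a simple
# alternating schedule conveys the idea (the exact pattern here is an assumption).
layer_types = ["local_sliding" if i % 2 == 0 else "global" for i in range(8)]
```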
Code Structure
The code for this model is located in the /gemma2 directory within ModelZoo. Here’s how it’s organized:

Our implementation of Gemma 2 is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.
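Relative to the GPT-2 backbone, the main architectural deltas are the GeGLU feed-forward and RMSNorm mentioned above. Below is a minimal sketch of both, with hypothetical module names; it is not the ModelZoo code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias (unlike GPT-2's LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GeGLU(nn.Module):
    """Gated-GELU feed-forward: GELU(x W_gate) * (x W_up), projected back by W_down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.gelu(self.gate(x)) * self.up(x))
```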
Available Configurations
Configuration | Description |
---|---|
params_gemma2_9b_msl8k.yaml | 9B-parameter Gemma 2 model with an 8K maximum sequence length (MSL). |
params_gemma2_9b_msl8k_swa_4k_sink_512.yaml | Variant of the 9B model using 4K sliding-window attention and 512 sink tokens (see the sketch after this table). |
params_gemma2_27b_msl8k.yaml | 27B-parameter Gemma 2 model with an 8K MSL. |
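The sliding-window-with-sinks variant restricts each query to a recent window of keys while keeping a small prefix of "sink" positions visible to every query. The sketch below shows one way such a mask could be built; the boolean attend/ignore convention is an assumption, the window and sink sizes are taken from the config name, and this is not the ModelZoo implementation.

```python
import torch

def swa_with_sinks_mask(seq_len: int, window: int, num_sinks: int) -> torch.Tensor:
    """Boolean mask (True = attend) for causal sliding-window attention with sink tokens.

    Each query attends to the previous `window` positions (causally) plus the first
    `num_sinks` positions, which remain visible regardless of distance.
    """
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = k <= q                          # no attention to future positions
    in_window = (q - k) < window             # within the local sliding window
    is_sink = k < num_sinks                  # always-visible sink prefix
    return causal & (in_window | is_sink)

# Small demo; the listed config corresponds to window=4096, num_sinks=512 over an 8K MSL.
mask = swa_with_sinks_mask(seq_len=16, window=4, num_sinks=2)
```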
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Gemma Team. (2024). Gemma 2: Improving Open Language Models at a Practical Size
- Ainslie, J. et al. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- Hinton, G. et al. (2015). Distilling the Knowledge in a Neural Network