Model Description

Gemma 2 is a family of decoder-only transformer models developed by Google DeepMind, ranging from 2B to 27B parameters. Architecturally, Gemma 2 builds upon the Transformer backbone with several enhancements: it interleaves local sliding-window and global attention layers, adopts grouped-query attention (GQA), and uses GeGLU activations with RMSNorm. The models support an 8K-token context length and use a 256K-token multilingual tokenizer inherited from Gemini.
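To illustrate the grouped-query attention mentioned above, here is a minimal NumPy sketch. This is not the ModelZoo implementation; the function name, shapes, and head counts are illustrative. The key idea is that several query heads share one key/value head, shrinking the KV cache.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Toy GQA: q has shape (n_q_heads, seq, d); k and v have
    shape (n_kv_heads, seq, d), with n_q_heads % n_kv_heads == 0."""
    n_q_heads = q.shape[0]
    group = n_q_heads // n_kv_heads
    # Each group of query heads attends using one shared KV head.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With 8 query heads and 2 KV heads, each block of 4 query heads reuses the same keys and values, so the KV cache is 4x smaller than in standard multi-head attention.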

Gemma 2 models are well-suited for tasks involving instruction following, long-context understanding, multilingual reasoning, and coding.

Code Structure

The code for this model is located in the /gemma2 directory within ModelZoo. Here’s how it’s organized:

  • /configs: Contains YAML configuration files.
  • model.py: The implementation of the Gemma 2 model.

Our implementation of Gemma 2 is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.

Available Configurations

Configuration | Description
--- | ---
params_gemma2_9b_msl8k.yaml | 9B parameter Gemma 2 model with 8K MSL.
params_gemma2_9b_msl8k_swa_4k_sink_512.yaml | Variant of the 9B model using 4K sliding window attention and 512 sink tokens.
params_gemma2_27b_msl8k.yaml | 27B parameter Gemma 2 model with 8K MSL.
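For the sliding-window-with-sinks variant, the attention pattern can be sketched as a boolean mask: each token attends causally to the most recent `window` positions plus the first `sink` positions. This is a small illustrative sketch of that masking scheme, not the ModelZoo implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def swa_sink_mask(seq_len, window, sink):
    """Boolean attention mask: position i may attend to position j
    if j is causal (j <= i) and either within the sliding window
    (i - j < window) or one of the first `sink` positions."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    in_window = (i - j) < window
    is_sink = j < sink
    return causal & (in_window | is_sink)
```

For the 9B variant above, the analogous parameters would be a 4096-token window and 512 sink tokens over an 8K sequence.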

Workflow

For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.

For a complete list of Cerebras ModelZoo CLI commands, see the command reference.

References