Model Description
The LLaMA family is a series of decoder-only transformer models designed for efficient, high-performance language modeling. Architecturally similar to GPT-2, the original LLaMA model replaces LayerNorm with RMSNorm and uses SwiGLU activations and rotary positional embeddings. LLaMA-2 improves on this with a larger training corpus, doubled context length, and grouped-query attention in its largest model. Code LLaMA specializes in programming tasks through continued pretraining on code-heavy data. LLaMA-3 introduces a more efficient tokenizer with a 128K-token vocabulary, expands the context length to 8K tokens, and adopts grouped-query attention across all model sizes. These models excel at text generation, summarization, reasoning, coding, and instruction following.
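To make the architectural differences from GPT-2 concrete, here is a minimal, generic PyTorch sketch of RMSNorm, a SwiGLU feed-forward block, and grouped-query attention. This is an illustration rather than the ModelZoo implementation; the names (`RMSNorm`, `SwiGLUFeedForward`, `grouped_query_attention`) and dimensions are chosen for this example, and rotary positional embeddings are omitted for brevity.

```python
# Generic sketches of three LLaMA building blocks mentioned above
# (illustrative only, not the ModelZoo implementation):
#   * RMSNorm: normalize by the root-mean-square of activations, no mean/bias.
#   * SwiGLU feed-forward: gate one projection with the SiLU of another.
#   * Grouped-query attention: several query heads share each key/value head.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned scale, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x / sqrt(mean(x^2) + eps), scaled by the learned weight
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight


class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(W_gate x) * (W_up x), projected back to the model dim
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


def grouped_query_attention(q, k, v, num_q_heads: int, num_kv_heads: int):
    # q: (batch, num_q_heads, seq, head_dim); k, v: (batch, num_kv_heads, seq, head_dim)
    # Each key/value head is shared by num_q_heads // num_kv_heads query heads.
    k = k.repeat_interleave(num_q_heads // num_kv_heads, dim=1)
    v = v.repeat_interleave(num_q_heads // num_kv_heads, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)                        # (batch, seq, model dim)
    h = SwiGLUFeedForward(512, 1376)(RMSNorm(512)(x))  # pre-norm feed-forward path
    q = torch.randn(2, 8, 16, 64)                      # 8 query heads
    kv = torch.randn(2, 2, 16, 64)                     # 2 shared key/value heads
    a = grouped_query_attention(q, kv, kv, num_q_heads=8, num_kv_heads=2)
    print(h.shape, a.shape)  # torch.Size([2, 16, 512]) torch.Size([2, 8, 16, 64])
```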
Code Structure
The code for this model is located in the /llama directory within ModelZoo. Here’s how it’s organized:
Our implementation of LLaMA is built on top of our GPT-2 implementation. For more details, see gpt2_model.py.
Available Configurations
LLaMa 3
Configuration | Description |
---|---|
params_llama3p1_70b_msl_128k.yaml | A 70B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.1 blog. |
params_llama3p1_70b_msl_8k.yaml | A 70B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.1 blog. |
params_llama3p1_8b_msl_128k.yaml | An 8B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.1 blog. |
params_llama3p1_8b_msl_32k_swa_8k_sink_512.yaml | An 8B parameter model with a maximum sequence length of 32K, sliding window attention (SWA) starting at 8K, and sink tokens set to 512 (see the sketch after this table); configured as described in the LLaMa 3.1 blog. |
params_llama3p1_8b_msl_8k.yaml | An 8B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.1 blog. |
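To clarify what "SWA starting at 8K and sink tokens set to 512" means in the configuration above, the toy sketch below builds that kind of attention-mask pattern: each query attends causally within a sliding window, plus to a fixed block of initial "sink" tokens that remain visible to every position. This is a generic illustration with scaled-down sizes, not the ModelZoo implementation.

```python
# Toy illustration of a causal mask with sliding window attention (SWA) plus
# "sink" tokens that every query may always attend to. Sizes are tiny here;
# the config above uses a 32K sequence, an 8K window, and 512 sink tokens.
import torch


def swa_sink_mask(seq_len: int, window: int, num_sink: int) -> torch.Tensor:
    """Boolean mask where mask[q, k] is True if query q may attend to key k."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions (column vector)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions (row vector)
    causal = k <= q                          # no attention to future tokens
    in_window = (q - k) < window             # key lies within the sliding window
    is_sink = k < num_sink                   # first `num_sink` tokens stay visible
    return causal & (in_window | is_sink)


mask = swa_sink_mask(seq_len=16, window=4, num_sink=2)
print(mask.int())  # rows: queries, columns: keys
```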
LLaMa-2
Configuration | Description |
---|---|
params_llama2_7b.yaml | A 7B parameter model configured as described in the LLaMa-2 paper. |
params_llama2_13b.yaml | A 13B parameter model configured as described in the LLaMa-2 paper. |
params_llama2_70b.yaml | A 70B parameter model configured as described in the LLaMa-2 paper. |
Code LLaMa
Configuration | Description |
---|---|
params_code_llama_7b.yaml | A 7B parameter model configured as described in the Code LLaMa paper. |
params_code_llama_70b.yaml | A 70B parameter model configured as described in the Code LLaMa paper. |
WizardLM
Configuration | Description |
---|---|
params_wizardlm_13b.yaml | A 13B parameter model configured as described in the WizardLM paper. |
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning. For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Radford, Alec, et al. (2019). Language Models are Unsupervised Multitask Learners.
- Touvron, Hugo, et al. (2023). LLaMA: Open and Efficient Foundation Language Models.
- Touvron, Hugo, et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models.
- Rozière, Baptiste, et al. (2023). Code Llama: Open Foundation Models for Code.
- Meta AI (2024). Build the future of AI with Meta Llama 3.