Decoder-only transformer models from Mistral AI that use sliding-window attention and grouped-query attention for fast, high-quality language generation.
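Sliding-window attention restricts each token to attending over a fixed-size local window of preceding tokens rather than the full causal prefix (Mistral 7B uses a 4096-token window). A minimal sketch of the boolean mask this implies, in plain Python; the function name and list-of-lists representation are illustrative, not ModelZoo's implementation:

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    # mask[i][j] is True where query position i may attend to key
    # position j: causal (j <= i) and local (i - j < window).
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]

# With seq_len=5 and window=3, position 4 attends only to
# positions 2, 3, and 4.
```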
The code for these models lives in the `/mistral` directory within ModelZoo. Here's how it's organized:
- `gpt2_model.py`: the model implementation, shared with GPT-2.
| Configuration | Description |
|---|---|
| params_mistral_7B.yaml | 7B parameter Mistral model. |
| params_mistral_7B_msl128k.yaml | 7B parameter Mistral model with a 128K maximum sequence length (MSL). |
| params_mistral_12b.yaml | 12B parameter Mistral model. |
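Grouped-query attention, mentioned above, shares each key/value head across a group of query heads to shrink the KV cache. A small sketch of the head-to-group mapping; the function name is illustrative, and the 32 query / 8 KV head counts are those published for Mistral 7B:

```python
def gqa_groups(num_query_heads: int, num_kv_heads: int) -> list[int]:
    # Map each query head to the KV head it shares. Mistral 7B uses
    # 32 query heads and 8 KV heads, so each KV head serves 4 queries.
    assert num_query_heads % num_kv_heads == 0
    group_size = num_query_heads // num_kv_heads
    return [q // group_size for q in range(num_query_heads)]
```

With `num_kv_heads` equal to `num_query_heads` this degenerates to standard multi-head attention, and with `num_kv_heads=1` to multi-query attention.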