Model Description

StarCoder is a family of decoder-only transformer models developed by the BigCode initiative and optimized for code generation. The flagship model, StarCoder (15.5B parameters), was trained on roughly 1 trillion tokens of source code spanning over 80 programming languages, with additional fine-tuning on Python for strong single-language performance.

Architecturally, StarCoder builds on the transformer decoder backbone with several enhancements: multi-query attention (MQA) for faster inference, fill-in-the-middle (FIM) generation, and an extended context length of 8K tokens. Variants of StarCoder have been adapted for specific use cases, such as SQL generation (SQLCoder), OctoPack-style instruction tuning (OctoCoder), and WizardCoder-style instruction following.
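
To make the MQA piece concrete, here is a minimal multi-query attention layer in PyTorch. This is an illustrative sketch, not the ModelZoo implementation; the class and parameter names are chosen for this example only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Minimal multi-query attention: all query heads share a single K/V head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # One shared key/value head instead of one per query head.
        self.k_proj = nn.Linear(d_model, self.head_dim)
        self.v_proj = nn.Linear(d_model, self.head_dim)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_mask=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)  # (b, h, t, d)
        k = self.k_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)             # (b, 1, t, d)
        v = self.v_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)             # (b, 1, t, d)
        # Broadcasting over the head dimension shares K/V across all query heads.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5                     # (b, h, t, t)
        if attn_mask is not None:
            scores = scores.masked_fill(attn_mask, float("-inf"))
        out = (F.softmax(scores, dim=-1) @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```

Because only a single key/value head is cached during decoding, the KV cache shrinks by a factor of the number of query heads, which is the main inference-speed benefit of MQA.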

These models are well-suited for tasks such as code completion, documentation generation, and interactive programming support.
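
As an illustration of the fill-in-the-middle capability mentioned above, the sketch below assembles an infilling prompt from StarCoder's FIM sentinel tokens. The surrounding function is a made-up example, and how the prompt is submitted for generation depends on your serving or training setup.

```python
# StarCoder's FIM sentinel tokens; the model fills in the span
# between the given prefix and suffix.
prefix = "def average(xs):\n    "
suffix = "\n    return total / len(xs)\n"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# A FIM-capable StarCoder checkpoint is expected to continue this prompt
# with the missing middle, e.g. "total = sum(xs)".
print(fim_prompt)
```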

Code Structure

The code for this model is located in the /starcoder directory within ModelZoo. Here’s how it’s organized:

  • /configs: Contains YAML configuration files.
  • model.py: The implementation of the StarCoder model.

Our implementation of StarCoder is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.

Available Configurations

  • params_starcoder_15b.yaml: Base StarCoder model with 15.5B parameters.
  • params_octocoder_15b.yaml: StarCoder variant fine-tuned with OctoPack-style data.
  • params_sqlcoder_15b.yaml: StarCoder variant specialized for SQL generation.
  • params_wizardcoder_15b.yaml: StarCoder variant tuned for instruction-style prompts.
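
As a minimal sketch, any of these files can be inspected from Python before launching a run (assuming PyYAML is installed and the path matches your ModelZoo checkout; no particular schema is assumed):

```python
import yaml

# Hypothetical path; adjust to wherever ModelZoo is checked out.
config_path = "starcoder/configs/params_starcoder_15b.yaml"

with open(config_path) as f:
    params = yaml.safe_load(f)

# List the top-level sections without assuming a particular schema.
for section, value in params.items():
    print(section, type(value).__name__)
```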

Workflow

For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.

For a complete list of Cerebras ModelZoo CLI commands, see the command reference.

References