ESM-2 is a protein language model trained on UniRef50, using a masked language modeling objective to learn evolutionary and structural properties of proteins.
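To make the masked language modeling objective concrete, here is a minimal sketch of how a protein sequence could be masked before training. It is illustrative only and is not the ModelZoo data processor: the function name `mask_sequence`, the `<mask>` token string, and the 15% masking rate are assumptions chosen for the example.

```python
import random

MASK_TOKEN = "<mask>"  # assumed token string, for illustration only


def mask_sequence(sequence, mask_prob=0.15, seed=None):
    """Mask a fraction of residues and return (masked_tokens, labels).

    labels maps each masked position to the original residue, which is
    what the model would be trained to predict at that position.
    """
    tokens = list(sequence)
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one residue so every sequence yields a training signal.
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    labels = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, labels


if __name__ == "__main__":
    masked, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", seed=0)
    print(masked)
    print(targets)
```

The loss would then be computed only at the masked positions, comparing the model's predicted residue with the entry in `labels`.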
The implementation lives in the `esm2` directory within the ModelZoo. It reuses shared training infrastructure and custom data processors optimized for protein sequence modeling.
- `configs/`: YAML configuration files for training various ESM-2 model sizes.
- `model.py`: Top-level wrapper for initializing ESM-2 model instances and integrating with training.
- `esm2_pretrain_models.py`: Core model architecture implementation.
- `utils.py`: Helper utilities for config parsing and data formatting.

| Configuration | Description |
|---|---|
| `params_esm2_t12_35M_UR50D.yaml` | ESM-2 model with 12 layers and ~35M parameters. |
| `params_esm2_t33_650M_UR50D.yaml` | ESM-2 model with 33 layers and ~650M parameters. |
| `params_esm2_t33_650M_UR50D_vsl.yaml` | ESM-2 650M model with Variable Sequence Length (VSL) enabled for efficient training. |
| `params_esm2_t36_3B_UR50D.yaml` | ESM-2 model with 36 layers and ~3B parameters. |
| `params_esm2_t48_15B_UR50D.yaml` | ESM-2 model with 48 layers and ~15B parameters. |
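
The configuration file names listed above follow a regular pattern: `t<layers>`, then the approximate parameter count, then the dataset tag, with an optional `_vsl` suffix. As a small sketch of how that convention could be read programmatically, the helper below parses a file name into its parts. `parse_config_name` is a hypothetical function written for this example, not part of the ModelZoo, and the pattern is inferred only from the five file names shown here.

```python
import re

# Pattern inferred from the listed file names; an assumption, not a ModelZoo API.
CONFIG_PATTERN = re.compile(
    r"^params_esm2_t(?P<layers>\d+)_(?P<size>\d+[MB])_UR50D(?P<vsl>_vsl)?\.yaml$"
)


def parse_config_name(filename):
    """Split an ESM-2 config file name into layer count, size tag, and VSL flag."""
    match = CONFIG_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized ESM-2 config name: {filename!r}")
    return {
        "layers": int(match["layers"]),
        "size": match["size"],
        "vsl": match["vsl"] is not None,
    }


if __name__ == "__main__":
    for name in (
        "params_esm2_t12_35M_UR50D.yaml",
        "params_esm2_t33_650M_UR50D_vsl.yaml",
    ):
        print(name, parse_config_name(name))
```

A helper like this can be handy for selecting a config by model size in a launch script, but the YAML contents remain the source of truth for the actual hyperparameters.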