T5 is a text-to-text transformer model trained on the C4 dataset using a denoising objective, capable of performing a wide range of NLP tasks in a unified text-to-text format.
The code lives in the `t5` directory and reuses generic components for interfacing with training scripts and configuration systems.
- `configs/`: YAML configuration files specifying training and model hyperparameters.
- `model.py`: Wrapper for initializing and interfacing with the T5 model.
- `t5_model.py`: Main model implementation, including the encoder-decoder structure and forward logic.
- `utils.py`: Utility functions for config parsing and data handling.

| Configuration | Description |
| --- | --- |
| `t5_small.yaml` | T5-Small: `d_kv=64`, `num_heads=8`, `encoder_num_hidden_layers=6`. |
| `t5_base.yaml` | T5-Base: `d_kv=64`, `num_heads=12`, `encoder_num_hidden_layers=12`. |
| `t5_3B.yaml` | T5-3B: `d_kv=128`, `num_heads=32`, `encoder_num_hidden_layers=24`. |
| `t5_11B.yaml` | T5-11B: `d_kv=128`, `num_heads=128`, `encoder_num_hidden_layers=24`. |
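As an illustration, a config in this family might set these hyperparameters as shown below. Only `d_kv`, `num_heads`, and `encoder_num_hidden_layers` (with T5-Base values) are taken from the table above; the surrounding YAML layout and any other field names are assumptions, not the repo's actual schema.

```yaml
# Hypothetical sketch of a T5-Base style config.
# Only d_kv, num_heads, and encoder_num_hidden_layers come from the
# table above; the overall layout is an illustrative assumption.
model:
  d_kv: 64
  num_heads: 12
  encoder_num_hidden_layers: 12
```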
This implementation uses `LayerNorm` instead of the originally proposed `RMSNorm` due to hardware support constraints.
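For reference, the difference between the two normalizations can be sketched in plain Python (learned scale and bias parameters omitted; this is an illustrative sketch, not the repo's actual implementation):

```python
import math

def layer_norm(x, eps=1e-6):
    # Standard LayerNorm: subtract the mean, then divide by the standard
    # deviation, so the output is re-centered around zero.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    # RMSNorm (used in the original T5): no mean subtraction; each element
    # is divided by the root mean square of the vector.
    ms = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(ms + eps) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
print("LayerNorm:", layer_norm(x))  # zero-mean output
print("RMSNorm:  ", rms_norm(x))    # scaled only, mean is not removed
```

The practical difference: LayerNorm re-centers activations while RMSNorm only rescales them, which makes RMSNorm slightly cheaper but changes the numerics enough that the two are not interchangeable without retraining or care.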