Implementation of Vision Transformers (ViT) for image classification on ImageNet-1K.
This model lives in the `vit` directory within ModelZoo. Here's how it's organized:
- `configs/`: Contains YAML configuration files for different ViT variants.
- `model.py`: Entry point that initializes and builds the model components used for training and evaluation.
- `ViTModel.py`: Core implementation of the ViT architecture, including patch embedding, transformer encoder blocks, and classification head.
- `ViTClassificationModel.py`: Wraps `ViTModel` for classification tasks, managing preprocessing, logits generation, and loss computation.

| Configuration | Description |
|---|---|
| `params_vit_base_patch_16_imagenet_1k.yaml` | ViT-Base model with 16×16 patch size trained on ImageNet-1K. |
| `params_vit_huge_patch_16_imagenet_1k.yaml` | ViT-Huge model with 16×16 patch size trained on ImageNet-1K. |
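As a rough sanity check on the two configurations, the sketch below counts the input tokens and parameters of a standard ViT-Base/16 and ViT-Huge/16 classifier built from the pieces listed above (patch embedding, encoder blocks, classification head). It assumes the layer sizes from the original ViT paper (Base: 12 layers, width 768, MLP 3072; Huge: 32 layers, width 1280, MLP 5120), 224×224 RGB inputs, a learned [CLS] token with learned position embeddings, and a 1000-way linear head. None of these values are read from the YAML files, so compare them with the actual configs before relying on the numbers. `ViTDims`, `num_tokens`, and `count_params` are hypothetical helpers, not part of ModelZoo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTDims:
    """Encoder sizes. Values below are assumed from the ViT paper, not read from the YAML configs."""
    hidden: int  # embedding width D
    mlp: int     # inner width of each block's MLP
    layers: int  # number of transformer encoder blocks
    heads: int   # attention heads (splits D but does not change the parameter count)


# Assumed standard sizes; check configs/params_vit_*_imagenet_1k.yaml for the real values.
VIT_BASE = ViTDims(hidden=768, mlp=3072, layers=12, heads=12)
VIT_HUGE = ViTDims(hidden=1280, mlp=5120, layers=32, heads=16)


def num_tokens(image_size: int = 224, patch: int = 16) -> int:
    """Number of patch tokens plus one [CLS] token fed to the encoder."""
    assert image_size % patch == 0, "image must tile evenly into patches"
    return (image_size // patch) ** 2 + 1


def count_params(d: ViTDims, image_size: int = 224, patch: int = 16,
                 channels: int = 3, num_classes: int = 1000) -> int:
    """Parameter count of a pre-norm ViT classifier under the assumptions above."""
    D = d.hidden
    patch_embed = patch * patch * channels * D + D       # linear projection of flattened patches (+bias)
    cls_and_pos = D + num_tokens(image_size, patch) * D  # [CLS] token + learned position embeddings
    per_block = (
        2 * D                    # LayerNorm before attention (scale + shift)
        + 3 * (D * D + D)        # Q, K, V projections
        + (D * D + D)            # attention output projection
        + 2 * D                  # LayerNorm before the MLP
        + (D * d.mlp + d.mlp)    # MLP expansion
        + (d.mlp * D + D)        # MLP contraction
    )
    final_norm = 2 * D
    head = D * num_classes + num_classes  # linear classification head on the [CLS] output
    return patch_embed + cls_and_pos + d.layers * per_block + final_norm + head


if __name__ == "__main__":
    print("tokens per image:", num_tokens())
    for name, dims in [("ViT-Base/16", VIT_BASE), ("ViT-Huge/16", VIT_HUGE)]:
        print(f"{name}: {count_params(dims) / 1e6:.1f}M parameters")
```

Under these assumptions it prints 197 tokens per image, about 86.6M parameters for ViT-Base/16 and about 632.2M for ViT-Huge/16, in line with the roughly 86M and 632M reported in the ViT paper. Almost all of the count comes from the encoder blocks; for Base, the patch embedding and head together add under 2M parameters.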
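The logits-and-loss step that `ViTClassificationModel.py` is described as handling can be pictured with the dependency-free sketch below. It assumes the common ViT recipe of classifying from the [CLS] token (position 0 of the encoder output) with a single linear layer followed by softmax cross-entropy; the actual wrapper may pool, preprocess, or reduce the loss differently. All function names here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cls_pool(tokens: Sequence[Vector]) -> Vector:
    """Use the [CLS] token (position 0) as the image representation (assumed pooling)."""
    return list(tokens[0])


def linear_head(x: Vector, weight: Sequence[Vector], bias: Vector) -> Vector:
    """logits[k] = sum_i x[i] * weight[k][i] + bias[k]; weight has shape (num_classes, hidden)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weight, bias)]


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy for one example, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def batch_loss(batch_tokens: Sequence[Sequence[Vector]], labels: Sequence[int],
               weight: Sequence[Vector], bias: Vector) -> float:
    """Mean cross-entropy over a batch of encoder outputs."""
    losses = [cross_entropy(linear_head(cls_pool(t), weight, bias), y)
              for t, y in zip(batch_tokens, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy example: hidden size 2, three classes, two tokens per image ([CLS] + one patch).
    weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    bias = [0.0, 0.0, 0.0]
    batch = [[[2.0, 0.0], [5.0, 5.0]]]  # [CLS] = [2, 0]; the patch token is ignored by the head
    print(batch_loss(batch, [0], weight, bias))
```

In the toy example the [CLS] vector `[2, 0]` yields logits `[2, 0, 0]`, so the loss is log(e² + 2) - 2, roughly 0.240.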