MA ViT Family
| model | params (M) | pretrain | head | train | GFLOPs | mAP |
|---|---|---|---|---|---|---|
| MA ViT-T | 16.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 219.0 | 47.6 |
| MA ViT-T | 16.0 | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 201.0 | 45.6 |
| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 262.0 | 50.2 |
| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 262.0 | 51.4 |
| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 741.0 | 54.2 |
| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 244.0 | 48.3 |
| MA ViT-B | 50.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 372.0 | 51.7 |
| MA ViT-B | 50.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 372.0 | 53.2 |
| MA ViT-B | 50.0 | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 851.0 | 55.5 |
| MA ViT-B | 50.0 | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 353.0 | 49.9 |
| MA ViT-L | 98.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 501.0 | 52.5 |
| MA ViT-L | 98.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 501.0 | 53.6 |
| MA ViT-L | 98.0 | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 979.0 | 56.0 |
| MA ViT-L | 98.0 | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 482.0 | 50.6 |

| model | params (M) | pretrain | finetune | GFLOPs | IN-1k |
|---|---|---|---|---|---|
| MA ViT-T | 16.0 | IN-1k : Sup. : 300 | — : — : — | 2.5 | 82.9/— |
| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | — : — : — | 4.6 | 84.7/— |
| MA ViT-B | 50.0 | IN-1k : Sup. : 300 | — : — : — | 9.9 | 85.7/— |
| MA ViT-L | 98.0 | IN-1k : Sup. : 300 | — : — : — | 16.1 | 86.0/— |

COCO (val), box AP
| model | pretrain | head | train | GFLOPs | mAP^b | AP^b_50 | AP^b_75 | mAP^b_s | mAP^b_m | mAP^b_l |
|---|---|---|---|---|---|---|---|---|---|---|
| MA ViT-T | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 219.0 | 47.6 | 69.5 | 52.5 | — | — | — |
| MA ViT-T | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 201.0 | 45.6 | 66.7 | 48.9 | 28.9 | 49.7 | 61.1 |
| MA ViT-S | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 262.0 | 50.2 | 71.7 | 55.3 | — | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 262.0 | 51.4 | 72.6 | 56.2 | — | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 741.0 | 54.2 | 72.6 | 58.6 | — | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 244.0 | 48.3 | 69.4 | 52.2 | 31.8 | 52.6 | 64.0 |
| MA ViT-B | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 372.0 | 51.7 | 73.3 | 57.0 | — | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 372.0 | 53.2 | 74.1 | 58.5 | — | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 851.0 | 55.5 | 74.0 | 60.4 | — | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 353.0 | 49.9 | 71.1 | 53.8 | 33.7 | 54.5 | 65.5 |
| MA ViT-L | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 501.0 | 52.5 | 73.6 | 57.8 | — | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 501.0 | 53.6 | 74.3 | 58.7 | — | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 979.0 | 56.0 | 74.6 | 60.9 | — | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | RetinaNet | COCO (train) : 12 | 482.0 | 50.6 | 71.7 | 54.9 | 34.1 | 55.3 | 65.6 |

COCO (val), mask AP
| model | pretrain | head | train | GFLOPs | mAP^m | AP^m_50 | AP^m_75 | mAP^m_s | mAP^m_m | mAP^m_l |
|---|---|---|---|---|---|---|---|---|---|---|
| MA ViT-T | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 219.0 | 42.9 | 66.5 | 46.4 | — | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 262.0 | 44.7 | 68.7 | 47.9 | — | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 262.0 | 45.5 | 69.8 | 49.2 | — | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 741.0 | 47.0 | 70.5 | 51.1 | — | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 372.0 | 46.1 | 70.6 | 50.1 | — | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 372.0 | 47.0 | 71.5 | 51.1 | — | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 851.0 | 48.0 | 71.7 | 52.5 | — | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 501.0 | 46.5 | 71.0 | 50.6 | — | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 501.0 | 47.2 | 71.5 | 51.4 | — | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 979.0 | 48.4 | 72.4 | 52.9 | — | — | — |

ADE20K (val)
| model | pretrain | head | train | GFLOPs | mIoU^ms | pAcc^ms | mAcc^ms | mIoU^ss | pAcc^ss | mAcc^ss |
|---|---|---|---|---|---|---|---|---|---|---|
| MA ViT-T | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 160 : 512 | 893.0 | — | — | — | 48.4 | — | — |
| MA ViT-T | IN-1k : Sup. : 300 | Panoptic FPN | ADE20K (train) : 80 : 512 | 136.0 | — | — | — | 47.6 | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 160 : 512 | 937.0 | — | — | — | 51.0 | — | — |
| MA ViT-S | IN-1k : Sup. : 300 | Panoptic FPN | ADE20K (train) : 80 : 512 | 180.0 | — | — | — | 50.7 | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 160 : 512 | 1050.0 | — | — | — | 52.8 | — | — |
| MA ViT-B | IN-1k : Sup. : 300 | Panoptic FPN | ADE20K (train) : 80 : 512 | 292.0 | — | — | — | 51.5 | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 160 : 512 | 1182.0 | — | — | — | 53.6 | — | — |
| MA ViT-L | IN-1k : Sup. : 300 | Panoptic FPN | ADE20K (train) : 80 : 512 | 424.0 | — | — | — | 52.8 | — | — |
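
The `pretrain`, `finetune`, and `train` cells in the tables above pack several values into one colon-separated string (for example `IN-1k : Sup. : 300` or `COCO (train) : 12`), with `—` marking a missing value. A minimal Python sketch for splitting such rows and cells is shown below. The helper names `parse_row` and `parse_cell` are hypothetical, and the code deliberately leaves each sub-field as an uninterpreted string, since the tables do not state what each position means.

```python
from typing import List, Optional, Tuple

PLACEHOLDER = "—"  # the tables use this character for "not reported"


def parse_row(line: str) -> List[str]:
    """Split one markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_cell(cell: str) -> Optional[Tuple[str, ...]]:
    """Split an 'A : B : C' cell into its parts.

    Returns None when every part is the placeholder, as in the
    'finetune' column of the classification table.
    """
    parts = tuple(p.strip() for p in cell.split(" : "))
    if all(p == PLACEHOLDER for p in parts):
        return None
    return parts


# Two rows copied from the first table (same model and head, different 'train' cell).
row_12 = parse_row("| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 262.0 | 50.2 |")
row_36 = parse_row("| MA ViT-S | 27.0 | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 262.0 | 51.4 |")

# Difference in the last column (mAP) between the two rows.
gain = round(float(row_36[-1]) - float(row_12[-1]), 1)
print(parse_cell(row_12[2]), parse_cell(row_12[4]), gain)
```

Keeping the parts as plain strings avoids committing to an interpretation of each position; a caller who knows the convention can convert them to numbers as needed.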