FocalNet Family
model | params (M) | pretrain | finetune | GFLOPs | IN-1k top-1 |
---|---|---|---|---|---|
FocalNet-T-LRF | 28.6 | IN-1k : Sup. : 300 | — : — : — | 4.5 | 82.3 |
FocalNet-T-SRF | 28.4 | IN-1k : Sup. : 300 | — : — : — | 4.5 | 82.1 |
FocalNet-S-SRF | 50.3 | IN-1k : Sup. : 300 | — : — : — | 8.7 | 83.4 |
FocalNet-S-LRF | 50.3 | IN-1k : Sup. : 300 | — : — : — | 8.7 | 83.5 |
FocalNet-B-LRF | 88.7 | IN-1k : Sup. : 300 | — : — : — | 15.4 | 83.9 |
FocalNet-B-SRF | 88.1 | IN-1k : Sup. : 300 | — : — : — | 15.3 | 83.7 |
FocalNet-B-SRF | 88.1 | IN-22k : Sup. : 90 | IN-1k : 30 : 224 | 15.3 | 85.6 |
FocalNet-B-SRF | 88.1 | IN-22k : Sup. : 90 | IN-1k : 30 : 384 | 44.8 | 86.5 |
FocalNet-L-SRF | 197.1 | IN-22k : Sup. : 90 | IN-1k : 30 : 224 | 34.2 | 86.5 |
FocalNet-L-SRF | 197.1 | IN-22k : Sup. : 90 | IN-1k : 30 : 384 | 100.6 | 87.3 |
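
The jump in GFLOPs between the 224 and 384 fine-tuning rows is roughly what you would expect if compute grows with the number of input pixels. The sketch below checks that against the table. It assumes the last finetune field is the input resolution and that cost scales quadratically with it. Both are assumptions, not statements from the source, and `scaled_gflops` is a hypothetical helper name.

```python
# Rough sanity check of the 224 -> 384 GFLOPs figures in the table above.
# Assumption: FLOPs scale with pixel count, i.e. with (resolution ratio) ** 2.
# The numbers are copied from the table; nothing here is measured.
rows = {
    "FocalNet-B-SRF": (15.3, 44.8),   # (GFLOPs at 224, GFLOPs at 384)
    "FocalNet-L-SRF": (34.2, 100.6),
}


def scaled_gflops(gflops_low, res_low=224, res_high=384):
    """Estimate GFLOPs at res_high from the value at res_low (hypothetical helper)."""
    return gflops_low * (res_high / res_low) ** 2


for name, (g224, g384) in rows.items():
    est = scaled_gflops(g224)
    rel_err = abs(est - g384) / g384
    print(f"{name}: estimated {est:.1f} vs reported {g384} ({rel_err:.1%} off)")
```

Both estimates land within about one percent of the reported values, which is consistent with the pixel-count assumption but does not prove it.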
COCO (val)
model | pretrain | head | train | GFLOPs | box mAP | box AP50 | box AP75 | box mAP (small) | box mAP (medium) | box mAP (large) |
---|---|---|---|---|---|---|---|---|---|---|
FocalNet-T-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 268.0 | 46.1 | 68.2 | 50.6 | — | — | — |
FocalNet-T-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 268.0 | 48.0 | 69.7 | 53.0 | — | — | — |
FocalNet-T-LRF | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 751.0 | 51.5 | 70.3 | 56.0 | — | — | — |
FocalNet-T-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 268.0 | 45.9 | 68.3 | 50.1 | — | — | — |
FocalNet-T-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 268.0 | 47.6 | 69.5 | 52.0 | — | — | — |
FocalNet-T-SRF | IN-1k : Sup. : 300 | Cascade Mask R-CNN | COCO (train) : 36 | 746.0 | 51.5 | 70.1 | 55.8 | — | — | — |
FocalNet-S-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 356.0 | 48.0 | 69.9 | 52.7 | — | — | — |
FocalNet-S-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 356.0 | 48.9 | 70.1 | 53.7 | — | — | — |
FocalNet-S-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 365.0 | 48.3 | 70.5 | 53.1 | — | — | — |
FocalNet-S-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 365.0 | 49.3 | 70.7 | 54.2 | — | — | — |
FocalNet-B-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 507.0 | 49.0 | 70.9 | 53.9 | — | — | — |
FocalNet-B-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 507.0 | 49.8 | 70.9 | 54.6 | — | — | — |
FocalNet-B-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 496.0 | 48.8 | 70.7 | 53.5 | — | — | — |
FocalNet-B-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 496.0 | 49.6 | 70.6 | 54.1 | — | — | — |
COCO (val)
model | pretrain | head | train | GFLOPs | mask mAP | mask AP50 | mask AP75 | mask mAP (small) | mask mAP (medium) | mask mAP (large) |
---|---|---|---|---|---|---|---|---|---|---|
FocalNet-T-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 268.0 | 41.5 | 65.1 | 44.5 | — | — | — |
FocalNet-T-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 268.0 | 42.9 | 66.5 | 46.1 | — | — | — |
FocalNet-T-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 268.0 | 41.3 | 65.0 | 44.3 | — | — | — |
FocalNet-T-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 268.0 | 42.6 | 66.5 | 45.6 | — | — | — |
FocalNet-S-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 356.0 | 42.7 | 67.1 | 45.7 | — | — | — |
FocalNet-S-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 356.0 | 43.6 | 67.1 | 47.1 | — | — | — |
FocalNet-S-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 365.0 | 43.1 | 67.4 | 46.2 | — | — | — |
FocalNet-S-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 365.0 | 43.8 | 67.9 | 47.4 | — | — | — |
FocalNet-B-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 507.0 | 43.5 | 67.9 | 46.7 | — | — | — |
FocalNet-B-LRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 507.0 | 44.1 | 68.2 | 47.2 | — | — | — |
FocalNet-B-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 12 | 496.0 | 43.3 | 67.5 | 46.5 | — | — | — |
FocalNet-B-SRF | IN-1k : Sup. : 300 | Mask R-CNN | COCO (train) : 36 | 496.0 | 44.1 | 68.0 | 47.2 | — | — | — |
ADE20K (val)
model | pretrain | head | train | GFLOPs | mIoU (multi-scale) | pAcc (multi-scale) | mAcc (multi-scale) | mIoU (single-scale) | pAcc (single-scale) | mAcc (single-scale) |
---|---|---|---|---|---|---|---|---|---|---|
FocalNet-T-LRF | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 128 : 512 | 949.0 | 47.8 | — | — | 46.8 | — | — |
FocalNet-T-SRF | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 128 : 512 | 944.0 | 47.2 | — | — | 46.5 | — | — |
FocalNet-S-SRF | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 128 : 512 | 1035.0 | 50.1 | — | — | 49.3 | — | — |
FocalNet-S-LRF | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 128 : 512 | 1044.0 | 50.1 | — | — | 49.1 | — | — |
FocalNet-B-LRF | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 128 : 512 | 1192.0 | 51.4 | — | — | 50.5 | — | — |
FocalNet-B-SRF | IN-1k : Sup. : 300 | UPerNet | ADE20K (train) : 128 : 512 | 1180.0 | 51.1 | — | — | 50.2 | — | — |