Heedless Backbones

About

Heedless Backbones is a web application designed to help researchers and industry practitioners compare the performance of different computer vision backbone models across classification and downstream tasks. The project aims to provide more comprehensive and useful data than existing solutions by focusing specifically on computer vision backbones and treating pretrained foundation models as first-class citizens.

Latest Updates

(October 16, 2024) Added UniRepLKNet

(October 13, 2024) Added VMamba

Why Heedless Backbones

Paperswithcode's basic data model and user interface aren't very useful for researchers or industry practitioners who want to compare the performance of different computer vision backbones across tasks. The (visible) data model doesn't include:

  • Model family and model
  • What head was used for the downstream task (e.g. object detection) or what backbone was used
  • What pretraining dataset was used (e.g. IN-1K, IN-21K)
  • Details of the pretraining, finetuning, or downstream training
  • Throughput, and sometimes even GFLOPS and the number of parameters

This means, for example, that you can't easily:

  • Compare the performance of different model families (e.g. compare the Swin and ConvNeXt families)
  • Compare model accuracy on multiple tasks
  • Make apples-to-apples accuracy comparisons, even on one dataset and one task

In addition, the user interface doesn't allow for interesting queries (e.g. what's the best model on ImageNet that can do better than 1000 fps on a V100 with AMP?), and the database is inconsistently maintained. Heedless Backbones is an attempt to address these shortcomings of Paperswithcode within the space of computer vision backbones. It is built on a data model that treats pretrained foundation models as first-class citizens, which makes it possible to build fairly complex, interesting visualizations of model performance on different tasks.

Interactive Visualization

Using this data, you can compare model performance across multiple dimensions:

  • Compare any two metrics (e.g., accuracy vs. GFLOPS, or performance across different tasks)
  • Filter results by task type, dataset, model head, and resolution
  • See how different pretraining methods and datasets affect downstream performance
  • Customize plot legends to highlight specific model characteristics
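
As a rough illustration of the kind of comparison the interface supports, the sketch below expresses one such query with pandas against an imagined flat export of the data. The file name and column names are assumptions for illustration only, not the application's actual schema or API:

```python
# Hypothetical sketch: the CSV export and column names are illustrative
# assumptions, not Heedless Backbones' actual schema or API.
import pandas as pd

results = pd.read_csv("backbone_results.csv")

# Keep ImageNet-1K-pretrained models faster than 1000 FPS on a V100 with AMP,
# then look at accuracy against compute cost.
fast = results[
    (results["pretrain_dataset"] == "ImageNet-1K")
    & (results["gpu"] == "V100")
    & (results["precision"] == "AMP")
    & (results["fps"] > 1000)
]

ax = fast.plot.scatter(x="gflops", y="top1_accuracy")
ax.figure.savefig("accuracy_vs_gflops.png")
```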

How It Works

Each computer vision model (e.g. ConvNeXt-Tiny) has the following data:

Family Information: Each model belongs to a family (e.g. the ConvNeXt family) with shared characteristics such as:

  • Architecture features
  • Publication date
  • Pretraining method

Pretrained Models: Each model has multiple pretrained versions, each with different:

  • Pretraining datasets (e.g., ImageNet-1K, ImageNet-21K)
  • Pretraining methods (e.g. Supervised, MAE, etc.)
  • Training configurations (e.g. training resolution, number of epochs)
  • Number of parameters
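
The sketch below shows one way this family → model → pretrained-model hierarchy could be laid out. The class and field names are illustrative assumptions, not the project's actual schema:

```python
# Illustrative sketch of the family -> model -> pretrained-model hierarchy.
# Class and field names are assumptions, not the project's actual schema.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelFamily:
    name: str                       # e.g. "ConvNeXt"
    architecture_features: list[str]
    publication_date: date
    pretraining_method: str         # e.g. "Supervised", "MAE"


@dataclass
class PretrainedModel:
    pretrain_dataset: str           # e.g. "ImageNet-1K", "ImageNet-21K"
    pretrain_method: str            # e.g. "Supervised", "MAE"
    train_resolution: int
    epochs: int
    num_params: int


@dataclass
class Model:
    name: str                       # e.g. "ConvNeXt-Tiny"
    family: ModelFamily
    pretrained_versions: list[PretrainedModel] = field(default_factory=list)
```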

Performance Results: Each pretrained model has results for different tasks:

  • Classification
    • Top-1 and Top-5 accuracy
    • GFLOPS
    • Eval resolution
    • Finetuning information
  • Detection and Instance Segmentation
    • AP metrics
    • GFLOPS
    • Finetuning information
    • What detection or instance segmentation head was used
  • Semantic Segmentation
    • mIoU and Pixel Accuracy
    • GFLOPS
    • Eval resolution and single- vs multi-scale
    • Finetuning information
    • What semantic segmentation head was used
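
Continuing the illustrative sketch above (again, the names are assumptions rather than the project's actual schema), per-task results could be attached to each pretrained model roughly like this:

```python
# Illustrative continuation of the sketch above; names are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationResult:
    dataset: str                     # e.g. "ImageNet-1K"
    top1: float
    top5: float
    gflops: float
    eval_resolution: int
    finetune_info: Optional[str] = None


@dataclass
class InstanceResult:                # detection / instance segmentation
    dataset: str                     # e.g. "COCO"
    head: str                        # e.g. "Mask R-CNN", "Cascade Mask R-CNN"
    box_ap: float
    mask_ap: Optional[float]
    gflops: float
    finetune_info: Optional[str] = None


@dataclass
class SemanticSegmentationResult:
    dataset: str                     # e.g. "ADE20K"
    head: str                        # e.g. "UperNet"
    miou: float
    pixel_accuracy: Optional[float]
    gflops: float
    eval_resolution: int
    multi_scale: bool                # single- vs. multi-scale evaluation
    finetune_info: Optional[str] = None
```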

Throughput Measurements: Each model has throughput measurements for different tasks, when available:

  • Various GPU types (V100, A100, etc.)
  • Different precision modes (FP16, FP32, AMP)
  • Various resolutions
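
In the same illustrative style (field names are assumptions), a throughput record might look like:

```python
# Illustrative sketch; field names are assumptions.
from dataclasses import dataclass


@dataclass
class ThroughputResult:
    task: str           # e.g. "classification", "detection"
    gpu: str            # e.g. "V100", "A100"
    precision: str      # e.g. "FP16", "FP32", "AMP"
    resolution: int
    fps: float
```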

Data Quality

Unlike crowd-sourced platforms, all the data in Heedless Backbones is managed by me to ensure consistency and accuracy. The database is populated using a combination of manual review and LLM-assisted data entry, which helps maintain high data quality while enabling regular updates as new models are published.

Because of this, each model entry includes detailed metadata about training configurations, ensuring that comparisons are as fair as possible. This includes:

  • Training protocols (epochs, resolution, datasets used)
  • Evaluation settings (e.g. single-scale vs. multi-scale for semantic segmentation)
  • Links to source papers and code repositories
  • Performance measurements under specified conditions

Caveat: Throughput Metrics

While Heedless Backbones allows you to compare model throughput (FPS), keep the following limitations in mind:

  • Deep learning libraries are frequently updated, and different versions can significantly impact throughput even if the configuration (GPU, precision, batch size) is otherwise the same
  • Results vary substantially across different GPUs (V100, A100, etc.) and precision modes (FP16, FP32, AMP), so you won't be able to compare two models unless the paper authors recorded results with the same configuration
  • Batch sizes and other implementation details (which I do not record) can greatly affect measured performance

Contributing

While direct contributions to the database are not currently accepted to maintain data consistency, you can:

  • Report issues or suggest improvements on GitHub
  • Request specific models or features to be added
  • Use the open-source code to deploy your own instance