Heedless Backbones

About

Heedless Backbones is a web application designed to help researchers and industry practitioners compare the performance of different computer vision backbone models across classification and downstream tasks. The project aims to provide more comprehensive and useful data than existing solutions by focusing specifically on computer vision backbones and treating pretrained foundation models as first-class citizens.

Latest Updates

(October 16, 2024) Added UniRepLKNet

(October 13, 2024) Added VMamba

Why Heedless Backbones

Paperswithcode's basic data model and user interface aren't very useful for researchers or industry practitioners who want to compare the performance of different computer vision backbones across tasks. The (visible) data model doesn't include:

  • Model family and model
  • What head was used for the downstream task (e.g. object detection) or what backbone was used
  • What pretraining dataset was used (e.g. IN-1K, IN-21K)
  • Details of the pretraining, finetuning, or downstream training
  • Throughput, and sometimes even GFLOPS and the number of parameters

This means, for example, that you can't easily:

  • Compare the performance of different model families (e.g. compare the Swin and ConvNeXt families)
  • Compare model accuracy on multiple tasks
  • Make apples-to-apples accuracy comparisons, even on one dataset and one task

In addition, the user interface doesn't allow for interesting queries (e.g. what's the best model on ImageNet that can do better than 1000 fps on a V100 with AMP?), and the database is inconsistently maintained. Heedless Backbones is an attempt to address these shortcomings of Paperswithcode within the space of computer vision backbones. It is built on a data model that treats pretrained foundation models as first-class citizens, which makes it possible to build fairly complex, interesting visualizations of model performance on different tasks.

Interactive Visualization

Using this data, you can compare model performance across multiple dimensions:

  • Compare any two metrics (e.g., accuracy vs. GFLOPS, or performance across different tasks)
  • Filter results by task type, dataset, model head, and resolution
  • See how different pretraining methods and datasets affect downstream performance
  • Customize plot legends to highlight specific model characteristics
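
As a rough illustration of the kind of comparison the interface supports, the sketch below expresses one such query with pandas against an imagined flat export of the data. The file name and column names are assumptions for illustration only, not the application's actual schema or API:

```python
# Hypothetical sketch: the CSV export and column names are illustrative
# assumptions, not Heedless Backbones' actual schema or API.
import pandas as pd

results = pd.read_csv("backbone_results.csv")

# Keep ImageNet-1K-pretrained models faster than 1000 FPS on a V100 with AMP,
# then look at accuracy against compute cost.
fast = results[
    (results["pretrain_dataset"] == "ImageNet-1K")
    & (results["gpu"] == "V100")
    & (results["precision"] == "AMP")
    & (results["fps"] > 1000)
]

ax = fast.plot.scatter(x="gflops", y="top1_accuracy")
ax.figure.savefig("accuracy_vs_gflops.png")
```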

How It Works

Each computer vision model (e.g. ConvNeXt-Tiny) has the following data:

Family Information: Each model belongs to a family (e.g. the ConvNeXt family) with shared characteristics such as:

  • Architecture features
  • Publication date
  • Pretraining method

Pretrained Models: Each model has multiple pretrained versions, each with different:

  • Pretraining datasets (e.g., ImageNet-1K, ImageNet-21K)
  • Pretraining methods (e.g. Supervised, MAE, etc.)
  • Training configurations (e.g. training resolution, number of epochs)
  • Number of parameters
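
The sketch below shows one way this family → model → pretrained-model hierarchy could be laid out. The class and field names are illustrative assumptions, not the project's actual schema:

```python
# Illustrative sketch of the family -> model -> pretrained-model hierarchy.
# Class and field names are assumptions, not the project's actual schema.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelFamily:
    name: str                       # e.g. "ConvNeXt"
    architecture_features: list[str]
    publication_date: date
    pretraining_method: str         # e.g. "Supervised", "MAE"


@dataclass
class PretrainedModel:
    pretrain_dataset: str           # e.g. "ImageNet-1K", "ImageNet-21K"
    pretrain_method: str            # e.g. "Supervised", "MAE"
    train_resolution: int
    epochs: int
    num_params: int


@dataclass
class Model:
    name: str                       # e.g. "ConvNeXt-Tiny"
    family: ModelFamily
    pretrained_versions: list[PretrainedModel] = field(default_factory=list)
```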

Performance Results: Each pretrained model has results for different tasks:

  • Classification
    • Top-1 and Top-5 accuracy
    • GFLOPS
    • Eval resolution
    • Finetuning information
  • Detection and Instance Segmentation
    • AP metrics
    • GFLOPS
    • Finetuning information
    • What detection or instance segmentation head was used
  • Semantic Segmentation
    • mIoU and Pixel Accuracy
    • GFLOPS
    • Eval resolution and single- vs multi-scale
    • Finetuning information
    • What semantic segmentation head was used
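
Continuing the illustrative sketch above (again, the names are assumptions rather than the project's actual schema), per-task results could be attached to each pretrained model roughly like this:

```python
# Illustrative continuation of the sketch above; names are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationResult:
    dataset: str                     # e.g. "ImageNet-1K"
    top1: float
    top5: float
    gflops: float
    eval_resolution: int
    finetune_info: Optional[str] = None


@dataclass
class InstanceResult:                # detection / instance segmentation
    dataset: str                     # e.g. "COCO"
    head: str                        # e.g. "Mask R-CNN", "Cascade Mask R-CNN"
    box_ap: float
    mask_ap: Optional[float]
    gflops: float
    finetune_info: Optional[str] = None


@dataclass
class SemanticSegmentationResult:
    dataset: str                     # e.g. "ADE20K"
    head: str                        # e.g. "UperNet"
    miou: float
    pixel_accuracy: Optional[float]
    gflops: float
    eval_resolution: int
    multi_scale: bool                # single- vs. multi-scale evaluation
    finetune_info: Optional[str] = None
```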

Throughput Measurements: Each model has throughput measurements for different tasks, when available:

  • Various GPU types (V100, A100, etc.)
  • Different precision modes (FP16, FP32, AMP)
  • Various resolutions
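
In the same illustrative style (field names are assumptions), a throughput record might look like:

```python
# Illustrative sketch; field names are assumptions.
from dataclasses import dataclass


@dataclass
class ThroughputResult:
    task: str           # e.g. "classification", "detection"
    gpu: str            # e.g. "V100", "A100"
    precision: str      # e.g. "FP16", "FP32", "AMP"
    resolution: int
    fps: float
```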

Data Quality

Unlike crowd-sourced platforms, all the data in Heedless Backbones is managed by me to ensure consistency and accuracy. The database is populated using a combination of manual review and LLM-assisted data entry, which helps maintain high data quality while enabling regular updates as new models are published.

Because of this, each model entry includes detailed metadata about training configurations, ensuring that comparisons are as fair as possible. This includes:

  • Training protocols (epochs, resolution, datasets used)
  • Evaluation settings (e.g. single-scale vs. multi-scale for semantic segmentation)
  • Links to source papers and code repositories
  • Performance measurements under specified conditions

Caveat: Throughput Metrics

While Heedless Backbones allows you to compare model throughput (FPS), keep the following limitations in mind:

  • Deep learning libraries are frequently updated, and different versions can significantly impact throughput even if the configuration (GPU, precision, batch size) is otherwise the same
  • Results vary substantially across different GPUs (V100, A100, etc.) and precision modes (FP16, FP32, AMP), so you won't be able to compare two models unless the paper authors recorded results with the same configuration
  • Batch sizes and other implementation details (which I do not record) can greatly affect measured performance

Contributing

While direct contributions to the database are not currently accepted to maintain data consistency, you can:

  • Report issues or suggest improvements on GitHub
  • Request specific models or features to be added
  • Use the open-source code to deploy your own instance