We Should Chart an Atlas of All the World's Models

The Hebrew University of Jerusalem

Position overview: With millions of public models, it becomes important to move beyond individual models and study entire populations (left). The Model Atlas formalizes this shift by representing models as nodes in a graph, with directed edges denoting weight transformations (e.g., fine-tuning). Node size and color, as well as edge color, encode node- and edge-level features; light blue indicates missing or unknown information. The atlas enables a range of applications, including model forensics, meta-ML research, and model discovery (center). In practice, most edges and features are unknown, which motivates ML methods that take models as input and infer their properties, thereby completing the missing atlas regions (right).

Abstract

Public model repositories now contain millions of models, yet most models remain undocumented and effectively lost. In this position paper, we advocate for charting the world's model population in a unified structure we call the Model Atlas: a graph that captures models, their attributes, and the weight transformations that connect them. The Model Atlas enables applications in model forensics, meta-ML research, and model discovery, challenging tasks given today's unstructured model repositories. However, because most models lack documentation, large atlas regions remain uncharted. Addressing this gap motivates new machine learning methods that treat models themselves as data, inferring properties such as functionality, performance, and lineage directly from their weights. We argue that a scalable path forward is to bypass the unique parameter symmetries that plague model weights. Charting all the world's models will require a community effort, and we hope its broad utility will rally researchers toward this goal.
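To make the graph formalism concrete, below is a minimal sketch of an atlas region as a directed graph using networkx. The repository names and attribute keys (task, downloads, transformation) are illustrative assumptions, not a fixed schema.

```python
# A minimal sketch of the Model Atlas as a directed graph.
import networkx as nx

atlas = nx.DiGraph()

# Nodes are models; attributes hold documented metadata.
# None marks missing/unknown information (rendered light blue in the figure).
atlas.add_node("org/base-model", task="text-generation", downloads=1_200_000)
atlas.add_node("user/base-model-ft", task=None, downloads=4_300)

# Directed edges denote weight transformations, parent -> child.
atlas.add_edge("org/base-model", "user/base-model-ft",
               transformation="fine-tuning")

# Queries over the population then become graph operations,
# e.g., enumerating all descendants of a foundation model:
print(nx.descendants(atlas, "org/base-model"))
```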

The model atlas: Stable Diffusion vs. Llama. The atlas visualizes models as nodes in a graph, with directed edges indicating transformations (e.g., fine-tuning). This figure shows the top 30% most downloaded models in the Stable Diffusion and Llama regions. Node size reflects cumulative monthly downloads, and color denotes the transformation type relative to the parent model. Zoom in to see the detailed model trajectories. We observe that the Llama region has a more complex structure and a wider diversity of transformation techniques (e.g., quantization, merging) than Stable Diffusion. Note that node position is optimized for clarity and does not directly reflect the distance between model weights.

The Hugging Face atlas

Although this covers only a small subset (63,000 models) of the documented regions of Hugging Face, it already reveals significant trends.
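The documented regions can be recovered from model-card metadata on the Hugging Face Hub, where authors may declare a base_model field. The sketch below illustrates the idea, assuming a recent version of huggingface_hub; the query filter and limit are illustrative, not the crawling setup used for the atlas.

```python
import networkx as nx
from huggingface_hub import HfApi

api = HfApi()
atlas = nx.DiGraph()

# Illustrative query; a real crawl would page through millions of repos.
for model in api.list_models(filter="text-generation", limit=200, full=True):
    atlas.add_node(model.id, downloads=model.downloads)
    card = model.card_data.to_dict() if model.card_data else {}
    base = card.get("base_model")
    # base_model may be a string, or a list for merged models.
    parents = base if isinstance(base, list) else ([base] if base else [])
    for parent in parents:
        atlas.add_edge(parent, model.id)  # parent -> child edge

print(atlas.number_of_nodes(), "models;",
      atlas.number_of_edges(), "documented edges")
```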

Depth and structure. The LLM connected component (CC) is deep and complex, and includes almost a third of all models. In contrast, the Flux component, while also substantial, has a much simpler and more uniform structure.

Quantization. Zoom-in (A) highlights quantization practices across vision, language, and vision-language models (VLMs). Vision models barely use quantization, despite Flux containing more parameters (12B) than Llama (8B). Conversely, quantization is commonplace in LLMs, constituting a large proportion of models. VLMs fall between these extremes.

Adapter and fine-tuning strategies. A notable distinction exists between discriminative (top) and generative (bottom) vision models. Discriminative models primarily employ full fine-tuning, while generative models have widely adopted adapters such as LoRA. The evolution of adapter adoption over time is evident: Stable Diffusion 1.4 (1) mostly used full fine-tuning, while SD 1.5 (2), SD 2 (3), SD XL (4), and Flux (5) progressively use more adapters. Interestingly, the atlas reveals that audio models rarely use adapters, suggesting gaps in cross-community knowledge transfer.

This inter-community variation is particularly evident in model merging. LLMs have embraced model merging, with merged models frequently exceeding the popularity of their parents. This raises interesting questions about the limited role of merging in vision models. For enhanced visualization, we display the top 30% most downloaded models.
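As a rough illustration of how transformation types (quantization, adapter, merge) might be labeled at scale, the heuristic below guesses an edge type from community naming conventions in the child repository's name. The patterns are illustrative assumptions, not the method used to build the figures above.

```python
# A hedged heuristic for guessing edge transformation types from repo names.
import re

PATTERNS = [
    ("quantization", re.compile(r"gguf|gptq|awq|[48]\s*bit|int[48]", re.I)),
    ("adapter",      re.compile(r"lora|adapter|peft", re.I)),
    ("merge",        re.compile(r"merge|slerp|ties|dare", re.I)),
]

def guess_transformation(child_repo_id: str) -> str:
    """Guess how a child model was derived from its parent."""
    for label, pattern in PATTERNS:
        if pattern.search(child_repo_id):
            return label
    return "fine-tuning"  # default assumption for unlabeled edges

print(guess_transformation("TheBloke/Llama-2-7B-GPTQ"))  # quantization
print(guess_transformation("user/sdxl-style-lora"))      # adapter
```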

Model atlas demo

For the full version, visit our Hugging Face space.

Model attribute prediction using the atlas

Currently, most models are only partially documented. Since local atlas regions contain related models, the atlas can be used to predict missing model attributes, including task, accuracy, license, missing weights, and popularity.

Missing attributes

Using the atlas structure improves the prediction of model accuracy and other attributes compared to naively using the majority label. Panel (b) reports the prediction accuracy.
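A minimal sketch of this idea, assuming the atlas is held as a networkx graph: predict a node's missing attribute by majority vote over its documented graph neighbors, falling back to the naive global-majority baseline. The function and attribute names are hypothetical.

```python
from collections import Counter
import networkx as nx

def predict_attribute(atlas: nx.DiGraph, node: str, attr: str):
    """Predict a missing attribute from documented graph neighbors."""
    neighbors = list(atlas.predecessors(node)) + list(atlas.successors(node))
    labels = [atlas.nodes[n][attr] for n in neighbors
              if atlas.nodes[n].get(attr) is not None]
    if not labels:  # fall back to the naive global-majority baseline
        labels = [d[attr] for _, d in atlas.nodes(data=True)
                  if d.get(attr) is not None]
    return Counter(labels).most_common(1)[0][0] if labels else None
```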


Charting the atlas

While we've seen the importance of the model atlas, in practice, over 60% of it is unknown. Using the known regions of the atlas, we identify high-confidence structural priors based on dominant real-world model training practices.

Our approach computes distances between model weights and combines them with these structural priors to recover the atlas. It outperforms the baselines by a significant margin, even on in-the-wild models.
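A minimal sketch of the core primitive, assuming PyTorch state dicts: a sum of per-tensor L2 distances over shared weights, plus one simple structural prior that attaches an undocumented model to its nearest candidate parent. This illustrates the idea only; it is not the full charting algorithm.

```python
import torch

def weight_distance(state_a: dict, state_b: dict) -> float:
    """Sum of per-tensor L2 distances over weights shared by two checkpoints."""
    shared = state_a.keys() & state_b.keys()
    return sum(torch.linalg.vector_norm(
        state_a[k].float() - state_b[k].float()).item() for k in shared)

def attach_to_nearest_parent(orphan: dict, candidates: dict) -> str:
    """Structural prior: pick the candidate parent with minimal weight distance."""
    return min(candidates,
               key=lambda name: weight_distance(orphan, candidates[name]))
```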

Charting results

BibTeX

@article{horwitz2025charting,
  title={Charting and Navigating Hugging Face's Model Atlas},
  author={Horwitz, Eliahu and Kurer, Nitzan and Kahana, Jonathan and Amar, Liel and Hoshen, Yedid},
  journal={arXiv preprint arXiv:2503.10633},
  year={2025}
}