Charting and Navigating Hugging Face's Model Atlas

Horwitz, Eliahu; Kurer, Nitzan; Kahana, Jonathan; Amar, Liel; Hoshen, Yedid

We Should Chart an Atlas of All the World's Models

Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana, Liel Amar, Yedid Hoshen

The Hebrew University of Jerusalem
NeurIPS 2025 - Position Paper



Position overview: With millions of public models, it becomes important to move beyond individual models and study entire populations (left). The Model Atlas formalizes this shift by representing models as nodes in a graph, with directed edges denoting weight transformations (e.g., fine-tuning). Node size and color, as well as edge color, encode node and edge-level features; light blue indicates missing or unknown information. The atlas enables a range of applications, including model forensics, meta-ML research, and model discovery (center). In practice, most edges and features are unknown. This motivates ML methods that take models as input and infer their properties, thereby completing the missing atlas regions (right).

Abstract

Public model repositories now contain millions of models, yet most models remain undocumented and effectively lost. In this position paper, we advocate for charting the world's model population in a unified structure we call the Model Atlas: a graph that captures models, their attributes, and the weight transformations that connect them. The Model Atlas enables applications in model forensics, meta-ML research, and model discovery, challenging tasks given today's unstructured model repositories. However, because most models lack documentation, large atlas regions remain uncharted. Addressing this gap motivates new machine learning methods that treat models themselves as data, inferring properties such as functionality, performance, and lineage directly from their weights. We argue that a scalable path forward is to bypass the unique parameter symmetries that plague model weights. Charting all the world's models will require a community effort, and we hope its broad utility will rally researchers toward this goal.
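To make the graph structure concrete, here is a minimal sketch of an atlas as a data structure: models are nodes with optional attributes, and each directed edge from parent to child carries a transformation type. The class names, attribute names, and model names are hypothetical and chosen for illustration only; unknown information is stored as `None`, mirroring the uncharted regions described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelNode:
    """A model in the atlas; attributes that are not documented stay None."""
    name: str
    task: Optional[str] = None
    downloads: Optional[int] = None


@dataclass
class ModelAtlas:
    """Directed graph: an edge (parent, child) denotes a weight transformation."""
    nodes: dict = field(default_factory=dict)  # name -> ModelNode
    edges: dict = field(default_factory=dict)  # (parent, child) -> transformation or None

    def add_model(self, node: ModelNode) -> None:
        self.nodes[node.name] = node

    def add_edge(self, parent: str, child: str,
                 transformation: Optional[str] = None) -> None:
        if parent not in self.nodes or child not in self.nodes:
            raise KeyError("both endpoints must be added as models first")
        self.edges[(parent, child)] = transformation

    def children(self, name: str) -> list:
        return [c for (p, c) in self.edges if p == name]

    def parents(self, name: str) -> list:
        return [p for (p, c) in self.edges if c == name]

    def missing_attributes(self) -> dict:
        """Map each model name to the attribute names that are still unknown."""
        return {
            n.name: [k for k in ("task", "downloads") if getattr(n, k) is None]
            for n in self.nodes.values()
        }


# Hypothetical toy atlas: a base model, a fine-tune, and a quantized copy.
atlas = ModelAtlas()
atlas.add_model(ModelNode("base-llm", task="text-generation", downloads=1000))
atlas.add_model(ModelNode("base-llm-chat", task="text-generation"))
atlas.add_model(ModelNode("base-llm-chat-4bit"))
atlas.add_edge("base-llm", "base-llm-chat", "fine-tuning")
atlas.add_edge("base-llm-chat", "base-llm-chat-4bit", "quantization")
```

Calling `atlas.missing_attributes()` lists, per model, which fields would have to be inferred to complete this toy atlas.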

The model atlas - Stable Diffusion vs. Llama: The model atlas visualizes models as nodes in a graph, with directed edges indicating transformations (e.g., fine-tuning). This figure shows the top 30% most downloaded models in the Stable Diffusion and Llama regions. Node size reflects cumulative monthly downloads, and color denotes the transformation type relative to the parent model. Please zoom in to see the detailed model trajectories. We observe that the Llama region has a more complex structure and a wider diversity of transformation techniques (e.g., quantization, merging) compared to Stable Diffusion. Note that node position is optimized for clarity and does not directly reflect the distance between model weights.

The Hugging Face atlas

While this is a small subset (63,000 models) of the documented regions of HF, it already reveals significant trends.

Depth and structure. The LLM connected component (CC) is deep and complex. It includes almost a third of all models. In contrast, while Flux is also substantial, its structure is much simpler and more uniform.

Quantization. Zoom-in (A) highlights quantization practices across vision, language, and vision-language (V&L) models. Vision models barely use quantization, despite Flux containing more parameters (12B) than Llama (8B). Conversely, quantization is commonplace in LLMs, constituting a large proportion of models. VLMs demonstrate a balance between these extremes.

Adapter and fine-tuning strategies. A notable distinction exists between discriminative (top) and generative (bottom) vision models. Discriminative models primarily employ fine-tuning, while generative models have widely adopted adapters like LoRA. The evolution of adapter adoption over time is evident: Stable-Diffusion 1.4 (SD) (1) mostly used full fine-tuning, while SD 1.5 (2), SD 2 (3), SD XL (4), and Flux (5) progressively use more adapters. Interestingly, the atlas reveals that audio models rarely use adapters, suggesting gaps in cross-community knowledge transfer.

This inter-community variation is particularly evident in model merging. LLMs have embraced model merging, with merged models frequently exceeding the popularity of their parents. This raises interesting questions about the limited role of merging in vision models. For enhanced visualization, we display the top 30% most downloaded models.

Model atlas demo

For the full version, visit our Hugging Face space.

Model attribute prediction using the atlas

Currently, most models are only partially documented. Because local atlas regions contain related models, the atlas can also be used to predict missing model attributes, including task, accuracy, license, missing weights, and popularity.

Using atlas structure improves prediction of model accuracy and other attributes, compared to naively using the majority label. In (b), we report the prediction accuracy.
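The contrast between a structure-aware predictor and the majority-label baseline can be illustrated with a small sketch. This is not the predictor evaluated above; it is a simple neighbor vote, assumed here for illustration, that fills a missing label from a model's parents and children and falls back to the global majority when no neighbor is labeled. All model names and labels are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Baseline: the most common known label across the whole atlas."""
    known = [v for v in labels.values() if v is not None]
    return Counter(known).most_common(1)[0][0]


def neighbor_label(node, edges, labels):
    """Vote among the labeled parents and children of `node`."""
    neighbors = [c for p, c in edges if p == node]
    neighbors += [p for p, c in edges if c == node]
    known = [labels[n] for n in neighbors if labels.get(n) is not None]
    if not known:
        return majority_label(labels)  # no local evidence: fall back to baseline
    return Counter(known).most_common(1)[0][0]


# Hypothetical toy atlas with two regions and two undocumented models.
edges = [
    ("sd", "sd-lora-a"), ("sd", "sd-lora-b"),
    ("llm", "llm-chat"), ("llm", "llm-code"),
    ("llm", "llm-math"), ("llm", "llm-x"),
]
labels = {
    "sd": "text-to-image", "sd-lora-a": "text-to-image", "sd-lora-b": None,
    "llm": "text-generation", "llm-chat": "text-generation",
    "llm-code": "text-generation", "llm-math": "text-generation",
    "llm-x": None,
}

baseline_guess = majority_label(labels)
atlas_guess = neighbor_label("sd-lora-b", edges, labels)
```

In this toy example the baseline assigns the globally dominant task to `sd-lora-b`, while the neighbor vote uses its parent's task instead.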

Charting the atlas

While we've seen the importance of the model atlas, in practice, over 60% of it is unknown. Using the known regions of the atlas, we identify high-confidence structural priors based on dominant real-world model training practices.

Quantizations are leaves: Our analysis of over 400,000 documented model relationships reveals that 99.41% of quantized models are leaf nodes. This figure shows this for a subset of the Llama-based models. Indeed, quantized models (magenta) are nearly always leaf nodes, corroborating the statistical finding.
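A statistic of this kind can be computed directly from documented edges. The sketch below assumes a hypothetical representation in which each model's incoming transformation type is known; it returns the fraction of quantized models that have no children. The toy data is invented for illustration and is not drawn from the analysis above.

```python
def quantized_leaf_fraction(edges, transformation):
    """Fraction of quantized models that are leaves.

    edges: list of (parent, child) pairs.
    transformation: model name -> type of the edge that produced it.
    Returns None when there are no quantized models.
    """
    has_children = {parent for parent, _ in edges}
    quantized = [m for m, t in transformation.items() if t == "quantization"]
    if not quantized:
        return None
    leaves = [m for m in quantized if m not in has_children]
    return len(leaves) / len(quantized)


# Hypothetical toy data: one quantized leaf and one quantized model
# that was (unusually) fine-tuned further.
edges = [
    ("base", "ft"), ("ft", "ft-q4"),
    ("base", "base-q8"), ("base-q8", "base-q8-ft"),
]
transformation = {
    "ft": "fine-tuning", "ft-q4": "quantization",
    "base-q8": "quantization", "base-q8-ft": "fine-tuning",
}

fraction = quantized_leaf_fraction(edges, transformation)
```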

Temporal dynamics indicate edge directionality: We analyzed over 400,000 documented model relationships and observed that in 99.73% of cases, earlier upload times correlate with topologically higher positions in the DAG. Here, we visualize this trend on a subset of the Llama model family. Green nodes indicate models whose upload times align with the topological order, while red nodes mark exceptions to this trend. The source (in gray) vacuously satisfies this assumption. Nearly all nodes satisfy our assumption.
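As a rough illustration of how this prior can be checked and then used, the sketch below simplifies the topological statement to a per-edge test (the parent was uploaded before the child) and uses the same rule to orient an edge whose direction is unknown. The per-edge simplification, the function names, and the dates are assumptions made for this example.

```python
from datetime import date


def temporal_consistency(edges, upload_time):
    """Fraction of documented edges whose parent was uploaded before its child."""
    checked = consistent = 0
    for parent, child in edges:
        if parent in upload_time and child in upload_time:
            checked += 1
            if upload_time[parent] < upload_time[child]:
                consistent += 1
    return consistent / checked if checked else None


def orient_edge(a, b, upload_time):
    """Orient an undirected relation so the earlier upload becomes the parent."""
    return (a, b) if upload_time[a] <= upload_time[b] else (b, a)


# Hypothetical upload dates; the last edge violates the prior.
upload_time = {
    "base": date(2024, 1, 5),
    "ft": date(2024, 2, 1),
    "ft-q4": date(2024, 2, 3),
    "late-parent": date(2024, 6, 1),
    "early-child": date(2024, 3, 1),
}
edges = [("base", "ft"), ("ft", "ft-q4"), ("late-parent", "early-child")]

rate = temporal_consistency(edges, upload_time)
directed = orient_edge("ft-q4", "ft", upload_time)
```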

Snake vs. Fan patterns: Snake patterns often arise from sequential training checkpoints, while fan patterns typically result from hyperparameter sweeps. In both structures the model weight variance is low. However, in snake patterns the weight distance has high correlation with model upload time, whereas in fan patterns the correlation is lower.
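One way to make this distinction measurable is sketched below, under two assumptions that are ours rather than taken from the text above: weight distance is measured from the earliest model in the group, and the correlation is a plain Pearson coefficient between upload time and that distance. The weight vectors are tiny invented examples.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def time_distance_correlation(models):
    """models: list of (upload_time, weight_vector).

    The earliest model is treated as the root; the correlation is computed
    over the remaining models between upload time and distance to the root.
    """
    ordered = sorted(models, key=lambda m: m[0])
    root_weights = ordered[0][1]
    times = [t for t, _ in ordered[1:]]
    dists = [math.dist(root_weights, w) for _, w in ordered[1:]]
    return pearson(times, dists)


# Snake: sequential checkpoints drift steadily away from the root.
snake = [(0, [0.0, 0.0]), (1, [1.0, 0.0]), (2, [2.0, 0.0]), (3, [3.0, 0.0])]
# Fan: independent runs from the same root, in no particular order.
fan = [(0, [0.0, 0.0]), (1, [2.0, 0.0]), (2, [0.0, 1.0]),
       (3, [0.0, 3.0]), (4, [1.0, 0.0])]

snake_r = time_distance_correlation(snake)
fan_r = time_distance_correlation(fan)
```

On this toy data the snake group yields a correlation close to 1, while the fan group yields a value close to 0.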

Our approach computes the distance between model weights. Using these priors, our method outperforms the baselines by a significant margin, even for in-the-wild models.
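The following is a deliberately simplified sketch of how weight distances and the priors above could be combined; it is not the method evaluated here. It assumes a greedy rule in which each model picks as its parent the nearest model (by Euclidean weight distance) that was uploaded earlier and is not quantized. The function name, the dictionary layout, and the toy weights are hypothetical.

```python
import math


def recover_parents(models):
    """Greedy parent assignment from weight distances plus two priors.

    models: name -> {"weights": list of floats, "time": number, "quantized": bool}
    Returns name -> parent name, or None when no earlier candidate exists.
    """
    parents = {}
    for name, m in models.items():
        candidates = [
            other for other, o in models.items()
            if other != name
            and o["time"] < m["time"]   # temporal prior: parents come first
            and not o["quantized"]      # leaf prior: quantized models have no children
        ]
        if not candidates:
            parents[name] = None
            continue
        parents[name] = min(
            candidates,
            key=lambda c: math.dist(models[c]["weights"], m["weights"]),
        )
    return parents


# Hypothetical toy family: "ft2" is closest in weights to the quantized
# model, but the leaf prior redirects it to "ft".
models = {
    "base": {"weights": [0.0, 0.0], "time": 0, "quantized": False},
    "ft": {"weights": [1.0, 0.0], "time": 1, "quantized": False},
    "ft-q": {"weights": [1.1, 0.0], "time": 2, "quantized": True},
    "ft2": {"weights": [1.2, 0.0], "time": 3, "quantized": False},
}

parents = recover_parents(models)
```

A greedy nearest-parent rule is only one possible design; it is used here because it makes the effect of each prior easy to see in isolation.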

BibTeX

@article{horwitz2025charting,
  title={Charting and Navigating Hugging Face's Model Atlas},
  author={Horwitz, Eliahu and Kurer, Nitzan and Kahana, Jonathan and Amar, Liel and Hoshen, Yedid},
  journal={arXiv preprint arXiv:2503.10633},
  year={2025}
}
