Model Atlas Example


The model atlas - Stable Diffusion vs. Llama: The model atlas visualizes models as nodes in a graph, with directed edges indicating transformations (e.g., fine-tuning). This figure shows the top 30% most downloaded models in the Stable Diffusion and Llama regions. Node size reflects cumulative monthly downloads, and color denotes the transformation type relative to the parent model. Please zoom in to see the detailed model trajectories. We observe that the Llama region has more complex structure and a wider diversity of transformation techniques (e.g., quantization, merging) compared to Stable Diffusion. Note that node position is optimized for clarity and does directly reflect distance between model weights.

Abstract

As there are now millions of publicly available neural networks, searching and analyzing large model repositories becomes increasingly important. Navigating so many models requires an atlas, but as most models are poorly documented charting such an atlas is challenging. To explore the hidden potential of model repositories, we chart a preliminary atlas representing the documented fraction of Hugging Face. It provides stunning visualizations of the model landscape and evolution. We demonstrate several applications of this atlas including predicting model attributes (e.g., accuracy), and analyzing trends in computer vision models. However, as the current atlas remains incomplete, we propose a method for charting undocumented regions. Specifically, we identify high-confidence structural priors based on dominant real-world model training practices. Leveraging these priors, our approach enables accurate mapping of previously undocumented areas of the atlas. We publicly release our datasets, code, and interactive atlas.

The Hugging Face atlas

The Hugging Face atlas

While this is a small subset (63,000 models) of the documented regions of HF, it already reveals significant trends.

Depth and structure. The LLM connected component (CC) is deep and complex. It includes almost a third of all models. In contrast, while Flux is also substantial, its structure is much simpler and more uniform.

Quantization. Zoom-in (A) highlights quantization practices across vision, language, and vision-language (V&L) models. Vision models barely use quantization, despite Flux containing more parameters (12B) than Llama (8B). Conversely, quantization is commonplace in LLMs, constituting a large proportion of models. VLMs demonstrate a balance between these extremes.

Adapter and fine-tuning strategies. A notable distinction exists between discriminative (top) and generative (bottom) vision models. Discriminative models primarily employ fine-tuning, while generative models have widely adopted adapters like LoRA. The evolution of adapter adoption over time is evident: Stable-Diffusion 1.4 (SD) (1) mostly used full fine-tuning, while SD 1.5 (2), SD 2 (3), SD XL (4), and Flux (5) progressively use more adapters. Interestingly, the atlas reveals that audio models rarely use adapters, suggesting gaps in cross-community knowledge transfer.

This inter-community variation is particularly evident in model merging. LLMs have embraced model merging, with merged models frequently exceeding the popularity of their parents. This raises interesting questions about the limited role of merging in vision models. For enhanced visualization, we display the top 30% most downloaded models.

Model atlas demo

For full version, visit our Hugging Face space.

Model attribute prediction using the atlas

Currently, most models have very partial documentation. As local atlas regions contain related models, the atlas can also be useful for predicting missing model attributes, including task, accuracy, license, missing weights, and popularity.

Missing attributes

Using atlas structure improves prediction of model accuracy and other attributes, compared to naively using the majority label. In (b), we report the prediction accuracy.

Missing attributes
Missing attributes

Charting the atlas

While we've seen the importance of the model atlas, in practice, over 60% of it is unknown. Using the known regions of the atlas, we identify high-confidence structural priors based on dominant real-world model training practices.

Our approach computes the distance between model weights. Using these priors, our method outperforms the baselines by a significant margin, even for in-the-wild models.

Charting results

BibTeX

Coming soon