Distilling Datasets Into Less Than One Image

The Hebrew University of Jerusalem

*Indicates Equal Contribution
Poster Dataset Distillation (PoDD)

Poster Dataset Distillation (PoDD): We propose PoDD, a new dataset distillation setting for a tiny budget of under 1 image-per-class (IPC). In this example, the standard method attains 35.5% accuracy on CIFAR-100 with approximately 100k pixels, while PoDD achieves 35.7% accuracy with less than half the pixels (roughly 40k).
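
To make the pixel budgets in the caption concrete, here is the arithmetic behind them as a minimal sketch, assuming CIFAR-100's 100 classes and 32x32 images; the 0.4 ratio is inferred from the "roughly 40k" figure and is our illustration, not a configuration taken from the paper.

# Pixel-budget arithmetic for CIFAR-100 (100 classes, 32x32 images).
classes = 100
pixels_per_image = 32 * 32                    # 1,024 spatial pixels per image
one_ipc_pixels = classes * pixels_per_image   # 102,400 -> "approximately 100k pixels"
podd_pixels = int(0.4 * one_ipc_pixels)       # 40,960  -> "roughly 40k pixels"
print(one_ipc_pixels, podd_pixels)            # 102400 40960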

Abstract

Dataset distillation aims to compress a dataset into a much smaller one so that a model trained on the distilled dataset achieves high accuracy. Current methods frame this as maximizing the distilled classification accuracy for a budget of K distilled images-per-class, where K is a positive integer. In this paper, we push the boundaries of dataset distillation, compressing the dataset into less than an image-per-class. It is important to realize that the meaningful quantity is not the number of distilled images-per-class but the number of distilled pixels-per-dataset. We therefore propose Poster Dataset Distillation (PoDD), a new approach that distills the entire original dataset into a single poster. The poster approach motivates new technical solutions for creating training images and learnable labels. Our method can achieve comparable or better performance with less than an image-per-class compared to existing methods that use one image-per-class. Specifically, our method establishes new state-of-the-art performance on CIFAR-10, CIFAR-100, and CUB200 using as little as 0.3 images-per-class.

Less Than One Image-Per-Class

Can you guess the class?

Dataset Compression Scale

In this paper, we ask: "Can we distill a dataset into less than one image-per-class?" Existing dataset distillation methods cannot do this, as they synthesize one or more distinct images for each class. To this end, we propose Poster Dataset Distillation (PoDD), which distills an entire dataset into a single larger image that we call a poster. The benefit of the poster representation is the ability to use patches that overlap between classes. We find that a correctly distilled poster is sufficient for training a model to high accuracy.
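
As a rough sketch of the poster idea (not the paper's actual implementation), the snippet below unfolds a single learnable poster into overlapping 32x32 patches and pairs each patch with a learnable soft label. The poster size, stride, and label parameterization are illustrative assumptions, and the distillation loop that optimizes them is omitted.

import torch
import torch.nn.functional as F

# Illustrative sizes: one poster covering all 100 CIFAR-100 classes.
num_classes = 100
patch, stride = 32, 16                 # overlapping 32x32 crops, half-patch stride
poster_h, poster_w = 112, 352          # 112 * 352 = 39,424 pixels, under 0.4 IPC

# Both the poster and the per-patch labels are optimized during distillation.
poster = torch.randn(3, poster_h, poster_w, requires_grad=True)

# Unfold the poster into overlapping patches of shape (num_patches, 3, 32, 32).
patches = poster.unsqueeze(0).unfold(2, patch, stride).unfold(3, patch, stride)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, patch, patch)

# One learnable soft label per patch (126 patches for the sizes above).
label_logits = torch.randn(patches.shape[0], num_classes, requires_grad=True)
soft_labels = F.softmax(label_logits, dim=1)

# A student network would be trained on (patches, soft_labels); the outer loop
# that updates `poster` and `label_logits` to match the full dataset is omitted.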

Dataset Compression Scale: We show increasingly compressed methods from left to right. The original dataset contains all of the training data and performs no compression. Coreset methods select a subset of the original dataset without modifying the images. Dataset distillation methods compress an entire dataset by synthesizing K images-per-class (IPC), where K is a positive integer. Our method, PoDD, distills an entire dataset into a single poster that matches the performance of 1 IPC while using as little as 0.3 IPC.

Poster Dataset Distillation Overview

Results of 1 Image-Per-Class

Global and Local Semantics

We investigate whether PoDD can produce distilled posters that exhibit both local and global semantics. We find that in the 1 IPC case, both local and global semantics are present but hard to detect. To explore this idea further, we tested a CIFAR-10 variant of PoDD in which we use 10 IPC and distill one poster per class. Each poster now represents a single class, so overlapping patches are always from the same class. This variant preserves the local semantics and shows multiple modalities of each class. Moreover, some of the posters also exhibit global semantics, e.g., the airplane posters have sky at the top and grass at the bottom.
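
As a rough illustration of this per-class variant (the poster shape and stride are our assumptions, not the paper's exact configuration), each class gets its own small poster and every overlapping crop of it inherits that class's label:

import torch

# CIFAR-10 variant: one poster per class, 10 classes, 10 overlapping crops each.
num_classes, patch, stride = 10, 32, 16
poster_h, poster_w = 32, 176
crops_per_class = (poster_w - patch) // stride + 1                  # 10 crops

posters = torch.randn(num_classes, 3, poster_h, poster_w, requires_grad=True)

crops = posters.unfold(3, patch, stride)                            # (10, 3, 32, 10, 32)
crops = crops.permute(0, 3, 1, 2, 4).reshape(-1, 3, patch, patch)   # (100, 3, 32, 32)

# Every crop inherits the label of its poster, so overlapping regions only ever
# blend content from the same class, which is what lets the per-class posters
# develop the global layout (e.g., sky above, ground below) described above.
labels = torch.arange(num_classes).repeat_interleave(crops_per_class)   # (100,)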

Global Semantics

BibTeX

@article{shul2024distilling,
  title={Distilling Datasets Into Less Than One Image},
  author={Shul, Asaf and Horwitz, Eliahu and Hoshen, Yedid},
  journal={arXiv preprint arXiv:2403.12040},
  year={2024}
}