Abstract
Diffusion models have become the go-to method for many generative tasks, particularly for image-to-image generation tasks such as super-resolution and inpainting. Current diffusion-based methods do not provide statistical guarantees regarding the generated results, often preventing their use in high-stakes situations. To bridge this gap, we construct a confidence interval around each generated pixel such that the true value of the pixel is guaranteed to fall within the interval with a probability set by the user. Since diffusion models parametrize the data distribution, a straightforward way of constructing such intervals is by drawing multiple samples and calculating their bounds. However, this method has several drawbacks: (i) sampling is slow, (ii) the resulting bounds are suboptimal, and (iii) it requires training a diffusion model per task. To mitigate these shortcomings, we propose Conffusion, wherein we fine-tune a pre-trained diffusion model to predict interval bounds in a single forward pass. We show that Conffusion outperforms the baseline method while being three orders of magnitude faster.
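The sampling-based baseline mentioned above can be illustrated with a minimal sketch. It assumes the bounds are taken as per-pixel quantiles over a stack of samples; the abstract does not specify how the bounds are calculated. The function names, the `alpha` parameter, and the random array standing in for diffusion samples are all hypothetical. Raw sample quantiles like these do not by themselves carry a formal guarantee; this only shows the mechanics of forming an interval and measuring how often a reference image falls inside it.

```python
import numpy as np


def sample_based_bounds(samples, alpha=0.1):
    """Per-pixel interval from samples of shape (n_samples, H, W).

    Assumption: bounds are the alpha/2 and 1 - alpha/2 quantiles
    taken across the sample axis.
    """
    lower = np.quantile(samples, alpha / 2, axis=0)
    upper = np.quantile(samples, 1 - alpha / 2, axis=0)
    return lower, upper


def empirical_coverage(lower, upper, target):
    """Fraction of pixels whose reference value lies inside its interval."""
    inside = (target >= lower) & (target <= upper)
    return float(np.mean(inside))


# Stand-in data: random arrays replace real diffusion samples (hypothetical).
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(200, 8, 8))
target = rng.uniform(0.0, 1.0, size=(8, 8))

lower, upper = sample_based_bounds(samples, alpha=0.1)
interval_size = upper - lower  # wider intervals indicate less certain pixels
coverage = empirical_coverage(lower, upper, target)
```

In this sketch every interval requires a full stack of samples, which with a real diffusion model means many sampling runs per image. That is the cost a single-forward-pass bound predictor avoids.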

Bounds extracted via N-Conffusion for super-resolution. When needed, the bounds span a wide range, e.g., bounding eye color from below with darker colors and from above with brighter ones. As expected, areas with higher frequencies contain more errors and have wider intervals (e.g. hair).

Bounds extracted via N-Conffusion for inpainting (context is dimmed for visualization). In areas with a wider distribution, our bounds cover the distribution, bounding from below with sunglasses and darker eyes and from above with eyeglasses and brighter eyes. The interval size reveals areas where the model is less confident in its prediction (e.g. glasses and eyes).

We compare the different methods on the inpainting task. ADMUQ produces blurry bounds, as is also apparent from the smoother interval heatmap. Although DMSB generates the sharpest intervals, the estimated bounds may contain artifacts resulting from the lack of interpolation capabilities (e.g. "duplicated" nose and lips in the lower bound). N-Conffusion combines the best of both worlds, generating sharp intervals while maintaining realistic bounds.
Visualizing Conffusion
BibTeX
@article{horwitz2022conffusion,
title={Conffusion: Confidence Intervals for Diffusion Models},
author={Horwitz, Eliahu and Hoshen, Yedid},
journal={arXiv preprint arXiv:2211.09795},
year={2022}
}