Presentation Information

[2K4-GS-7b-03]Zero-Shot Classification Utilizing Visible Regions via Diffusion Inpainting Models

〇Ryota Suzuki1, Tatsuhito Hasegawa1 (1. University of Fukui)

Keywords:

Diffusion Model, Zero-Shot Classification, Diffusion Classifier

Zero-shot image classification without additional training has attracted significant attention due to advances in Vision-Language Models. While the Diffusion Classifier surpasses CLIP in robustness, it requires numerous iterative computations for accurate likelihood estimation, making its high computational cost a critical issue. To address this, we propose a novel classification method that leverages the context-completion capability of inpainting models. Uniquely, our approach evaluates consistency not only with conventional text conditioning but also with conditioning from visible image regions. Comparative experiments with Stable Diffusion v1.5 and inpainting methods such as SD-Inpaint showed that BrushNet, which injects features into intermediate layers, avoided the accuracy degradation observed in the other methods and maintained baseline performance. Furthermore, our method achieved higher accuracy in the early stages of inference, contributing to improved efficiency. This study suggests that exploiting visible regions via inpainting models reduces uncertainty in noise prediction, offering an effective way to enhance the practicality of diffusion-based classification.
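The scoring scheme described above can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: a real system would query a Stable Diffusion inpainting UNet (e.g. SD-Inpaint or BrushNet) for the noise prediction, whereas here `mock_inpaint_eps`, `CLASS_MEANS`, and the simplified forward-noising step are all hypothetical stand-ins. It shows only the diffusion-classifier decision rule: choose the label whose conditional noise prediction has the lowest mean squared error, with the visible region passed alongside the text label as extra conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class "templates": stand-ins for text-conditioned denoisers.
# A real pipeline would condition an inpainting UNet on a text prompt instead.
CLASS_MEANS = {"cat": -1.0, "dog": 0.0, "car": 1.0}

def mock_inpaint_eps(x_t, t, label, visible, mask):
    """Toy noise predictor: pretends clean images of `label` cluster around
    CLASS_MEANS[label], so the predicted noise is x_t minus that mean.
    `visible` (pixels where mask == 0) mimics conditioning on visible regions."""
    return x_t - CLASS_MEANS[label] * np.ones_like(x_t)

def diffusion_classify(x0, labels, n_steps=16):
    """Diffusion-Classifier-style scoring: for each candidate label, average
    the squared error between true and predicted noise over sampled timesteps,
    then return the label with the lowest error."""
    mask = np.zeros_like(x0)
    mask[x0.shape[0] // 2:] = 1.0          # bottom half masked, top half visible
    scores = {}
    for label in labels:
        errs = []
        for _ in range(n_steps):
            t = int(rng.integers(1, 1000))  # random diffusion timestep
            eps = rng.standard_normal(x0.shape)
            x_t = x0 + eps                  # simplified forward noising
            eps_hat = mock_inpaint_eps(x_t, t, label, x0 * (1 - mask), mask)
            errs.append(np.mean((eps - eps_hat) ** 2))
        scores[label] = float(np.mean(errs))
    return min(scores, key=scores.get), scores

# A toy "image" drawn near the "dog" template should be classified as dog.
x0 = CLASS_MEANS["dog"] + 0.05 * rng.standard_normal((32, 32))
pred, scores = diffusion_classify(x0, list(CLASS_MEANS))
print(pred)  # prints: dog
```

Because the visible-region conditioning constrains the plausible completions, the per-step noise-prediction error separates the classes after fewer timestep samples, which is the efficiency effect the abstract reports.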