Presentation Information

[2K4-GS-7b-04]Guiding Cross-Attention with Geometric Correspondences for Virtual Try-On

〇Kosuke Takemoto1, Takafumi Koshinaka1 (1. Yokohama City University)

Keywords:

Virtual try-on, Diffusion model, SIFT keypoint matching, Cross-attention, Image generation

Recent diffusion-based virtual try-on methods synthesize a given garment on a person image by injecting garment features through attention mechanisms. However, attention-based conditional diffusion models for garments often fail to preserve fine details such as text, logos, and illustrations. This work aims to improve garment fidelity by explicitly training the cross-attention to learn spatial correspondences. Geometric correspondences extracted from paired training data via keypoint matching are filtered using domain-specific virtual try-on constraints to obtain reliable supervision signals for the cross-attention. A loss term compares these correspondences with the attention weights, guiding the cross-attention toward spatially accurate alignments. Experiments on the VITON-HD dataset demonstrate improvements in both quantitative and qualitative evaluations, with enhanced preservation of garment details. Attention-map visualization confirms that the method concentrates attention weights on corresponding garment regions, contributing to garment fidelity in the generated images.
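The abstract's core idea — supervising cross-attention with filtered keypoint correspondences — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the geometric-consistency filter below is a generic stand-in for their (unspecified) domain-specific try-on constraints, and the loss is a simple negative log-likelihood of attention mass at matched cells; the function names and the one-hot target formulation are assumptions for illustration.

```python
import numpy as np

def filter_matches(matches, max_dev=2.0):
    """Illustrative reliability filter (NOT the paper's actual constraints):
    keep only keypoint matches whose displacement is close to the median
    displacement, discarding geometrically inconsistent outliers.

    matches : list of ((qy, qx), (ky, kx)) garment->person cell pairs,
              e.g. obtained from SIFT keypoint matching mapped onto the
              cross-attention grid.
    """
    offsets = np.array([[ky - qy, kx - qx] for (qy, qx), (ky, kx) in matches])
    med = np.median(offsets, axis=0)
    keep = np.linalg.norm(offsets - med, axis=1) <= max_dev
    return [m for m, k in zip(matches, keep) if k]

def attention_guidance_loss(attn, matched_idx):
    """Guidance loss on cross-attention: negative log of the attention
    weight each matched query cell places on its corresponding key cell.

    attn        : (n_query, n_key) row-stochastic attention weights
    matched_idx : list of (query_index, key_index) correspondences
    """
    eps = 1e-8  # numerical safety for log
    nll = [-np.log(attn[q, k] + eps) for q, k in matched_idx]
    return float(np.mean(nll))
```

Usage: attention that concentrates mass on the corresponding garment cells yields a lower loss than diffuse attention, so minimizing this term alongside the diffusion objective pushes the cross-attention toward the supervised spatial alignments.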