The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[1Yin-A-28] Memory-Efficient Algorithm for DualSoftmax-based Feature Matching

〇Yoshio Kato¹, Shuhei Tarashima¹ (1. NTT DOCOMO BUSINESS)

Keywords: Image Matching, GPGPU, DualSoftmax

DualSoftmax-based Matching (DSM) is widely used in image matching to establish correspondences between local features. However, DSM inherently requires storing the full pairwise similarity matrix, so its memory consumption is O(H^2W^2) for a feature map with H×W positions, that is, quadratic in the number of positions HW. This quadratic memory cost becomes a severe bottleneck for high-precision semi-dense matching architectures such as LoFTR. Following memory-efficient attention implementations such as FlashAttention, we propose FlashDSM, a tiling-based algorithm that performs DSM inference without allocating the entire similarity matrix. FlashDSM reduces memory usage from O(H^2W^2) to O(HW) without any approximation. In our evaluation at a resolution of 1280×720, it runs 13.5× faster and consumes 50.5× less memory than a naive PyTorch implementation.
