Presentation Information

[D-20-06] An Empirical Study of Pretraining to Enhance Embedding Alignment in Vision-and-Language Models

○Satoshi Kawamura¹, Kohei Yamamoto¹, Mariko Tomariguchi¹, Hideaki Tamai¹ (1. OKI)

Keywords:

Vision-and-Language Models, multimodal, alignment