Presentation Information

[4K5-GS-6c-03] Towards Online and Token-Level Direct Preference Optimization in Machine Translation

〇Yin Zhang1, Takehito Utsuro1, Masaaki Nagata2 (1. University of Tsukuba, 2. NTT Communication Science Laboratories)

Keywords:

Machine Translation, Reinforcement Learning

Direct Preference Optimization (DPO) has recently shown strong performance in aligning large language models with human preferences, but existing approaches are mostly applied offline and at the sequence level. This limits their ability to adapt to dynamic feedback and to capture fine-grained translation errors such as omissions, mistranslations, and local fluency issues. In this work, we propose an online, token-level DPO framework for machine translation. Our method extends standard DPO in two directions: (1) online optimization, where preference data are generated and incorporated during training, enabling continuous model improvement; and (2) token-level preference modeling, which assigns preferences at a finer granularity instead of treating each translation as a single unit. By integrating token-level preference signals into an online DPO pipeline, the model can better learn which local translation choices contribute to overall translation quality. We apply our approach to machine translation tasks and show that it improves adequacy compared with conventional sequence-level and offline DPO methods. Our results suggest that fine-grained, online preference optimization is a promising direction for building more reliable and adaptive machine translation systems.
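To make the two extensions concrete, the sketch below shows what a single update of such an online, token-level DPO loop might look like. It is a minimal sketch under our own assumptions, not the authors' implementation: the callable sample_and_score (which draws a preferred/dispreferred translation pair from the current policy and assigns per-token preference weights, e.g. from a quality-estimation metric), the model interface model(src_ids, tgt_in), and the weighting scheme are all hypothetical. Standard sequence-level DPO maximizes log sigma(beta * [log pi_theta(y_w|x)/pi_ref(y_w|x) - log pi_theta(y_l|x)/pi_ref(y_l|x)]); the sketch replaces each whole-sequence log-ratio with a weighted sum of per-token log-ratios.

    import torch
    import torch.nn.functional as F

    def token_logprobs(model, src_ids, tgt_in, tgt_out):
        # Per-token log-probabilities log pi(y_t | x, y_<t) for a seq2seq
        # model assumed to return logits of shape (batch, tgt_len, vocab).
        logits = model(src_ids, tgt_in)                      # assumed interface
        logp = F.log_softmax(logits, dim=-1)
        return logp.gather(-1, tgt_out.unsqueeze(-1)).squeeze(-1)  # (B, T)

    def online_token_dpo_step(policy, ref_policy, optimizer, src_ids,
                              sample_and_score, beta=0.1):
        # (1) Online: preference data are generated during training.
        #     sample_and_score is a hypothetical user-supplied callable that
        #     samples two translations from the *current* policy, decides
        #     which is preferred, and returns per-token weights w_c / w_r
        #     (e.g. higher weight on tokens flagged as omissions or
        #     mistranslations, lower weight elsewhere).
        (chosen_in, chosen_out, rej_in, rej_out,
         w_c, w_r) = sample_and_score(policy, src_ids)

        # (2) Token-level log-ratios against a frozen reference model.
        with torch.no_grad():
            ref_c = token_logprobs(ref_policy, src_ids, chosen_in, chosen_out)
            ref_r = token_logprobs(ref_policy, src_ids, rej_in, rej_out)
        pol_c = token_logprobs(policy, src_ids, chosen_in, chosen_out)
        pol_r = token_logprobs(policy, src_ids, rej_in, rej_out)

        # Weighted per-token sums replace the whole-sequence log-ratio of
        # standard sequence-level DPO.
        margin = beta * ((w_c * (pol_c - ref_c)).sum(-1)
                         - (w_r * (pol_r - ref_r)).sum(-1))
        loss = -F.logsigmoid(margin).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

With uniform weights w_c = w_r = 1 the loss reduces to standard sequence-level DPO, so the token-level weighting is the only change to the objective itself; the online aspect lies in regenerating the preference pair from the current policy at every step rather than reading it from a fixed offline dataset.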