Presentation Information
[2Yin-B-37]Zero-Data Self-Evolving Agentic Learning for Multimodal Generation
〇Haruki Yonekura1,2, Fumiaki Sato1, Taiki Sekii1 (1. CyberAgent, Inc., 2. The University of Osaka)
Keywords:
Agentic learning, Vision-Language Foundation Model
Generative models are increasingly used in content creation, yet they still violate user instructions, such as those specifying attributes, counts, relations, styles, and prohibitions. We study a practical setting in which only the final instruction is available and no task-specific preparation (e.g., reference inputs, extra data, or dedicated evaluators) can be assumed. We propose an autonomous agentic learning framework that performs test-time self-adaptation in a task- and domain-agnostic manner. Our system separates four roles (Manager, Generator, Tool, and Verifier) and autonomously constructs self-adaptation samples and a natural-language reward from the final instruction alone. It iteratively generates and verifies outputs, summarizes the resulting history, and transfers the acquired knowledge to the final inference. Experiments on vector image generation and 2D text-to-image generation show improved instruction-following accuracy over baselines.
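The abstract describes a loop in which a Manager derives self-adaptation samples and a natural-language reward from the final instruction, a Generator produces candidates, a Verifier checks them, and the summarized history is carried into the final inference. A minimal sketch of that control flow, with every role implemented as a toy stand-in (none of the function bodies below come from the paper; they are hypothetical placeholders):

```python
# Toy sketch of the four-role self-adaptation loop; all role
# implementations are hypothetical stand-ins, not the authors' system.

def manager(instruction):
    # Construct self-adaptation samples and a natural-language
    # reward from the final instruction alone (toy sub-goals).
    samples = [f"sub-goal {i} of: {instruction}" for i in (1, 2)]
    reward = f"Output must satisfy: {instruction}"
    return samples, reward

def generator(prompt, knowledge):
    # Produce a candidate output conditioned on accumulated knowledge.
    return {"prompt": prompt, "hints": list(knowledge)}

def verifier(output, reward):
    # Toy check: succeed once any transferred knowledge is available.
    return len(output["hints"]) > 0

def summarize(history):
    # Distill the generate/verify history into transferable knowledge.
    return [f"attempt on '{h['prompt']}' {'ok' if h['ok'] else 'failed'}"
            for h in history]

def self_adapt(instruction, rounds=2):
    samples, reward = manager(instruction)
    knowledge, history = [], []
    for _ in range(rounds):
        for s in samples:
            out = generator(s, knowledge)
            history.append({"prompt": s, "ok": verifier(out, reward)})
        knowledge = summarize(history)
    # Final inference reuses the knowledge gathered during adaptation.
    return generator(instruction, knowledge)

result = self_adapt("three blue squares in a row")
print(len(result["hints"]))  # knowledge entries transferred to final inference
```

The key design point mirrored here is that no reference inputs, extra data, or dedicated evaluators are assumed: everything the loop consumes is derived from the final instruction itself.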
