Presentation Information
[2Yin-B-37]Zero-Data Self-Evolving Agentic Learning for Multimodal Generation
〇Haruki Yonekura1,2, Fumiaki Sato1, Taiki Sekii1 (1. CyberAgent, Inc., 2. The University of Osaka)
Keywords:
Agentic learning, Vision-Language Foundation Model
Generative models are increasingly used in content creation, yet they still violate user instructions, such as those specifying attributes, counts, relations, styles, and prohibitions. We study a practical setting in which only the final instruction is available and no task-specific preparation (e.g., reference inputs, extra data, or dedicated evaluators) can be assumed. We propose an autonomous agentic learning framework that performs test-time self-adaptation in a task- and domain-agnostic manner. Our system separates four roles (Manager, Generator, Tool, and Verifier) and autonomously constructs self-adaptation samples and a natural-language reward from the final instruction alone. It iteratively generates and verifies outputs, summarizes the resulting history, and transfers the acquired knowledge to the final inference. Experiments on vector image generation and 2D text-to-image generation show improved instruction-following accuracy over baselines.
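The abstract describes a loop in which a Manager derives self-adaptation samples and a natural-language reward from the final instruction, a Generator produces candidates, a Verifier checks them, and the summarized history is carried into the final inference. A minimal sketch of that control flow, with every role implemented as a toy stand-in (none of the function bodies below come from the paper; they are hypothetical placeholders):

```python
# Toy sketch of the four-role self-adaptation loop; all role
# implementations are hypothetical stand-ins, not the authors' system.

def manager(instruction):
    # Construct self-adaptation samples and a natural-language
    # reward from the final instruction alone (toy sub-goals).
    samples = [f"sub-goal {i} of: {instruction}" for i in (1, 2)]
    reward = f"Output must satisfy: {instruction}"
    return samples, reward

def generator(prompt, knowledge):
    # Produce a candidate output conditioned on accumulated knowledge.
    return {"prompt": prompt, "hints": list(knowledge)}

def verifier(output, reward):
    # Toy check: succeed once any transferred knowledge is available.
    return len(output["hints"]) > 0

def summarize(history):
    # Distill the generate/verify history into transferable knowledge.
    return [f"attempt on '{h['prompt']}' {'ok' if h['ok'] else 'failed'}"
            for h in history]

def self_adapt(instruction, rounds=2):
    samples, reward = manager(instruction)
    knowledge, history = [], []
    for _ in range(rounds):
        for s in samples:
            out = generator(s, knowledge)
            history.append({"prompt": s, "ok": verifier(out, reward)})
        knowledge = summarize(history)
    # Final inference reuses the knowledge gathered during adaptation.
    return generator(instruction, knowledge)

result = self_adapt("three blue squares in a row")
print(len(result["hints"]))  # knowledge entries transferred to final inference
```

The key design point mirrored here is that no reference inputs, extra data, or dedicated evaluators are assumed: everything the loop consumes is derived from the final instruction itself.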
