Presentation Information
[5L3-OS-6b-05]Adaptive Feature Generation with Multimodal Large Language Models for Predictive Modeling
〇Kosuke Yoshimura1, Hisashi Kashima1 (1. Kyoto University)
Keywords:
Feature Generation, Multimodal Large Language Models
In predictive modeling with limited training data and computational resources, generating features that are both accurate and interpretable remains a critical challenge. In applications requiring high reliability in particular, explainable features are indispensable. Traditionally, human-in-the-loop feature engineering has been employed; however, its low throughput due to manual labor often becomes a bottleneck in practical operations.
In this study, we propose a method that adaptively performs feature definition and labeling using Multimodal Large Language Models (MLLMs). By replacing the human roles in the existing AdaFlock framework with MLLMs, our approach achieves significantly faster feature generation compared to manual processes. Specifically, the method dynamically generates a set of interpretable features through MLLM prompting and constructs an ensemble classifier based on these features.
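The adaptive loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the functions `mllm_propose_feature` and `mllm_label` are hypothetical placeholders standing in for real MLLM prompting calls, and the ensemble is reduced to a simple majority vote over MLLM-assigned binary features.

```python
# Hypothetical stand-ins for MLLM calls; real versions would prompt a
# multimodal LLM with the data and task description (names are
# illustrative, not from the paper).
def mllm_propose_feature(round_idx):
    """Ask the MLLM to define a new interpretable feature as a prompt."""
    return f"feature_{round_idx}: does the input exhibit property {round_idx}?"

def mllm_label(feature_def, example):
    """Ask the MLLM to answer the feature question for one example (0/1)."""
    # Placeholder: deterministic pseudo-label instead of a real model call.
    return hash((feature_def, example)) % 2

def adaptive_feature_generation(examples, n_rounds=5):
    """Adaptively build a binary feature matrix and a majority-vote ensemble."""
    feature_defs, columns = [], []
    for t in range(n_rounds):
        fdef = mllm_propose_feature(t)          # feature definition step
        col = [mllm_label(fdef, x) for x in examples]  # labeling step
        feature_defs.append(fdef)
        columns.append(col)

    # Each feature column acts as a weak classifier; combine by majority vote.
    preds = [
        int(sum(col[i] for col in columns) > len(columns) / 2)
        for i in range(len(examples))
    ]
    return feature_defs, preds

examples = ["img_a", "img_b", "img_c"]
feature_defs, preds = adaptive_feature_generation(examples)
```

In the actual method, a trained ensemble classifier over the generated features replaces the majority vote shown here, and the adaptivity comes from conditioning each new feature proposal on the ensemble's current performance.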
Experimental results across three different modalities demonstrate that our proposed method consistently outperforms direct inference by MLLMs, particularly when using Qwen3. Furthermore, training completes in at most approximately 37 minutes. This confirms that our method is highly practical for real-world deployment compared to conventional approaches reliant on human resources such as crowdsourcing.
