Presentation Information
[3Yin-A-06]Evaluating Accuracy and Efficiency in Lightweight Image Classification via Knowledge Distillation from Large Foundation Models
〇Yuki Wada1, Yoshitaka Koike1, Masaaki Hayashida1, Yasushi Iwata1 (1. NS Solutions Corporation)
Keywords:
Multimodal LLM, Zero-/few-shot, Knowledge distillation
Large foundation models accumulate extensive knowledge through pre-training; however, their size demands substantial computational resources and memory, resulting in high operational costs. In contrast, for domain-specific tasks, knowledge distillation can improve the performance of lightweight models. In this study, focusing on image classification, we examine whether distillation from a large foundation model into a lightweight model can deliver practical inference efficiency while maintaining classification performance. Specifically, we use multimodal LLMs with 2B/4B/8B/32B parameters as teacher models and a CLIP-based student model with approximately 150M parameters, which learns image–text alignment via contrastive learning. Distillation is performed on unlabeled images using soft targets generated by the teachers, and its effectiveness is evaluated. Results show that, on datasets where the teacher models clearly outperform the student model, distillation improves the student model's accuracy. Overall, these findings suggest that, for domain-specific tasks, distillation can yield lightweight yet accurate models.
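As a concrete illustration of the soft-target distillation described above, the minimal PyTorch sketch below shows how a CLIP-style student could be trained against teacher-provided class probabilities. The temperature value, the KL-based loss formulation, and all function names are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    """KL divergence between temperature-softened student predictions and
    teacher-provided soft targets (class probabilities for each image).
    Note: the temperature and scaling are common KD conventions, assumed here."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss = F.kl_div(log_p_student, teacher_probs, reduction="batchmean")
    return loss * temperature ** 2  # standard rescaling for soft-target KD

def clip_logits(image_emb, text_emb, logit_scale):
    """CLIP-style student logits: scaled cosine similarities between image
    embeddings and text embeddings of class-name prompts."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return logit_scale * image_emb @ text_emb.t()

# Toy usage with random tensors (shapes only; no real model is loaded):
# 8 unlabeled images, 10 candidate classes, 512-dim embeddings.
image_emb = torch.randn(8, 512)
text_emb = torch.randn(10, 512)
teacher_probs = torch.softmax(torch.randn(8, 10), dim=-1)  # stand-in for MLLM soft targets
loss = distillation_loss(clip_logits(image_emb, text_emb, 100.0), teacher_probs)
```

In the setup described in the abstract, the teacher soft targets would correspond to the multimodal LLM's zero-/few-shot class-probability estimates for each unlabeled image, and the text embeddings to prompts built from the class names.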
