Presentation Information
[3Yin-A-06]Evaluating Accuracy and Efficiency in Lightweight Image Classification via Knowledge Distillation from Large Foundation Models
〇Yuki Wada1, Yoshitaka Koike1, Masaaki Hayashida1, Yasushi Iwata1 (1. NS Solutions Corporation)
Keywords:
Multimodal LLM, Zero-/few-shot, Knowledge distillation
Large foundation models accumulate extensive knowledge through pre-training; however, their size demands substantial computational resources and memory, resulting in high operational costs. In contrast, for domain-specific tasks, knowledge distillation can improve the performance of lightweight models. In this study, focusing on image classification, we examine whether distillation from a large foundation model into a lightweight model can deliver practical inference efficiency while maintaining classification performance. Specifically, we use multimodal LLMs with 2B/4B/8B/32B parameters as teacher models and a CLIP-based student model with approximately 150M parameters, which learns image–text alignment via contrastive learning. Distillation is performed on unlabeled images using soft targets generated by the teachers, and its effectiveness is evaluated. Results show that, on datasets where the teacher models clearly outperform the student model, distillation improves the student model's accuracy. Overall, these findings suggest that, for domain-specific tasks, distillation can yield lightweight yet accurate models.
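As a concrete illustration of the soft-target distillation described above, the minimal PyTorch sketch below shows how a CLIP-style student could be trained against teacher-provided class probabilities. The temperature value, the KL-based loss formulation, and all function names are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    """KL divergence between temperature-softened student predictions and
    teacher-provided soft targets (class probabilities for each image).
    Note: the temperature and scaling are common KD conventions, assumed here."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss = F.kl_div(log_p_student, teacher_probs, reduction="batchmean")
    return loss * temperature ** 2  # standard rescaling for soft-target KD

def clip_logits(image_emb, text_emb, logit_scale):
    """CLIP-style student logits: scaled cosine similarities between image
    embeddings and text embeddings of class-name prompts."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return logit_scale * image_emb @ text_emb.t()

# Toy usage with random tensors (shapes only; no real model is loaded):
# 8 unlabeled images, 10 candidate classes, 512-dim embeddings.
image_emb = torch.randn(8, 512)
text_emb = torch.randn(10, 512)
teacher_probs = torch.softmax(torch.randn(8, 10), dim=-1)  # stand-in for MLLM soft targets
loss = distillation_loss(clip_logits(image_emb, text_emb, 100.0), teacher_probs)
```

In the setup described in the abstract, the teacher soft targets would correspond to the multimodal LLM's zero-/few-shot class-probability estimates for each unlabeled image, and the text embeddings to prompts built from the class names.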
