Presentation Information
[4Yin-B-61] Improving Output Diversity in Large Language Models with Model Merging
〇Yasuaki Sumita¹, Tomoharu Iwata², Toshiyuki Tanaka¹ (1. Kyoto University, 2. NTT, Inc.)
Keywords: Large Language Model, Model Merging, Diversity
While large language models demonstrate strong performance across various tasks, they struggle to generate text with sufficient diversity. Existing research has evaluated text diversity mainly from two perspectives: form diversity and semantic diversity. Some studies show that models aligned with human preferences exhibit higher form diversity but lower semantic diversity than pretrained models without finetuning. In this study, we propose a method for constructing a model that simultaneously enhances both the form and the semantic diversity of its outputs. Our approach merges the lower layers of an RLHF-trained model with the higher layers of its pre-RLHF counterpart to create a new model. The method is based on the hypothesis that the lower layers of a language model control form diversity while the higher layers govern semantic diversity. Experimental results demonstrate that models obtained by our method improve both form and semantic diversity without significantly compromising output quality. Furthermore, we compare our method with temperature sampling, a widely used baseline, and show that our method compares favorably in both output quality and diversity.
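To make the layer-level merging concrete, below is a minimal sketch assuming a Llama-style decoder-only checkpoint loaded with Hugging Face Transformers. The model identifiers and the split index `SPLIT_LAYER` are placeholders, and keeping the embeddings, final norm, and output head from the RLHF model is an assumption on our part; the abstract does not specify these details or the authors' exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint ids (assumptions): an RLHF-tuned model and its
# pre-RLHF counterpart sharing the same architecture and tokenizer.
RLHF_MODEL = "org/model-rlhf"
BASE_MODEL = "org/model-pre-rlhf"
SPLIT_LAYER = 16  # assumed boundary between "lower" and "higher" layers

rlhf = AutoModelForCausalLM.from_pretrained(RLHF_MODEL, torch_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)

merged_state = rlhf.state_dict()

for name, tensor in base.state_dict().items():
    # Llama-style checkpoints name transformer blocks "model.layers.<i>....";
    # adjust the pattern for other architectures.
    parts = name.split(".")
    if "layers" in parts:
        layer_idx = int(parts[parts.index("layers") + 1])
        if layer_idx >= SPLIT_LAYER:
            # Higher layers come from the pre-RLHF model (semantic diversity);
            # lower layers stay from the RLHF model (form diversity).
            merged_state[name] = tensor

rlhf.load_state_dict(merged_state)
rlhf.save_pretrained("merged-model")
```

In practice, the split index would presumably be chosen by sweeping candidate boundaries and measuring form diversity, semantic diversity, and output quality on held-out prompts.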
