Presentation Information

[22a-52A-6]Evaluation and Improvement of Large-Scale Language Models for Retrieval of Magnetic Material Synthesis Conditions

〇Luca Foppiano1, Guillaume Lambard1, Masashi Ishii1 (1.Data-driven Materials Design Group, CBRM, NIMS)

Keywords:

large language models, retrieval augmented generation, text mining

In this presentation, we describe our method for evaluating the embedding function that underlies the semantic similarity calculation, and we show how improving this function can increase the performance of a Retrieval Augmented Generation (RAG) system.
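In a RAG system, semantic similarity is commonly computed as the cosine similarity between the embedding of a query and the embeddings of candidate passages, and the highest-scoring passages are retrieved. The abstract does not state which similarity measure is used, so the following is a minimal sketch assuming cosine similarity; the helper names `cosine_similarity` and `top_k` and the toy three-dimensional vectors are hypothetical stand-ins for the output of a real embedding model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], passage_vecs: List[Sequence[float]], k: int = 3) -> List[int]:
    """Return the indices of the k passages most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, p), i) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical toy vectors standing in for embedding-model output.
query = [0.9, 0.1, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # unrelated passage
    [1.0, 0.0, 0.0],  # closest to the query
    [0.7, 0.7, 0.0],  # partially related
]
print(top_k(query, passages, k=2))  # [1, 2]
```

Because the embedding function alone determines these scores, a better embedding function changes which passages reach the generator, which is why it is evaluated and tuned separately from the language model.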
We first collect a dataset of questions paired with their corresponding answers, extracted from scientific papers. We then assess the retrieval performance of standard embedding functions on this dataset. Based on this evaluation, we fine-tune the embedding function using a subset of the collected dataset.
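Because each answer in the dataset comes from a known passage, an embedding function can be scored by checking where that gold passage lands in its ranking for the matching question. The abstract does not name its metrics; the sketch below assumes two common retrieval metrics, recall@k and mean reciprocal rank (MRR). The question and passage identifiers are hypothetical, and the rankings would in practice come from a retriever such as cosine-similarity search over embeddings.

```python
from typing import Dict, List


def recall_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of questions whose gold passage appears among the top-k results."""
    hits = sum(1 for q, ranked in rankings.items() if gold[q] in ranked[:k])
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Average of 1/rank of the gold passage; a question contributes 0 if it is not retrieved."""
    total = 0.0
    for q, ranked in rankings.items():
        if gold[q] in ranked:
            total += 1.0 / (ranked.index(gold[q]) + 1)
    return total / len(rankings)


# Hypothetical rankings produced by one embedding function for three questions.
rankings = {
    "q1": ["p3", "p1", "p7"],  # gold p3 at rank 1
    "q2": ["p2", "p5", "p4"],  # gold p5 at rank 2
    "q3": ["p9", "p8", "p6"],  # gold p1 not retrieved
}
gold = {"q1": "p3", "q2": "p5", "q3": "p1"}

print(recall_at_k(rankings, gold, k=1))      # 1 of 3 questions
print(recall_at_k(rankings, gold, k=3))      # 2 of 3 questions
print(mean_reciprocal_rank(rankings, gold))  # (1 + 1/2 + 0) / 3 = 0.5
```

Running the same metrics on a baseline and a fine-tuned embedding function gives a direct comparison. Presumably the comparison should use questions outside the subset used for fine-tuning, so that the scores do not simply reflect memorized training pairs.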
Finally, we evaluate the end-to-end improvement that fine-tuning brings to the extraction of sample synthesis descriptions from a corpus of documents in the permanent magnet literature.
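In the end-to-end setting, the passages chosen by the retriever are placed into the prompt given to the language model, which then extracts the synthesis description. The abstract does not describe its prompt, so the following is a minimal sketch of that assembly step only. The function name `build_extraction_prompt`, the instruction wording, and the placeholder passages are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
from typing import List


def build_extraction_prompt(question: str, chunks: List[str], max_chunks: int = 3) -> str:
    """Assemble a RAG prompt: numbered retrieved passages first, then the request.

    Hypothetical format: `chunks` is assumed to be already ranked by the
    embedding-based retriever, best first, so only the top `max_chunks` are kept.
    """
    context = "\n\n".join(
        f"[{i + 1}] {chunk.strip()}" for i, chunk in enumerate(chunks[:max_chunks])
    )
    return (
        "Answer using only the numbered passages below. "
        "If they do not describe a sample synthesis procedure, reply 'not found'.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical placeholder passages, already ranked by the retriever.
retrieved = [
    "Placeholder passage A describing a sample preparation step.",
    "Placeholder passage B describing an annealing step.",
    "Placeholder passage C describing magnetic measurements.",
    "Placeholder passage D with unrelated background.",
]
print(build_extraction_prompt("How was the sample synthesized?", retrieved))
```

With the prompt format held fixed, any change in extraction quality between the baseline and fine-tuned runs can be attributed to the passages the embedding function selects.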