Presentation Information
[22a-52A-6] Evaluation and Improvement of Large-Scale Language Models for Retrieval of Magnetic Material Synthesis Conditions
〇Luca Foppiano¹, Guillaume Lambard¹, Masashi Ishii¹ (1. Data-driven Materials Design Group, CBRM, NIMS)
Keywords:
large language models, retrieval augmented generation, text mining
In this presentation, we describe our method for evaluating the embedding function that underlies the semantic similarity calculation, and show how improving this function can increase the performance of a Retrieval Augmented Generation (RAG) system.
We first collect a dataset of question-answer pairs extracted from scientific papers. We then assess the performance of standard embedding functions on this dataset and, based on this evaluation, fine-tune the embeddings using a subset of the collected data.
Finally, we evaluate the end-to-end improvement that fine-tuning brings to extracting sample synthesis descriptions from a corpus of documents in the permanent magnet literature.
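The abstract does not say which metrics are used to compare embedding functions. A common approach for question-answer data is to embed each question, rank all candidate passages by cosine similarity, and check where the passage holding the gold answer lands. The sketch below assumes precomputed embedding vectors and uses hit rate at k and mean reciprocal rank (MRR) as illustrative metrics. The function names, the toy vectors, and the metric choice are hypothetical and are not taken from the authors' work.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def evaluate_retrieval(
    question_vecs: List[Sequence[float]],
    gold_ids: List[str],
    passage_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> Dict[str, float]:
    """Hypothetical evaluation: hit rate@k and MRR of the gold passage per question.

    Assumes every question has exactly one gold passage id present in passage_vecs.
    """
    hits = 0
    rr_sum = 0.0
    for q, gold in zip(question_vecs, gold_ids):
        # Rank all passages by similarity to the question, most similar first.
        ranked = sorted(passage_vecs, key=lambda pid: cosine(q, passage_vecs[pid]), reverse=True)
        rank = ranked.index(gold) + 1  # 1-based rank of the gold passage
        if rank <= k:
            hits += 1
        rr_sum += 1.0 / rank
    n = len(question_vecs)
    return {f"hit@{k}": hits / n, "mrr": rr_sum / n}


# Toy 2-D vectors standing in for real embeddings (illustrative only).
passages = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
questions = [[0.9, 0.1], [0.1, 0.9]]
gold = ["p1", "p3"]

print(evaluate_retrieval(questions, gold, passages, k=1))  # {'hit@1': 0.5, 'mrr': 0.75}
```

Running the same evaluation with vectors from a baseline embedding model and from a fine-tuned one, on held-out question-answer pairs, gives a direct before/after comparison of the kind the abstract describes.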