Presentation Information

[22a-52A-5] Benchmark for LLM in Materials Science and the evaluation of ChatGPT and Bard

〇Michiko Yoshitake1, Yuta Suzuki2, Ryo Igarashi1, Yoshitaka Ushiku1, Keisuke Nagato3 (1.OSX, 2.Osaka Univ., 3.Univ. Tokyo)

Keywords:

natural language model, materials science, model evaluation

We produced a benchmark dataset in materials science for large language models (LLMs). The benchmark dataset consists of question-answer problems based on university-level materials science textbooks. We will present the results of evaluating three LLMs, ChatGPT3.5, ChatGPT4, and Bard, on this benchmark dataset.