Presentation Information
[22a-52A-5] Benchmark for LLM in Materials Science and the evaluation of ChatGPT and Bard
〇Michiko Yoshitake1, Yuta Suzuki2, Ryo Igarashi1, Yoshitaka Ushiku1, Keisuke Nagato3 (1.OSX, 2.Osaka Univ., 3.Univ. Tokyo)
Keywords:
natural language model, materials science, model evaluation
We built a materials science benchmark dataset for large language models (LLMs). The dataset consists of question-answer problems based on university-level materials science textbooks. We will present the results of evaluating three LLMs, ChatGPT3.5, ChatGPT4, and Bard, on this benchmark.
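
As a rough illustration of how a question-answer benchmark of this kind might be scored, the sketch below computes a model's accuracy by comparing its answers with reference answers. The abstract does not specify the question format or the scoring rule, so everything here is an assumption: the `QAItem` type, the `normalize` and `accuracy` helpers, the exact-match criterion, and the two placeholder questions are hypothetical and are not taken from the authors' dataset or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QAItem:
    """One benchmark problem (hypothetical structure, not the authors' schema)."""
    question: str
    answer: str  # reference answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy(items: Sequence[QAItem], predictions: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference after normalization.

    Exact match is an assumed scoring rule; the real benchmark may score differently.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.answer)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)


# Placeholder problems for illustration only; they are not from the benchmark.
ITEMS = [
    QAItem("What is the crystal structure of copper at room temperature?", "FCC"),
    QAItem("What is the chemical symbol for iron?", "Fe"),
]

if __name__ == "__main__":
    # Placeholder model outputs, one list per hypothetical model.
    outputs = {
        "model_a": ["fcc", "Fe"],
        "model_b": ["BCC", "Fe"],
    }
    for name, preds in outputs.items():
        print(f"{name}: accuracy = {accuracy(ITEMS, preds):.2f}")
```

Exact match is the simplest possible criterion. Free-text answers to textbook problems would likely need a more tolerant comparison, such as numeric tolerance or human grading, which this sketch does not attempt.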