Presentation Information
[5G3-OS-37b-02] RDFS-LLM-Bench: A Multi-Level Benchmark for Evaluating RDF Schema Inference in Large Language Models
〇Taichi Hosokawa1, Sudesna Chakraborty1, Takeshi Morita1 (1. Aoyama Gakuin University)
Keywords:
RDF Schema entailment rules, Counterfactual knowledge, Large language models, Ontology, Logical inference capability
Large language models (LLMs) have achieved strong performance on a wide range of tasks, but they struggle with logical inference and often rely on pre-trained knowledge when reasoning. Furthermore, their inference capabilities in ontology languages remain underexplored. We propose a benchmark for assessing LLMs' logical inference abilities using RDF Schema entailment rules. The benchmark comprises three dataset types (real-world knowledge data based on Linked Open Data, counterfactual knowledge data, and random symbolic data) together with a multi-level evaluation framework that measures rule selection and rule application capabilities. We conducted evaluation experiments using commercial and open-weight LLMs. The experiments reveal that LLMs perform well on random symbolic data. On data with semantic vocabulary, however, they rely on linguistic cues such as naming conventions, and on the particularly challenging counterfactual knowledge data they fail to separate structural inference from inference based on these cues, which leads to a decline in performance. Additionally, we observed a tendency to supplement information with pre-trained knowledge, which may be advantageous in real-world environments where data is incomplete. These findings demonstrate the need to consider both the potential and the limitations of LLMs when using them in Semantic Web applications.
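
To make the notion of "structural inference" concrete, the following is a minimal sketch of two standard RDF Schema entailment rules, rdfs9 (an instance of a subclass is also an instance of its superclass) and rdfs11 (rdfs:subClassOf is transitive), applied to a small counterfactual graph. It is an illustration only and not the benchmark's code or data: the function name `rdfs_closure`, the `ex:` identifiers, and the example triples are all hypothetical.

```python
# Illustrative sketch only; not taken from RDFS-LLM-Bench.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rules rdfs9 and rdfs11 repeatedly until no new triple is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for (s1, p1, o1) in inferred:
            for (s2, p2, o2) in inferred:
                if o1 != s2 or p2 != SUBCLASS:
                    continue
                # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p1 == SUBCLASS:
                    new.add((s1, SUBCLASS, o2))
                # rdfs9: (x type A), (A subClassOf B) => (x type B)
                if p1 == RDF_TYPE:
                    new.add((s1, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical counterfactual data: the class hierarchy deliberately
# contradicts world knowledge, so only the graph structure can justify
# the conclusions.
counterfactual = {
    ("ex:Cat", SUBCLASS, "ex:Fish"),
    ("ex:Fish", SUBCLASS, "ex:Plant"),
    ("ex:tama", RDF_TYPE, "ex:Cat"),
}

closure = rdfs_closure(counterfactual)
for triple in sorted(closure - counterfactual):
    print(triple)
```

A purely structural reasoner derives that ex:tama is an ex:Plant here, however implausible that is. A system that leans on naming cues or pre-trained knowledge may reject or alter that conclusion, which is the kind of failure the counterfactual setting is meant to expose.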
