The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

2:15 PM - 2:30 PM JST (5:15 AM - 5:30 AM UTC)

[5G3-OS-37b-02] RDFS-LLM-Bench: A Multi-Level Benchmark for Evaluating RDF Schema Inference in Large Language Models

〇Taichi Hosokawa¹, Sudesna Chakraborty¹, Takeshi Morita¹ (1. Aoyama Gakuin University)

Keywords:

RDF Schema entailment rules, Counterfactual knowledge, Large language models, Ontology, Logical inference capability

Large language models (LLMs) have achieved strong performance in various tasks, but they struggle with logical inference and often rely on pre-trained knowledge for reasoning. Furthermore, their inference capabilities in ontology languages remain underexplored. We propose a benchmark for assessing LLMs' logical inference abilities using RDF Schema entailment rules. The benchmark comprises three dataset types (real-world knowledge data based on Linked Open Data, counterfactual knowledge data, and random symbolic data) and a multi-level evaluation framework that measures rule selection and rule application capabilities. We conducted evaluation experiments using commercial and open-weight LLMs. The experiments show that LLMs perform well on random symbolic data. On data with semantic vocabulary, however, they rely on linguistic cues such as naming conventions, and on the particularly challenging counterfactual knowledge data they fail to separate structural inference from inference based on these cues, which degrades performance. Additionally, we observed a tendency to supplement information with pre-trained knowledge, which may be advantageous in real-world environments where data is incomplete. These findings demonstrate the necessity of considering both the potential and the limitations of LLMs when leveraging them in Semantic Web applications.
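
As a rough illustration of what structural inference under RDF Schema entailment rules involves, the Python sketch below applies three rules from the W3C RDF 1.1 Semantics specification (rdfs2, rdfs9, and rdfs11) until no new triple is derived. The helper `rdfs_closure`, the `ex:` names, and both toy graphs are hypothetical and are not taken from the benchmark; they only show that a counterfactual graph and a random symbolic graph with the same shape yield conclusions of the same shape, whatever the names suggest.

```python
# Minimal sketch (assumption: triples are plain (subject, predicate, object) string tuples).
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples):
    """Apply rdfs2, rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: A subClassOf B, x type A  =>  x type B
                if p == SUBCLASS and p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
                # rdfs2: P domain C, x P y  =>  x type C
                if p == DOMAIN and p2 == s:
                    new.add((s2, TYPE, o))
        if new <= closed:
            return closed
        closed |= new


# Hypothetical counterfactual graph: the class names contradict common knowledge.
counterfactual = {
    ("ex:Cat", SUBCLASS, "ex:Vehicle"),
    ("ex:Vehicle", SUBCLASS, "ex:Mineral"),
    ("ex:Tama", TYPE, "ex:Cat"),
}

# Hypothetical random symbolic graph with exactly the same structure.
symbolic = {
    ("ex:c1", SUBCLASS, "ex:c2"),
    ("ex:c2", SUBCLASS, "ex:c3"),
    ("ex:i1", TYPE, "ex:c1"),
}

derived_counterfactual = rdfs_closure(counterfactual) - counterfactual
derived_symbolic = rdfs_closure(symbolic) - symbolic

for triple in sorted(derived_counterfactual):
    print(triple)
```

A purely structural reasoner derives `ex:Tama rdf:type ex:Mineral` from the counterfactual graph just as it derives `ex:i1 rdf:type ex:c3` from the symbolic one. Per the abstract, this kind of separation between structure and naming cues is where the evaluated LLMs lose performance.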
