Presentation Information
[1Yin-A-13] Construction of an Integrated Graph across Multiple Bibliographic Databases as a Foundation for Identifying Japanese Reference Strings
〇Katsuyuki Hirai (1), Teruhito Kanazawa (2), Takahiro Hayashi (3) (1. Niigata University of Health and Welfare, 2. National Institute of Informatics, 3. Kansai University)
Keywords:
Bibliographic Data, Record Linkage, Citation Index
Bibliographic identification of reference strings is an essential step in constructing citation index databases. A significant challenge arises, however, when identical works are not properly disambiguated within or across databases: the same work may then be treated as several distinct documents, which hinders the construction of accurate citation relationships. This problem is particularly pronounced for Japanese-language literature, where progress in work-level identification has been limited. In this study, we integrated records from the Japan National Bibliography and NACSIS-CAT for books, and from the Japanese Periodicals Index and JaLC for journal articles, into a graph database (Neo4j). Records were linked when their metadata shared an identifier or a title-author combination. Analysis of the connected components in the resulting graph confirmed that useful work-level clusters were formed. It also revealed a problem of over-connection: distinct works were merged into a single component when they shared a series title (e.g., complete works) or a common title. This presentation describes the methodology for constructing this large-scale bibliographic graph and reports on its structural characteristics, including the size and properties of the resulting connected components.
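The linking rule and the connected-component analysis described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' Neo4j pipeline. The record layout (`rid`, `identifiers`, `title`, `author`), the normalization rule, and the toy records with placeholder identifiers are all assumptions made for illustration, and a union-find structure stands in for the graph database.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """NFKC-fold, lowercase, and drop whitespace so width/case variants match."""
    folded = unicodedata.normalize("NFKC", text).lower()
    return "".join(folded.split())


def match_keys(record):
    """Yield the keys under which a record may be linked to other records."""
    for identifier in record.get("identifiers", []):
        yield ("id", identifier)
    if record.get("title") and record.get("author"):
        yield ("title-author", normalize(record["title"]), normalize(record["author"]))


def connected_components(records):
    """Group record ids that share at least one match key, transitively."""
    parent = {r["rid"]: r["rid"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # match key -> first record id that carried it
    for r in records:
        for key in match_keys(r):
            if key in first_seen:
                parent[find(r["rid"])] = find(first_seen[key])
            else:
                first_seen[key] = r["rid"]

    groups = defaultdict(set)
    for rid in parent:
        groups[find(rid)].add(rid)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy records; identifiers such as "ISBN-A" are placeholders.
RECORDS = [
    {"rid": "jnb-1", "identifiers": ["ISBN-A"], "title": "吾輩は猫である", "author": "夏目 漱石"},
    {"rid": "cat-1", "identifiers": ["ISBN-A", "NCID-X"], "title": "吾輩ハ猫デアル", "author": "夏目漱石"},
    {"rid": "cat-2", "identifiers": ["NCID-Y"], "title": "吾輩は猫である", "author": "夏目漱石"},
    # Two different volumes catalogued only under the shared series title.
    {"rid": "jnb-3", "identifiers": ["ISBN-C"], "title": "漱石全集", "author": "夏目漱石"},
    {"rid": "cat-3", "identifiers": ["NCID-Z"], "title": "漱石全集", "author": "夏目漱石"},
]

if __name__ == "__main__":
    for component in connected_components(RECORDS):
        print(component)
```

In this toy data, the first three records fall into one component through a shared identifier and a shared title-author key, which is the desired work-level cluster. The last two records are also merged, although they stand for different volumes, because their only title is the series title; this mirrors the over-connection reported above.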
