Presentation Information
[4Yin-B-07] Knowledge Graph Embedding Search for MONDO-Centered Disease-to-Mouse Model Retrieval: A node2vec Baseline Study
〇Tatsuya Kushida¹, Daiki Usuda¹, Norio Kobayashi¹, Hiroshi Masuya¹ (¹RIKEN)
Keywords: knowledge graph, embedding-based search, node2vec, disease ontology, mouse model exploration
In animal model selection for disease research, identifying appropriate mouse strains for a disease concept and establishing the rationale for their validity require substantial time and effort. We present a MONDO-centered disease-to-mouse retrieval system that ranks candidate RIKEN BRC mouse strains using an integrated knowledge graph and attaches supporting evidence for interpretability. We built a knowledge graph (~950k triples) integrating genes, mouse strains, HPO (Human Phenotype Ontology) terms, and MP (Mammalian Phenotype Ontology) terms, and derived from it an undirected graph with 922,647 edges for node2vec training. Using the resulting node2vec embeddings (73,217 nodes; 128 dimensions), we ranked mouse strains by the cosine similarity between the query disease node and each mouse strain node, and attached supporting evidence in the form of direct links (1-hop) and 2-hop paths over the integrated edge set. For out-of-vocabulary queries, i.e., disease terms that have no embedding, we implemented a fallback mechanism that expands the query along the MONDO hierarchy (ancestors and descendants). We are also developing a Streamlit interface that executes retrieval and compares three embedding methods (node2vec, TransE, RotatE), including their strain-type distributions and candidate overlap. Known disease models serve only as an auxiliary reference for sanity checks, not as a definitive gold standard. Future work includes metapath design and a broader comparison with additional knowledge graph embedding models.
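The ranking and out-of-vocabulary fallback steps described above might look roughly like the following. This is a minimal plain-Python sketch, not the authors' implementation: the 3-dimensional toy vectors stand in for the 128-dimensional node2vec embeddings, all node IDs are hypothetical placeholders, and the fallback strategy (breadth-first expansion over MONDO parents and children, averaging the embeddings of the nearest hierarchy level that contains embedded terms) is an assumption, since the abstract does not say how ancestors and descendants are combined. The helper names (`query_vector`, `rank_strains`, `topk_overlap`) are hypothetical, and `topk_overlap` assumes Jaccard similarity as one possible measure of candidate overlap between two methods.

```python
import math

# Minimal sketch (not the authors' code): rank strain nodes by cosine similarity
# to a disease query, with an ASSUMED MONDO-hierarchy fallback for OOV queries.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_vector(query, emb, parents, children):
    """Return (vector, source_terms) for a disease query.

    In-vocabulary queries use their own embedding. For out-of-vocabulary queries
    (ASSUMED strategy), expand breadth-first over MONDO parents and children and
    average the embeddings of the nearest level that contains embedded terms.
    """
    if query in emb:
        return emb[query], [query]
    seen = {query}
    frontier = [query]
    while frontier:
        next_level = []
        for node in frontier:
            for neighbor in parents.get(node, []) + children.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_level.append(neighbor)
        hits = [n for n in next_level if n in emb]
        if hits:
            dim = len(emb[hits[0]])
            mean = [sum(emb[h][i] for h in hits) / len(hits) for i in range(dim)]
            return mean, hits
        frontier = next_level
    raise KeyError(f"no embedded MONDO term reachable from {query}")


def rank_strains(query, emb, strain_ids, parents, children, k=10):
    """Return the top-k (strain_id, cosine) pairs and the MONDO terms actually used."""
    qvec, used = query_vector(query, emb, parents, children)
    scored = [(s, cosine(qvec, emb[s])) for s in strain_ids if s in emb]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k], used


def topk_overlap(ranking_a, ranking_b):
    """Jaccard overlap of two top-k candidate lists (ASSUMED overlap measure)."""
    a = {s for s, _ in ranking_a}
    b = {s for s, _ in ranking_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy usage with hypothetical IDs: MONDO:B has no embedding, its parent MONDO:A does.
emb = {
    "MONDO:A": [1.0, 0.0, 0.0],
    "STRAIN:1": [0.9, 0.1, 0.0],
    "STRAIN:2": [0.0, 1.0, 0.0],
    "STRAIN:3": [0.5, 0.5, 0.0],
}
parents = {"MONDO:B": ["MONDO:A"]}
children = {"MONDO:A": ["MONDO:B"]}
top, used = rank_strains("MONDO:B", emb, ["STRAIN:1", "STRAIN:2", "STRAIN:3"],
                         parents, children, k=2)
print(used, [s for s, _ in top])  # ['MONDO:A'] ['STRAIN:1', 'STRAIN:3']
```

Returning the MONDO terms actually used alongside the ranking lets an interface show the user when a result came from hierarchy expansion rather than from the query term itself.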

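Evidence extraction as 1-hop links and 2-hop paths over the integrated edge set can be sketched in a similar way. In this hypothetical sketch, each triple is added to an adjacency list in both directions (mirroring the undirected graph used for node2vec training) while its predicate is kept as a label; 1-hop evidence is any edge joining the disease node and the strain node, and 2-hop evidence is any path of the form disease, intermediate node, strain. The node IDs and predicate names are invented for illustration, and the `max_paths` cap is an assumption meant to keep output bounded around highly connected nodes.

```python
from collections import defaultdict

# Minimal sketch (not the authors' code): 1-hop and 2-hop evidence between a
# disease node and a candidate strain over an undirected, predicate-labelled graph.


def build_adjacency(triples):
    """Map node -> [(neighbor, predicate)], adding each triple in both directions."""
    adj = defaultdict(list)
    for subj, pred, obj in triples:
        adj[subj].append((obj, pred))
        adj[obj].append((subj, pred))
    return adj


def evidence_paths(adj, disease, strain, max_paths=20):
    """Return (one_hop, two_hop) evidence; max_paths caps 2-hop output (assumed)."""
    one_hop = [(disease, pred, strain)
               for nbr, pred in adj.get(disease, []) if nbr == strain]
    two_hop = []
    for mid, pred1 in adj.get(disease, []):
        if mid in (disease, strain):
            continue  # skip self-loops and the direct disease-strain edge
        for nbr, pred2 in adj.get(mid, []):
            if nbr == strain:
                two_hop.append((disease, pred1, mid, pred2, strain))
                if len(two_hop) >= max_paths:
                    return one_hop, two_hop
    return one_hop, two_hop


# Toy usage; all IDs and predicate names are hypothetical.
triples = [
    ("MONDO:X", "associated_gene", "GENE:g1"),
    ("STRAIN:1", "carries_mutation_in", "GENE:g1"),
    ("MONDO:X", "has_phenotype", "HP:p1"),
    ("STRAIN:1", "linked_to", "MONDO:X"),
]
adj = build_adjacency(triples)
one, two = evidence_paths(adj, "MONDO:X", "STRAIN:1")
print(one)  # [('MONDO:X', 'linked_to', 'STRAIN:1')]
print(two)  # [('MONDO:X', 'associated_gene', 'GENE:g1', 'carries_mutation_in', 'STRAIN:1')]
```

Because the graph is treated as undirected, each evidence tuple is read from the disease side, so the original triple direction is not preserved; a real system would likely keep the direction alongside the predicate for display.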