Presentation Information

[5Yin-A-30]Evaluation of Distributional Matching of Substructures in Language-Model-Based Graph Generation

〇Masatsugu Yamada1, Mahito Sugiyama1 (1. National Institute of Informatics)

Keywords:

Graph Generation, Frequent Subgraph Mining, Large Language Model

To assess the generalization of graph generative models, we distinguish memorization of training data from preservation of local statistical properties. We investigate whether Transformer models trained on canonical DFS codes reproduce the distribution of frequent subgraphs. We extract frequent subgraphs from both the training set and the generated set and represent each set as a distribution over patterns and their supports. We analyze reproducibility and novelty using rank correlation, Jensen-Shannon divergence (JSD), missing mass, and novel mass, comparing against reference baselines obtained by resampling. Experimental results show high consistency in the high-frequency region: local substructure distributions exhibit mining-like statistical reproduction. With appropriate decoding, the model can reproduce training graphs, while novel outputs still align with the training distribution of high-frequency substructures. However, this consistency is also matched by the resampling baselines. In contrast, the low-frequency region shows substantial missing mass and distributional divergence. Decoding constraints amplify memorization-like behavior by forcing selection toward high-probability sequences.
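The comparison metrics mentioned above can be illustrated with a minimal sketch. The pattern supports and the variable names below are hypothetical stand-ins, not the authors' data or code; in the actual study the supports would come from a frequent-subgraph miner applied to the training and generated graph sets.

```python
from math import log2

# Hypothetical supports of mined subgraph patterns (pattern -> count).
# These numbers are illustrative, not results from the paper.
train_support = {"A": 50, "B": 30, "C": 15, "D": 5}
gen_support   = {"A": 48, "B": 33, "C": 12, "E": 7}

def normalize(support, keys):
    """Turn raw pattern counts into a probability vector over a shared key order."""
    total = sum(support.values())
    return [support.get(k, 0) / total for k in keys]

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, in bits) between distributions p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

keys = sorted(set(train_support) | set(gen_support))
p = normalize(train_support, keys)  # training pattern distribution
q = normalize(gen_support, keys)    # generated pattern distribution

# Missing mass: training probability on patterns the model never generated.
missing_mass = sum(pi for k, pi in zip(keys, p) if k not in gen_support)
# Novel mass: generated probability on patterns absent from the training set.
novel_mass = sum(qi for k, qi in zip(keys, q) if k not in train_support)

print(f"JSD = {jsd(p, q):.4f} bits")
print(f"missing mass = {missing_mass:.3f}, novel mass = {novel_mass:.3f}")
```

In this toy setting, pattern "D" contributes to the missing mass and pattern "E" to the novel mass, while the JSD summarizes the overall divergence; the abstract's finding is that such divergence concentrates in the low-frequency region.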