Presentation Information

[5Yin-A-30]Evaluation of Distributional Matching of Substructures in Language-Model-Based Graph Generation

〇Masatsugu Yamada1, Mahito Sugiyama1 (1. National Institute of Informatics)

Keywords:

Graph Generation, Frequent Subgraph Mining, Large Language Model

To assess the generalization of graph generative models, we distinguish memorization of training data from preservation of local statistical properties. We investigate whether Transformer models trained on canonical DFS codes reproduce the distribution of frequent subgraphs. We extract frequent subgraphs from both the training set and the generated set and represent each set as a distribution over patterns and their supports. We analyze reproducibility and novelty using rank correlation, Jensen-Shannon divergence (JSD), missing mass, and novel mass, comparing against reference baselines obtained by resampling. Experimental results show high consistency in the high-frequency region: local substructure distributions exhibit mining-like statistical reproduction. With appropriate decoding, the model can reproduce training graphs, while novel outputs still align with the training distribution of high-frequency substructures. However, this consistency is also matched by the resampling baselines. In contrast, the low-frequency region shows substantial missing mass and distributional divergence. Decoding constraints amplify memorization-like behavior by forcing selection toward high-probability sequences.
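The comparison metrics mentioned above can be illustrated with a minimal sketch. The pattern supports and the variable names below are hypothetical stand-ins, not the authors' data or code; in the actual study the supports would come from a frequent-subgraph miner applied to the training and generated graph sets.

```python
from math import log2

# Hypothetical supports of mined subgraph patterns (pattern -> count).
# These numbers are illustrative, not results from the paper.
train_support = {"A": 50, "B": 30, "C": 15, "D": 5}
gen_support   = {"A": 48, "B": 33, "C": 12, "E": 7}

def normalize(support, keys):
    """Turn raw pattern counts into a probability vector over a shared key order."""
    total = sum(support.values())
    return [support.get(k, 0) / total for k in keys]

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, in bits) between distributions p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

keys = sorted(set(train_support) | set(gen_support))
p = normalize(train_support, keys)  # training pattern distribution
q = normalize(gen_support, keys)    # generated pattern distribution

# Missing mass: training probability on patterns the model never generated.
missing_mass = sum(pi for k, pi in zip(keys, p) if k not in gen_support)
# Novel mass: generated probability on patterns absent from the training set.
novel_mass = sum(qi for k, qi in zip(keys, q) if k not in train_support)

print(f"JSD = {jsd(p, q):.4f} bits")
print(f"missing mass = {missing_mass:.3f}, novel mass = {novel_mass:.3f}")
```

In this toy setting, pattern "D" contributes to the missing mass and pattern "E" to the novel mass, while the JSD summarizes the overall divergence; the abstract's finding is that such divergence concentrates in the low-frequency region.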