Presentation Information

[2Yin-B-13] Statistical-Physics Characterization of Long-Context Processing in Language Models

〇Kai Nakaishi1, Yui Oka2, Yuji Yamamoto1, Kyosuke Nishida2, Sho Yokoi1 (1. NINJAL, 2. NTT)

Keywords:

Large Language Models, Long-Context Processing, Length Extrapolation, Statistical Physics

We provide a statistical-physics characterization of the long-context processing abilities of large language models, including the ability to handle texts longer than the maximum sequence length seen during training. Specifically, we propose that the ability to appropriately reference arbitrarily distant preceding tokens, even in very long contexts, can be characterized by a power-law decay of the two-point correlation with distance in the generated text. We also argue that the ability to suppress repetition of identical expressions can be characterized by the absence of a dominant period when the generated text is decomposed into waves of different periods. We then examine these claims experimentally through statistical analyses of generated texts.
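The two diagnostics described above can be illustrated with a minimal numerical sketch. This is not the authors' actual pipeline; it assumes one first maps generated tokens to a real-valued sequence (e.g. an indicator for a target token), then estimates the two-point correlation as a function of lag and the power spectrum, where a sharp spectral peak would signal a dominant repetition period:

```python
import numpy as np

def two_point_correlation(x, max_lag):
    """Estimate C(r) = <x_t x_{t+r}> - <x_t><x_{t+r}> for lags r = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    corrs = []
    for r in range(1, max_lag + 1):
        a, b = x[:n - r], x[r:]
        corrs.append(np.mean(a * b) - np.mean(a) * np.mean(b))
    return np.array(corrs)

def power_spectrum(x):
    """Power spectrum of the mean-centered sequence.

    A sharp peak at frequency f indicates a dominant period 1/f,
    i.e. repetitive generation; a flat or broadly decaying spectrum does not.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))
    return freqs, spec

# Toy demo (stand-ins for tokenized text): a period-8 signal vs. iid noise.
rng = np.random.default_rng(0)
periodic = np.sin(2 * np.pi * np.arange(1024) / 8)
noisy = rng.random(1024)

f_p, s_p = power_spectrum(periodic)
print(f_p[np.argmax(s_p)])   # dominant frequency of the periodic signal: 1/8
print(two_point_correlation(noisy, 10))  # near zero: noise has no long-range structure
```

In this toy setting, a power-law decay of `two_point_correlation` with lag (rather than an exponential cutoff) would be the signature of long-range structure, and the absence of a single dominant peak in `power_spectrum` would indicate suppressed repetition.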