Presentation Information
[5Yin-A-60]Revisiting the Role of Attention Weighting in Transformer-based Sequential Recommender Model
〇Keito Kozaki1, Keigo Sakurai2, Ren Togo3, Takahiro Ogawa3, Miki Haseyama3 (1. School of Engineering, Hokkaido University, 2. Data-Driven Interdisciplinary Research Emergence Department, Institute for Integrated Innovations, Hokkaido University, 3. Faculty of Information Science and Technology, Hokkaido University)
Keywords:
Recommendation System, Sequential Recommendation, Transformer
Causal self-attention models such as SASRec are widely used in sequential recommendation, where learned non-uniform attention weights are often considered essential for high recommendation performance.
However, it remains unclear whether such weighting is functionally necessary.
This study addresses three research questions: (RQ1) Is learning non-uniform attention weighting truly required for strong performance? (RQ2) To what extent do trained models integrate cross-position information at the representation level? (RQ3) How are integration strength and functional reliance related across datasets?
To answer RQ1, we introduce a controlled uniform-attention intervention that removes input-dependent weighting while preserving architecture and training objectives.
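As a concrete illustration, a minimal sketch of such an intervention (our own illustration, not the authors' implementation) replaces the input-dependent softmax weights of a SASRec-style causal attention block with uniform weights over the causally visible positions, leaving the value and output projections intact:

import torch
import torch.nn as nn

class UniformCausalAttention(nn.Module):
    """Causal attention with input-independent, uniform weights (sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        # Query/key projections are dropped: weights no longer depend on input.
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, _ = x.shape
        # Causal mask: position i may attend to positions 0..i.
        mask = torch.tril(torch.ones(n, n, device=x.device))
        # Uniform weights: each visible position receives weight 1 / (i + 1).
        attn = mask / mask.sum(dim=-1, keepdim=True)
        v = self.v_proj(x)
        out = attn @ v  # (n, n) broadcasts over the batch dimension
        return self.out_proj(out)

Because only the weighting rule changes, any performance gap between this block and standard learned attention isolates the contribution of input-dependent weighting itself.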
For RQ2, we apply a norm-based decomposition to quantify self-preserving and cross-position components within attention blocks.
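A minimal sketch of one such decomposition (again our illustration, under the assumption that attention weights and transformed value vectors are available) splits each position's attention output into a self-preserving term (the diagonal, j = i) and a cross-position term (all off-diagonal contributions), comparing the two by vector norm:

import torch

def mixing_ratio(attn: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
    """attn: (batch, seq, seq) attention weights; value: (batch, seq, dim)
    transformed value vectors. Returns, per position, the ratio
    ||cross-position component|| / (||self component|| + ||cross component||)."""
    b, n, _ = attn.shape
    eye = torch.eye(n, device=attn.device)
    self_part = (attn * eye) @ value            # diagonal (j == i) term only
    cross_part = (attn * (1.0 - eye)) @ value   # off-diagonal (j != i) terms
    self_norm = self_part.norm(dim=-1)
    cross_norm = cross_part.norm(dim=-1)
    return cross_norm / (self_norm + cross_norm + 1e-12)

A ratio near zero indicates a block that mostly preserves each position's own representation; a ratio near one indicates strong integration of information from other positions.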
For RQ3, we jointly analyze mixing strength and performance sensitivity, revealing three dataset-dependent regimes.
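A hypothetical sketch of this joint analysis pairs each dataset's mean mixing ratio (from the decomposition above) with its relative performance drop under the uniform-attention intervention; all dataset names and metric values below are illustrative placeholders, not reported results:

def relative_drop(baseline: float, uniform: float) -> float:
    # Relative drop in a ranking metric (e.g., NDCG@10) under uniform attention.
    return (baseline - uniform) / baseline

datasets = {
    # name: (mean mixing ratio, metric w/ learned attn, metric w/ uniform attn)
    "dataset_a": (0.62, 0.310, 0.295),  # placeholder values
    "dataset_b": (0.41, 0.250, 0.248),  # placeholder values
    "dataset_c": (0.28, 0.180, 0.181),  # placeholder values
}
for name, (mixing, base, uni) in datasets.items():
    print(f"{name}: mixing={mixing:.2f}, sensitivity={relative_drop(base, uni):+.3f}")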
These results suggest that positional integration and functional reliance on learned non-uniform weighting are separable and dataset-dependent.
