Presentation Information

[D-8-04] Cross-Model Comparison of SAE Feature Contributions to Output Probabilities

〇Yuta Yuzurihara1, Tomofumi Matsuzawa1, Kaiyu Suzuki1 (1. TUS)

Keywords:

Mechanistic Interpretability, LLM, Sparse Autoencoders