Presentation Information
[14p-P06-2] A Design for Digital-CiM INT8 Transformer Accelerator with Pipeline
〇(M2)Yifan Wang1, Adil Padiyal1, Daqi Lin1, Shota Suzuki1, Chihiro Matsui1, Ken Takeuchi1 (1.Univ.Tokyo)
Keywords:
Transformer, Pipeline, Accelerator
Transformers have revolutionized fields such as natural language processing (NLP) since their introduction. A basic characteristic of transformers is their large number of matrix multiplication (MM) operations, which require significant data movement and computation and lead to a memory bottleneck. Computing-in-memory (CiM) has proven to be a promising solution for data-centric computation. However, analog CiM suffers from precision loss and the high cost of DACs and ADCs. In addition, the attention mechanism of the transformer requires matrix multiplications of different sizes, which causes severe desynchronization between computing stages. In this work, a fully digital CiM transformer accelerator with a 2-stage pipeline is designed to address two challenges: (1) redundant off-chip memory access and on-chip memory cost; and (2) the pipeline bubbles caused by the different computing characteristics of the stages.
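To make the point about differently sized matrix multiplications concrete, the following is a minimal sketch that counts multiply-accumulate (MAC) operations for each matrix multiplication in one standard self-attention layer. The function name and the dimensions (`seq_len=128`, `d_model=512`, `num_heads=8`) are illustrative assumptions and are not taken from the abstract; the sketch does not model the accelerator itself.

```python
def attention_mac_counts(seq_len: int, d_model: int, num_heads: int) -> dict:
    """Count MACs for each matrix multiplication in one self-attention layer.

    Hypothetical helper for illustration only; it assumes standard
    multi-head attention with d_head = d_model // num_heads.
    """
    d_head = d_model // num_heads
    return {
        # X (seq_len x d_model) times each of W_Q, W_K, W_V (d_model x d_model)
        "qkv_projection": 3 * seq_len * d_model * d_model,
        # Per head: Q (seq_len x d_head) times K^T (d_head x seq_len)
        "scores_qk_t": num_heads * seq_len * seq_len * d_head,
        # Per head: attention weights (seq_len x seq_len) times V (seq_len x d_head)
        "weighted_values": num_heads * seq_len * seq_len * d_head,
        # Concatenated heads (seq_len x d_model) times W_O (d_model x d_model)
        "output_projection": seq_len * d_model * d_model,
    }


# Assumed example dimensions, not values from the abstract.
counts = attention_mac_counts(seq_len=128, d_model=512, num_heads=8)
for name, macs in counts.items():
    print(f"{name}: {macs}")
```

Under these assumed dimensions the Q/K/V projections need 12 times as many MACs as the score computation. If such stages were mapped naively onto consecutive pipeline stages, the lighter stage would sit idle while waiting for the heavier one, which is the kind of pipeline bubble the abstract refers to.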