Presentation Information

[24p-31A-13]Analysis of Low-Bit Precision ReRAM CiM-based Convolutional Neural Networks during Training and Inference

〇(D)Adil Padiyal1, Ayumu Yamada1, Naoko Misawa1, Chihiro Matsui1, Ken Takeuchi1 (1.The Univ. of Tokyo)

Keywords:

Computation in Memory

This paper proposes a method to improve the performance of neural networks executed on computation-in-memory (CiM) hardware. The proposed techniques speed up training and inference, and reduce energy and memory consumption, by partially quantizing the network's training and inference processes. The effect of the quantization imposed by CiM is assessed through inference accuracy, and the impact of non-idealities of non-volatile memories such as ReRAM on network accuracy is measured and reported. The findings show that gradients, input/output data, and weights must each meet a minimum quantization bit precision to maintain an acceptable level of inference accuracy. Compared with the same neural network trained without CiM, inference accuracy decreases only slightly, by about 2.8%. This accuracy trade-off is accompanied by a substantial reduction in memory footprint: 62% during training and 93% during inference.
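As an illustration of the kind of low-bit quantization the abstract refers to, the sketch below uniformly quantizes an array of values (e.g. weights, activations, or gradients) to a signed k-bit grid. This is a minimal, generic scheme written for clarity; the paper's actual quantization method, bit widths, and ReRAM mapping are not specified here, and the function and parameter names are hypothetical.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniformly quantize `x` to a signed `bits`-bit grid and dequantize back.

    Hypothetical illustration of low-bit quantization as applied to
    weights, input/output data, or gradients in a CiM setting; not the
    paper's exact scheme.
    """
    levels = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit signed
    max_abs = np.max(np.abs(x))
    scale = max_abs / levels if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -levels, levels)  # snap and saturate
    return q * scale                              # back to float for compute

# Example: quantize random "weights" to 4 bits and inspect the error.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
w_q = quantize_uniform(w, bits=4)
print("distinct levels used:", len(np.unique(w_q)))
print("mean abs quantization error:", float(np.abs(w - w_q).mean()))
```

Storing the integer codes `q` instead of 32-bit floats is what drives memory savings of the magnitude the abstract reports: a 4-bit code occupies one eighth of a 32-bit word, before accounting for the scale factors kept per tensor.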