Design and Evaluation of a Reconfigurable Accelerator for Sparse Neural Networks

Dadashi, Fardin | 2022

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56040 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sarbazi Azad, Hamid
  7. Abstract:
  8. Deep Neural Networks (DNNs) are widely used in various domains, such as medicine, engineering, industry, financial markets, mathematics, and management. DNNs are composed of several layers, such as convolutional and fully connected layers, and increasing the number of layers provides different applications with their required accuracy. In recent years, many accelerators have been proposed for executing DNNs. However, the high computation and memory demands of DNNs remain the main challenges in executing them. To reduce computation and memory requirements, various methods, such as pruning and quantization, have been proposed. Pruning and quantization make DNNs sparse and increase the probability of zero and repeated data; however, they also make the execution of DNNs irregular. There is a wealth of research aiming to exploit sparsity and repetition in data while decreasing the overheads caused by irregularity. Although factorization of non-zero unique weights effectively reduces the computation of DNNs, we observed that it distributes the inputs over the different weights unevenly (a small illustrative sketch of this idea appears after the keyword list). As a result, it greatly reduces the resource utilization of the accelerator. In this research, we present a reconfigurable accelerator for executing sparse DNNs, focusing on exploiting sparsity in weights and inputs and repetition in weights. To do so, we present several reconfigurable approaches that leverage different parallel adder structures, namely the Simple Tree Adder, the Multi-Output Tree Adder, and the Chain Adder. The proposed designs execute DNNs in two modes: (1) independent, where each processing element produces an output and no exchange of partial results is needed, and (2) dependent, where several processing elements cooperate to calculate an output. We evaluate the proposed accelerators on four DNNs and compare them against Eyeriss and UCNN in terms of performance, energy consumption, and resource utilization. The first method, the Simple Tree Adder, achieves 75% utilization and is 12x (up to 14x) faster than the non-reconfigurable baseline (Eyeriss). The second method, the Multi-Output Tree Adder, achieves 95% utilization. The third method, the Chain Adder, achieves 95% utilization and is 15x (up to 17x) faster than Eyeriss.
  9. Keywords:
  10. Deep Neural Networks ; Reconfiguration ; Quantization ; Pruning Method ; Utilization
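The factorization of non-zero unique weights mentioned in the abstract can be illustrated with a small sketch. The Python code below is a hypothetical illustration (the function name and the data are invented for this example, not taken from the thesis): inputs that share the same non-zero weight value are summed first, so only one multiplication is needed per unique weight, and the uneven group sizes hint at the utilization problem the abstract describes.

    from collections import defaultdict

    def dot_product_factorized(inputs, weights):
        """Dot product using one multiply per unique non-zero weight value.

        Illustrative only: mirrors the 'factorization of non-zero unique
        weights' idea, not the thesis's actual hardware design.
        """
        groups = defaultdict(float)      # unique weight value -> sum of its inputs
        for x, w in zip(inputs, weights):
            if w != 0:                   # sparsity: zero weights are skipped
                groups[w] += x
        # one multiplication per unique non-zero weight value
        return sum(w * s for w, s in groups.items())

    if __name__ == "__main__":
        inputs  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        weights = [0.5, 0.0, 0.5, 0.5, 0.0, 0.25]  # quantized: few unique values
        print(dot_product_factorized(inputs, weights))  # 0.5*(1+3+4) + 0.25*6 = 5.5
        # Note the imbalance: the 0.5 group gathers three inputs while the 0.25
        # group gathers only one, so parallel lanes assigned one group each would
        # be unevenly loaded, which is the utilization issue the abstract raises.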
