A Fast and Scalable Network-on-Chip for DNN Accelerators

Tahmasebi, Faraz | 2022

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 55015 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sarbazi Azad, Hamid
  7. Abstract:
  8. Deep Neural Networks (DNNs) are widely used as a promising machine learning method in many applications and come with intensive computation and storage requirements. In recent years, prior work has proposed several accelerators to improve DNN processing. We observe that although state-of-the-art DNN accelerators effectively process network layers of certain shapes, they fail to keep computation resources fully utilized for many other layers. The reason is twofold: first, the mapping algorithm cannot employ all compute cores for some layer types and dimension sizes, and second, the hardware cannot perform data distribution and aggregation for the employed cores without stalling them. This thesis proposes FHAI, a flexible accelerator for DNN inference that can highly utilize computation resources across different layer types and dimensions. FHAI has two important features: (1) it uses a new flexible dataflow, called Row Streaming, which considers the available compute resources and the characteristics of each layer, and determines the partitioning, mapping, and data-movement strategies that maximize resource utilization while exploiting data reuse; and (2) it employs an architecture with a flexible interconnection network that can reconfigure the datapath according to the dataflow's mapping. We introduce a novel spatio-temporal reduction network, called STAT, which efficiently exploits all adder units during the reduction process, for any mapping strategy. Our proposal also supports sparse DNN processing by eliminating all zeros from both weights and inputs, preventing cores from performing ineffective multiply-and-accumulate (MAC) operations. Compared fairly with Eyeriss v2, a state-of-the-art DNN accelerator, FHAI achieves 4× and 3.9× higher throughput for dense and sparse DNNs, respectively, and more than 8× better area-performance efficiency for both dense and sparse models.
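The zero-elimination idea mentioned in the abstract can be illustrated with a small software sketch. This is not FHAI's actual hardware mechanism (the thesis does not specify its sparse format here); it is a minimal functional model, assuming a compressed nonzero-index representation, showing why skipping positions where either the weight or the input is zero removes all ineffective MAC operations without changing the result.

```python
def dense_mac(weights, inputs):
    """Baseline: multiply-accumulate over every position, zeros included."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x  # ineffective whenever w == 0 or x == 0
    return acc

def sparse_mac(weights, inputs):
    """Skip ineffective MACs: only positions where BOTH operands are
    nonzero contribute to the accumulation (illustrative model only)."""
    # Compress each operand to {index: value}, dropping zeros --
    # analogous to a compressed sparse representation in hardware.
    w_nz = {i: w for i, w in enumerate(weights) if w != 0}
    x_nz = {i: x for i, x in enumerate(inputs) if x != 0}
    # Intersect the nonzero index sets; only these MACs are effective.
    common = w_nz.keys() & x_nz.keys()
    return sum(w_nz[i] * x_nz[i] for i in common)

weights = [0, 3, 0, 0, 2, 0, 1, 0]
inputs  = [5, 0, 0, 4, 2, 0, 3, 0]
# Only indices 4 and 6 have both operands nonzero: 2*2 + 1*3 = 7.
assert dense_mac(weights, inputs) == sparse_mac(weights, inputs) == 7
```

Here the dense loop performs 8 multiplies, while the sparse version performs only 2 effective ones, mirroring the abstract's claim that eliminating zeros from both weights and inputs prevents cores from doing ineffective work.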
  9. Keywords:
  10. Deep Neural Networks ; Data Flow ; Flexibility ; Scalability ; Utilization ; Network-on-Chip (NoC) ; Deep Neural Network (DNN) Accelerator
