
Data-Aware Compression of Neural Networks

Falahati, H.; Sharif University of Technology | 2021

  1. Type of Document: Article
  2. DOI: 10.1109/LCA.2021.3096191
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2021
  4. Abstract: Deep neural networks (DNNs) are getting deeper and larger, which intensifies their data-movement and compute demands. Prior work reduces data movement and computation by exploiting sparsity and similarity; however, these techniques exploit only sparsity and weight similarity, not input similarity. Synergistically analysing the similarity and sparsity of both inputs and weights, we show that memory accesses and computations can be reduced by 5.7× and 4.1× more than by exploiting sparsity alone, and by 3.9× and 2.1× more than by exploiting weight similarity alone. We propose a new data-aware compression approach, called DANA, that effectively exploits both sparsity and similarity in inputs and weights. DANA can be implemented orthogonally on top of different hardware DNN accelerators; as an example, we implement it on top of an Eyeriss-like architecture. Our results on four well-known DNNs show that DANA outperforms Eyeriss in average performance and energy consumption by 18× and 83×, respectively. Moreover, DANA is 4.6× and 4.5× faster than state-of-the-art sparsity-aware and similarity-aware techniques, respectively, and reduces average energy consumption over them by 3.0× and 5.8×. © 2002-2011 IEEE (a toy sketch of the sparsity-and-similarity idea follows this record)
  5. Keywords: Data compression ; Deep neural networks ; Energy utilization ; Neural networks ; Average energy ; Compression approach ; Data movements ; Memory access ; State of the art ; Data reduction
  6. Source: IEEE Computer Architecture Letters ; Volume 20, Issue 2, 2021, Pages 94-97 ; 1556-6056 (ISSN)
  7. URL: https://ieeexplore.ieee.org/abstract/document/9483693
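The abstract only sketches the idea at a high level; the paper's actual compression format and microarchitecture are not described in this record. As a minimal, hypothetical illustration (not DANA's mechanism), the snippet below counts how many multiply-accumulate (MAC) operations in a dot product could be avoided because an operand is zero (sparsity) or because an operand pair repeats the previous one (similarity, so the prior product could be reused). The function name, the operand layout, and the reuse policy are illustrative assumptions.

```python
# Toy sketch: estimate how many MACs are reducible via sparsity and similarity.
# This is NOT the DANA design from the paper, only an illustration of the idea.

def count_reducible_macs(inputs, weights):
    """Return (total, skippable_by_sparsity, reusable_by_similarity)."""
    assert len(inputs) == len(weights)
    total = len(inputs)
    skipped = 0    # a zero operand makes the product zero, so no MAC is needed
    reusable = 0   # a repeated (input, weight) pair could reuse the prior product
    prev_pair = None
    for x, w in zip(inputs, weights):
        if x == 0 or w == 0:
            skipped += 1
        elif prev_pair == (x, w):
            reusable += 1
        prev_pair = (x, w)
    return total, skipped, reusable


if __name__ == "__main__":
    # Quantized activations often contain zeros and runs of repeated values.
    acts    = [0, 3, 3, 3, 0, 7, 7, 2]
    weights = [5, 4, 4, 4, 1, 2, 2, 9]
    total, zero_skips, reuse = count_reducible_macs(acts, weights)
    print(f"{total} MACs: {zero_skips} skippable (sparsity), "
          f"{reuse} reusable (similarity)")
```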