Coordinated DVFS and Precision Control for Deep Neural Networks

Nabavinejad, S. M.; Sharif University of Technology | 2019

  1. Type of Document: Article
  2. DOI: 10.1109/LCA.2019.2942020
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2019
  4. Abstract: Traditionally, dynamic voltage and frequency scaling (DVFS) has been the main mechanism for trading off performance and power. We observe that Deep Neural Network (DNN) applications offer the possibility to trade off performance, power, and accuracy using both DVFS and numerical precision levels. Our proposed approach, Power-Inference accuracy Trading (PIT), monitors the server's load and accordingly adjusts the precision of the DNN model and the DVFS setting of the GPU to trade accuracy and power consumption against response time. Under high load, when requests arrive close together, PIT leverages the GPU's INT8-precision instructions to dynamically change the precision of the deployed DNN models and boosts the GPU frequency to execute requests faster, at the expense of reduced accuracy and higher power consumption. When the request arrival rate relaxes and requests have slack time, PIT instead deploys high-precision versions of the models to improve accuracy and reduces the GPU frequency to decrease power consumption. We implement and deploy PIT on a state-of-the-art server equipped with a Tesla P40 GPU. Experimental results demonstrate that, depending on the load, PIT can improve response time by up to 11 percent compared to a job scheduler that uses only FP32 precision. It also reduces energy consumption by up to 28 percent while achieving around 99.5 percent of the accuracy of FP32-only execution.
  5. Keywords: Accuracy; Deep neural network; Hardware accelerator; Power; Response time; Computer graphics; Computer hardware; Economic and social effects; Electric power utilization; Energy utilization; Graphics processing unit; Green computing; Neural networks; Program processors; Response time (computer systems); Servers; Hardware accelerators; Power demands; Runtimes; Time factors; Time frequency analysis; Deep neural networks
  6. Source: IEEE Computer Architecture Letters; Volume 18, Issue 2, 2019, Pages 136-140; ISSN 1556-6056
  7. URL: https://ieeexplore.ieee.org/document/8840877
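The abstract describes a load-driven policy: under pressure, switch to INT8 and raise the GPU clock; when there is slack, return to FP32 and lower the clock. The sketch below illustrates that decision rule only. It is not the paper's algorithm. It assumes load is summarized as utilization (arrival rate times the FP32 service time), and the threshold, the clock values, and the names `choose_setting`, `Setting`, and `Precision` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Precision(Enum):
    FP32 = "fp32"
    INT8 = "int8"


@dataclass(frozen=True)
class Setting:
    precision: Precision
    gpu_clock_mhz: int


# Hypothetical clock levels for illustration; not values reported in the paper.
LOW_CLOCK_MHZ = 1000
HIGH_CLOCK_MHZ = 1500


def choose_setting(arrival_rate_rps: float,
                   fp32_service_time_s: float,
                   util_threshold: float = 0.8) -> Setting:
    """Pick a (precision, clock) pair from the observed load.

    Assumption: load is estimated as utilization rho = lambda * S, where
    lambda is the request arrival rate and S is the per-request FP32 service
    time. The threshold of 0.8 is an illustrative choice.
    """
    rho = arrival_rate_rps * fp32_service_time_s
    if rho >= util_threshold:
        # Heavy load, little slack: trade accuracy and power for speed.
        return Setting(Precision.INT8, HIGH_CLOCK_MHZ)
    # Light load, slack available: favor accuracy and lower power.
    return Setting(Precision.FP32, LOW_CLOCK_MHZ)


if __name__ == "__main__":
    print(choose_setting(100, 0.01))  # rho = 1.0, so INT8 at the high clock
    print(choose_setting(20, 0.01))   # rho = 0.2, so FP32 at the low clock
```

A single threshold keeps the example small. A practical controller would likely add hysteresis or averaging so the setting does not oscillate around the threshold. Actually applying the chosen clock (for example through NVML application clocks) and swapping model versions are left out of this sketch.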