Proposing a Scalable and Energy-aware Architecture for Register File of GPUs

Sadrosadati, Mohammad | 2019

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 52398 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sarbazi-Azad, Hamid
  7. Abstract: Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. In this thesis, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp’s aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and to overlap the prefetch latency with the execution of other warps. LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. We use the LTRF architecture to implement two power/area optimizations in the main register file. First, we implement a powerful data compression algorithm to compress the register file. Second, we implement the main register file with emerging high-density, high-latency memory technologies, enabling 8× larger capacity to improve thread-level parallelism. To better exploit the higher thread-level parallelism, we improve GPU memory system throughput, focusing on the network-on-chip that connects streaming multiprocessors to L2 cache banks. We also observe that execution units are idle for a significant fraction of the execution time. Therefore, we propose an idle-time-aware power management technique to save static energy during the remaining idle periods of the execution units.
  8. Keywords: Graphic Processing ; Thread Level Parallelism ; Register File ; Cache Memory ; Prefetching
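
The interval idea in the abstract can be illustrated with a minimal sketch. It is not the compile-time interval analysis used in the thesis, which the abstract does not detail. It assumes a straight-line instruction list with no control flow, a hypothetical `(opcode, registers)` instruction format, and a hypothetical register file cache capacity, and it greedily cuts the program into intervals whose register working set fits in that cache. The working set of each interval is what would be prefetched when the interval begins.

```python
from typing import List, Set, Tuple

# Hypothetical instruction format (an assumption, not from the thesis):
# (opcode, tuple of register indices the instruction reads or writes).
Instruction = Tuple[str, Tuple[int, ...]]
Interval = Tuple[List[Instruction], Set[int]]


def split_into_intervals(program: List[Instruction],
                         cache_capacity: int) -> List[Interval]:
    """Greedily cut a straight-line program into intervals.

    Each interval's register working set holds at most `cache_capacity`
    registers, so it could be prefetched into a register file cache of
    that size before the interval starts executing.
    """
    intervals: List[Interval] = []
    current: List[Instruction] = []
    working_set: Set[int] = set()

    for instr in program:
        regs = set(instr[1])
        if len(regs) > cache_capacity:
            raise ValueError(f"{instr[0]} touches more registers than the cache holds")
        # Close the current interval if adding this instruction would
        # overflow the cache.
        if current and len(working_set | regs) > cache_capacity:
            intervals.append((current, working_set))
            current, working_set = [], set()
        current.append(instr)
        working_set = working_set | regs

    if current:
        intervals.append((current, working_set))
    return intervals


if __name__ == "__main__":
    demo = [
        ("add", (0, 1, 2)),
        ("mul", (2, 3, 4)),
        ("ld", (5, 6)),
        ("st", (5, 0)),
    ]
    for body, regs in split_into_intervals(demo, cache_capacity=5):
        ops = [op for op, _ in body]
        print(f"prefetch {sorted(regs)} then run {ops}")
```

In a real GPU, the prefetch for one warp's next interval would be overlapped with the execution of other warps. The sketch models only the partitioning step.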
