
An Intelligent L2 Management Method in GPUs

Javadinezhad, Ahmad | 2023

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56280 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sarbazi-Azad, Hamid
  7. Abstract:
  To capture on-chip memory locality, tolerate off-chip memory latency, and expeditiously process memory-bound GPGPU applications, Graphics Processing Units (GPUs) employ a private L1D cache within each streaming multiprocessor (SM) and a shared L2 cache across SMs. Unlike the L1D caches, the L2 cache addresses data coherency and sharing between SMs. Prior work shows that loading all data into the L2 cache, without a proper mechanism to manage the incoming data rate, poses several challenges (e.g., cache contention/thrashing, increased write-back traffic, and bandwidth inefficiency) and ultimately puts considerable pressure on off-chip memory. In this work, we observe that a significant fraction (around 66%) of the data that hits in the L1D cache will be used by other SMs in the near future. To avoid unnecessary loads into the L2 cache, we propose a new L2 cache management scheme, called Lazy-LLC, which relies on the L1D caches as L2-compatible data filters (i.e., the L1D cache likely filters shared data at the GPU level). This mechanism improves the L2 cache's bandwidth efficiency by 19% for memory-bound applications, simply by filling the L2 cache with data from recent L1D cache victims that have experienced at least one L1D hit. Furthermore, Lazy-LLC improves the performance of memory-bound applications over the baseline by an average of 22%. Moreover, it reduces the L2 cache miss rate by 26%. (A minimal sketch of this fill policy is given after the keyword list below.)
  8. Keywords: Cache Memory ; Graphics Processing ; Locality ; Sharing ; L2 Cache Memory ; Off-Chip Memory
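
The abstract states only the fill criterion of Lazy-LLC: an L1D victim enters the L2 cache only if it saw at least one hit while resident in L1D. The Python sketch below illustrates that criterion; everything else (LRU replacement, the cache sizes, and all class and method names) is an assumed, hypothetical detail, not the thesis's actual design.

from collections import OrderedDict

class L1DCache:
    """LRU cache that tracks per-line hit counts while resident (illustrative)."""
    def __init__(self, capacity, l2):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> hit count since fill
        self.l2 = l2

    def access(self, addr):
        if addr in self.lines:
            self.lines[addr] += 1    # reuse observed while resident in L1D
            self.lines.move_to_end(addr)
            return "L1D hit"
        # Miss: fill L1D only; the L2 is NOT filled on the miss path (lazy fill).
        if len(self.lines) >= self.capacity:
            victim, hits = self.lines.popitem(last=False)
            # Lazy-LLC filter: only victims with at least one L1D hit are
            # deemed likely to be shared/reused and are written into L2.
            if hits >= 1:
                self.l2.fill(victim)
        self.lines[addr] = 0
        return "L2 hit" if self.l2.lookup(addr) else "L2 miss (off-chip)"

class L2Cache:
    """Shared LRU cache filled only from qualifying L1D victims."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[addr] = True

l2 = L2Cache(capacity=4)
l1 = L1DCache(capacity=2, l2=l2)
for a in [0, 1, 0, 2, 3, 0]:   # address 0 is reused in L1D
    print(a, "->", l1.access(a))

Running the snippet shows the intended behavior: address 0, which was reused while in L1D, is promoted to L2 on eviction and later hits there, while the single-use addresses are evicted without ever entering L2.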

