
Unifying L1 Data Cache and Shared Memory in GPUs

Yousefzadeh-Asl-Miandoab, Ehsan | 2018

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 51241 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sarbazi Azad, Hamid
  7. Abstract: Graphics Processing Units (GPUs) employ a scratch-pad memory (a.k.a. shared memory) in each streaming multiprocessor to accelerate data sharing among the threads of a thread block and to provide a software-managed cache for programmers. However, we observe that about 60% of the GPU workloads in several well-known benchmark suites do not use shared memory at all. Moreover, among the workloads that do use shared memory, about 42% of the shared memory capacity goes unused, on average. At the same time, many general-purpose GPU applications suffer from the low hit rate and limited bandwidth of the L1 data cache. We aim to use the shared memory space and its corresponding bandwidth to improve the L1 data cache whenever shared memory is underutilized. Our key idea is to (1) map the shared memory address space to off-chip memory and (2) use a unified L1 data cache for the shared, global, and local address spaces. To improve the cache hit rate for shared memory accesses, we attempt to keep each shared memory address in the cache throughout its lifetime. We observe that most shared memory addresses are read only once after their first write; therefore, we lock each shared memory address in the cache after its first write and unlock it after its first read (a minimal sketch of this policy follows this entry). Our experimental results show an average 38% IPC improvement over the baseline architecture.
  8. Keywords: Shared Memory ; Cache Memory ; General Purpose Graphic Processing Units (GPGPU) ; Graphic Processing ; Scratch Pad Memory (SPM) ; Reconfiguration
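
The lock/unlock policy described in the abstract can be illustrated with a short sketch. The following C++ fragment is not from the thesis; it is a minimal model, under assumed names (UnifiedL1, CacheLine, Space), of a unified L1 cache whose lines are pinned on the first write to a shared-memory address and released on the first read, matching the abstract's observation that most shared-memory data is written once and then read once.

    #include <cstdint>
    #include <unordered_map>

    // Address spaces that share the unified L1 (per the abstract's key idea).
    enum class Space { Global, Local, Shared };

    struct CacheLine {
        bool valid  = false;
        bool locked = false;  // set on first shared-memory write, cleared on first read
    };

    // Hypothetical fully-associative model indexed by address; set-associative
    // lookup, tags, and data storage are omitted for brevity.
    class UnifiedL1 {
        std::unordered_map<uint64_t, CacheLine> lines_;
    public:
        void write(uint64_t addr, Space space) {
            CacheLine& line = lines_[addr];   // allocate on write for simplicity
            line.valid = true;
            if (space == Space::Shared)
                line.locked = true;           // pin until the first read arrives
        }
        bool read(uint64_t addr, Space space) {
            auto it = lines_.find(addr);
            if (it == lines_.end() || !it->second.valid)
                return false;                 // miss
            if (space == Space::Shared)
                it->second.locked = false;    // first read: line becomes evictable
            return true;                      // hit
        }
        bool evictable(uint64_t addr) const { // replacement policy skips locked lines
            auto it = lines_.find(addr);
            return it == lines_.end() || !it->second.locked;
        }
    };

Here the lock bit stands in for the thesis's lifetime-tracking mechanism: because most shared-memory addresses see exactly one read after their first write, releasing the line at that read keeps hot shared data resident during its lifetime without permanently reserving cache capacity.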
