
Improving the Efficiency of On-chip 3D Stacked DRAM in Server Processors

Samandi, Farid | 2017

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 49998 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sarbazi-Azad, Hamid; Lotfi Kamran, Pejman
  7. Abstract:
     Big-data server workloads have vast datasets and, hence, frequently access off-chip memory for data. Consequently, server workloads lose significant performance potential due to the off-chip latency and bandwidth walls. Recent research advocates using 3D stacked DRAM to break these walls. As 3D stacked DRAM cannot accommodate the whole datasets of server workloads, most proposals use the 3D DRAM as a large cache. Unfortunately, a large DRAM cache imposes latency overhead due to (1) the need for tag lookup and (2) inefficient utilization of on-chip and off-chip bandwidth, and as a result, lowers the benefits of 3D stacked DRAM. Moreover, storing the tags of a multi-gigabyte DRAM cache requires changes in the DRAM organization so that the tags can be stored in the DRAM itself. In this work, we make the case for using multi-gigabyte on-chip DRAM as part of the main memory. While using the whole 3D DRAM as part of the main memory is inferior to using it as a large cache due to capacity limitations, we observe that only a small set of transient data accesses is responsible for the gap between the memory and cache organizations. We show that if a small cache is placed next to the on-chip DRAM, the memory realization of the on-chip DRAM becomes superior to the cache realization. Moreover, we show that the size of the cache required to make the memory realization practical does not grow with the size of the on-chip DRAM. Based on these observations, we propose an organization for the on-chip DRAM that consists of a small cache (32 MB) alongside a large on-chip main memory that complements the off-chip memory. As most of the on-chip DRAM is used as part of the main memory, the problems associated with both looking up and storing large tag arrays are eliminated. Moreover, the cache part of the stacked DRAM offers low access latency because the limited size of the required tag array makes SRAM tags affordable. Using full-system simulation, we show that our proposal improves system performance by 11% over the state-of-the-art DRAM cache organization.
  8. Keywords: Memory Management ; On-Chip Memories ; Die-Stacked Dynamic Random Access Memory (DRAM) ; Server Processors ; Hybrid Memory
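The organization described in the abstract (a small SRAM-tagged cache in front of a stacked DRAM that is mostly mapped as on-chip main memory, backed by off-chip memory) can be illustrated with a toy access-latency model. This is a minimal sketch, not the thesis's simulator: all sizes, latencies, address-mapping choices, and the eviction policy below are hypothetical placeholder assumptions.

```python
# Toy model of the proposed hybrid organization: most of the stacked DRAM
# serves as on-chip main memory, while a small SRAM-tagged cache absorbs
# transient accesses (including off-chip data). All values are illustrative.

ON_CHIP_PAGES = 4096    # pages mapped to on-chip DRAM main memory (hypothetical)
CACHE_LINES = 512       # capacity of the small cache next to the stacked DRAM
PAGE_SIZE = 4096        # bytes per page (hypothetical mapping granularity)
LINE_SIZE = 64          # bytes per cache line

LAT_CACHE = 10          # hit in the SRAM-tagged cache (cycles, illustrative)
LAT_ON_CHIP = 40        # on-chip DRAM main-memory access
LAT_OFF_CHIP = 200      # off-chip DRAM access

cache = {}              # line address -> present; SRAM tags, so no DRAM tag lookup


def fill(line: int) -> None:
    """Insert a line, evicting in FIFO order (trivial policy for the sketch)."""
    if len(cache) >= CACHE_LINES:
        cache.pop(next(iter(cache)))
    cache[line] = True


def access(addr: int) -> int:
    """Return the latency of one memory access in this toy model."""
    line = addr // LINE_SIZE
    page = addr // PAGE_SIZE
    if line in cache:               # SRAM tag check: fast, no in-DRAM tag lookup
        return LAT_CACHE
    fill(line)                      # miss: bring the line into the small cache
    if page < ON_CHIP_PAGES:        # address maps to on-chip main memory
        return LAT_ON_CHIP
    return LAT_OFF_CHIP             # transient off-chip data, cached briefly
```

In this sketch the large on-chip region needs no tag storage at all (it is directly addressed memory), while the small cache's tags are cheap enough to keep in SRAM, which mirrors the abstract's argument for why the hybrid organization avoids both the tag-lookup and tag-storage problems of a multi-gigabyte DRAM cache.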

