STAIR: high reliable STT-MRAM aware multi-level I/O cache architecture by adaptive ECC allocation

Hadizadeh, M ; Sharif University of Technology | 2020

  1. Type of Document: Article
  2. DOI: 10.23919/DATE48585.2020.9116550
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2020
  4. Abstract:
  5. Hybrid Multi-Level Cache Architectures (HCAs) are promising solutions for the growing need for high-performance and cost-efficient data storage systems. HCAs employ a high-endurance memory as the first-level cache and a Solid-State Drive (SSD) as the second-level cache. Spin-Transfer Torque Magnetic RAM (STT-MRAM) is one of the most promising candidates for the first-level cache of HCAs because of its high endurance and DRAM-comparable performance along with non-volatility. However, STT-MRAM faces three major reliability challenges: Read Disturbance, Write Failure, and Retention Failure. To provide a reliable HCA, these reliability challenges should be carefully addressed. To this end, this paper first makes a careful distinction between clean and dirty pages to classify and prioritize their different vulnerabilities. Then, we investigate the distribution of more vulnerable pages in the first-level cache of HCAs over 17 storage workloads. Our observations show that the protection overhead can be significantly reduced by adjusting the protection level of data pages based on their vulnerability. To this aim, we propose an STT-MRAM Aware Multi-Level I/O Cache Architecture (STAIR) to improve HCA reliability by dynamically generating extra strong Error-Correction Codes (ECCs) for the dirty data pages. STAIR adaptively allocates under-utilized parts of the first-level cache to store these extra ECCs. Our evaluations show that STAIR decreases the data loss probability by five orders of magnitude, on average, with negligible performance overhead (0.12% hit ratio reduction in the worst case) and 1.56% memory overhead for the cache controller. © 2020 EDAA
  6. Keywords:
  7. Error-Correction Code (ECC) ; Hybrid Multi-Level Cache Architecture ; STT-MRAM ; Cache memory ; Cost reduction ; Dynamic random access storage ; Error correction ; Magnetic recording ; Magnetic storage ; Memory architecture ; Reliability ; Stairs ; Cache architecture ; Data storage systems ; Error correction codes (ECCs) ; Multi-level cache architecture ; Orders of magnitude ; Protection level ; Solid state drives (SSD) ; Spin transfer torque ; MRAM devices
  8. Source: 2020 Design, Automation and Test in Europe Conference and Exhibition, DATE 2020, 9 March 2020 through 13 March 2020 ; 2020, Pages 1484-1489
  9. URL: https://ieeexplore.ieee.org/document/9116550