A system-level framework for analytical and empirical reliability exploration of STT-MRAM caches

Cheshmikhani, E ; Sharif University of Technology | 2020

  1. Type of Document: Article
  2. DOI: 10.1109/TR.2019.2923258
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2020
  4. Abstract:
  5. Spin-transfer torque magnetic RAM (STT-MRAM) is known as the most promising replacement for static random access memory (SRAM) technology in large last-level caches (LLCs). Despite its major advantages of high density, nonvolatility, near-zero leakage power, and immunity to radiation, STT-MRAM-based cache memory suffers from high error rates, mainly due to retention failure (RF), read disturbance, and write failure. Existing studies are limited to estimating the rate of only one or two of these error types for STT-MRAM caches. However, the overall vulnerability of STT-MRAM caches, whose estimation is a must for designing cost-efficient reliable caches, has not been studied previously. In this paper, we propose a system-level framework for reliability exploration and characterization of error behavior in STT-MRAM caches. To this end, we formulate the cache vulnerability considering the intercorrelation of the error types, including RF, read disturbance, and write failure, as well as the dependency of error rates on workload behavior and process variations (PVs). Our analysis reveals that STT-MRAM cache vulnerability is highly workload-dependent and varies by orders of magnitude across cache access patterns. Our analytical study also shows that this vulnerability divergence increases significantly with PVs in STT-MRAM cells. To take the effects of system workloads and PVs into account, we implement the error types in the gem5 full-system simulator. The experimental results, using a comprehensive set of multiprogrammed workloads from the SPEC CPU2006 benchmark suite on a quad-core processor, show that the total error rate in a shared STT-MRAM LLC varies by 32.0× across workloads. A further 6.5× vulnerability variation is observed when considering PVs in the STT-MRAM cells. In addition, the contribution of each error type to total LLC vulnerability varies widely across cache access patterns, and the error rates are affected differently by PVs. The proposed analytical and empirical studies can significantly help system architects use error mitigation techniques efficiently and design highly reliable, low-cost STT-MRAM LLCs. © 1963-2012 IEEE
  6. Keywords:
  7. Cache memory ; Error rate ; Process variations (PVs) ; Read disturbance ; Retention failure (RF) ; Spin transfer torque magnetic RAM (STT-MRAM) ; Write failure ; Block codes ; Buffer storage ; Error analysis ; Integrated circuit design ; Magnetic recording ; Magnetic storage ; Multiprogramming ; Outages ; Static random access storage ; Magnetic RAMs ; MRAM devices
  8. Source: IEEE Transactions on Reliability ; Volume 69, Issue 2, 2020, Pages 594-610
  9. URL: https://ieeexplore.ieee.org/document/8759069