Improving Reliability and Durability of Phase Change Main Memories
Asadinia, Marjan | 2016
- Type of Document: M.Sc. Thesis
- Language: English
- Document No: 48426 (52)
- University: Sharif University of Technology, International Campus, Kish Island
- Department: Science and Engineering
- Advisor(s): Sarbazi Azad, Hamid
- Abstract:
- Dynamic Random Access Memory (DRAM) has been the leading main memory technology for the last four decades. In the deep sub-micron regime, however, scaling DRAM faces several challenges caused by charge leakage and imprecise charge placement. Phase Change Memory (PCM) is regarded as one of the most promising technologies to replace DRAM. Compared to competing non-volatile memories, PCM offers fast random access, negligible leakage energy, superior scalability, high density, and the ability to operate at both Single-level Cell (SLC) and Multi-level Cell (MLC) storage levels without imposing large storage overhead. Unfortunately, the density advantage of MLC PCM devices comes at the cost of lower write endurance, which results in fast wear-out of memory cells. Other concerns about PCM applicability are its higher latency and energy consumption, as well as its low resilience to soft errors caused by resistance drift. Recent studies have proposed redirection or correction schemes to alleviate the wear-out problem, but all suffer from poor throughput and latency. None of the techniques proposed in the literature to improve the lifetime and reliability of PCM exploits its ability to switch easily between SLC and MLC storage levels with negligible read/write circuit overhead. We exploit this ability to propose new techniques that improve PCM lifetime. First, we show that one source of inefficiency in current schemes, even when wear-leveling algorithms are used, is the non-uniformity of write endurance across cells caused by process variation. As a result, when some memory pages have reached their endurance limit, other pages may still be far from theirs.
We present On-Demand Page Paired PCM (OD3P), a technique that mitigates the fast failure of some pages by redirecting them onto other healthy pages. During recovery, the target PCM page is converted to MLC so that it holds the data of both pages. Compared to the state-of-the-art error correction scheme, OD3P can improve PCM time-to-failure and system performance (IPC) by 12% and 14%, respectively, under multi-threaded and multi-program workloads. We then extend the durability of PCM pages by enabling line pairing within a page (line-level OD3P) and word-level pairing within a line (intra-line level pairing). Next, we show that besides process variation, unbalanced write traffic makes memory cells wear out even sooner. We therefore propose BLESS, a byte-level shifting scheme that uses a simple shift mechanism, together with the MLC capability of PCM, for both write-traffic balancing and error recovery. Simulation results show that BLESS extends lifetime across a wide range of workloads compared to state-of-the-art line-level and page-level schemes; on average, it can improve lifetime by 14-25%. Our schemes rely mainly on the MLC capability of PCM to recover from hard errors, but the MLC storage level is prone to resistance drift. To address this problem, we propose Variable Resistance Spectrum MLC PCM (VR-PCM), a simple micro-architectural technique with more efficient drift-aware MLC PCM access operations. Using full-system evaluation of an MLC PCM main memory with a conservative resistance drift model, we show that VR-PCM tailored for high-density MLCs can deliver considerable average improvements in performance (13.25%), energy (21.2%), and lifetime (1.77x).
- Keywords:
- Phase Change Memory ; Performance ; Energy ; Resistance Drift ; Lifetime ; Reliability Improvement ; Single Level Cell (SLC) Memory ; Multilevel Cell (MLC) Memory ; Hard Errors
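The abstract mentions a shift mechanism for balancing write traffic but gives no design details. The following is only a minimal sketch of the generic idea (rotating a block's content by one unit on every write so that a frequently updated unit does not always land on the same physical cells). It is not the BLESS design from the thesis. The class ShiftedBlock, the function hot_unit_wear, and the wear-counting model (a physical unit wears only when its stored value changes) are all assumptions made for illustration.

```python
from typing import List


class ShiftedBlock:
    """Hypothetical block of units whose content may rotate by one unit per write."""

    def __init__(self, num_units: int, shift: bool = True) -> None:
        self.cells: List[int] = [0] * num_units  # physical units
        self.wear: List[int] = [0] * num_units   # value changes seen by each physical unit
        self.offset = 0                          # current rotation amount
        self.shift = shift                       # False models a fixed (unshifted) mapping

    def read(self) -> List[int]:
        """Return the logical content by undoing the current rotation."""
        n = len(self.cells)
        return [self.cells[(i + self.offset) % n] for i in range(n)]

    def write(self, data: List[int]) -> None:
        """Store `data`, advancing the rotation by one unit first if shifting is on."""
        n = len(self.cells)
        if len(data) != n:
            raise ValueError("data must have one value per unit")
        if self.shift:
            self.offset = (self.offset + 1) % n
        for i, value in enumerate(data):
            phys = (i + self.offset) % n
            # Assumed wear model: a unit is only programmed when its value changes.
            if self.cells[phys] != value:
                self.cells[phys] = value
                self.wear[phys] += 1


def hot_unit_wear(num_units: int, writes: int, shift: bool) -> List[int]:
    """Repeatedly update logical unit 0 only and return per-unit wear counts."""
    block = ShiftedBlock(num_units, shift=shift)
    for k in range(1, writes + 1):
        block.write([k] + [0] * (num_units - 1))
    return block.wear
```

In this toy model, calling hot_unit_wear with shift=False concentrates every write on one physical unit, while shift=True spreads them across all units at the cost of some extra writes to clear the previous position. That trade-off is a property of this simplified model, not a result reported in the thesis.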
Book Contents
- List of Tables
- Chapter 1
- Introduction
- Figure 1.1: The number of published articles on PCM Technology taken from IEEE Xplore database for all IEEE, AIP, IET and IBM articles and journals.
- Chapter 2
- Phase Change Memory (PCM)
- Figure 2.1: (A) Typical R-T (normalized) curve of chalcogenide films shows that the resistivity of amorphous phase is 5-6 orders of magnitude higher than the polycrystalline phase. T1 and T2 are the temperatures where the transitions to both phase...
- Figure 2.2: In order to maintain reasonable programming currents, phase change memory devices are configured such that one of the critical dimensions of the current flow path is in the range of 10-50nm. In the mushroom cell (A), a thin film of chalco...
- Figure 2.3: Concept of Multi-Level PCM [14].
- Figure 2.4: Resistance partitioning of 2-bit MLC configured for about one minute drift tolerance at 300 K.
- Figure 2.5: Example of integrating ADC: structure and working [47].
- Figure 2.6: Example of successive approximation ADC [48].
- Figure 2.7: Approach1: Increasing amorphous region (h1 <=> resistance R1, h2 <=> resistance R2, h2 > h1 => R2 > R1)
- Figure 2.8: Approach2: Increasing crystalline filaments (w1 <=> resistance R1, w2 <=> resistance R2, w2 > w1 => R2 < R1)
- Figure 2.9: P&V write process for MLC PCM [43].
- Figure 2.10: The programmed resistance states in PCM devices (A, units in kOhm) drift upwards initially, and then decrease towards the crystalline state resistance. The rate of drift is higher for larger resistances. The programmed resistance increa...
- Chapter 3
- Related Work
- Figure 3.1: Typical access latency of various technologies in the memory hierarchy [67].
- Figure 3.2: Percentage of redundant bit-writes for single-level and multi-level PCM cells [69].
- Figure 3.3: Circuitry for Redundant Bit Removal and Row Shifting [69].
- Figure 3.4: PCM-based Hybrid Memory System [70].
- Figure 3.5: Write pausing in PCM: (A) Pause points (B) Servicing reads via Pausing [72].
- Figure 3.6: Start-Gap wear leveling on a memory containing 16 lines [74].
- Figure 3.7: Architecture for Randomized Start-Gap [74].
- Figure 3.8: Recent Wear leveling algorithms: (a) Region Based Start-Gap (b) Seznec’s PCM-S Scheme (c) Security Refresh [80].
- Figure 3.9: Physical Address Space with Dynamic Replication [84].
- Figure 3.10: Error Correction with ECP replacement of failed memory cells. (a) Correcting one bit error with ECP-1. (b) Correcting up to five errors with ECP-5. (c) Tolerating errors in replacement cells. (d) Tolerating errors in pointers of ECP [85].
- Figure 3.11: Example of partitioning two faults with SAFER [87].
- Figure 3.12: Example of fine-grained remapping with embedded pointer [88].
- Figure 3.13: An example for constructing the invertible set [107].
- Chapter 4
- Handling Hard Errors in PCM by Using Inter-line Level Schemes
- Figure 4.1: Write endurance of different 1KB pages in a 1GB PCM bank. The average cell’s endurance is 10^8 with standard deviation of 0.25 × 10^8. The values are normalized to the weakest page (3.8 × 10^7 in this experiment). For considerable number...
- Figure 4.2: Structural view of the Target Page Selector (TPS) and Address Translation Table (ATT) units.
- Figure 4.3: (a) An example of how accesses to any of two PCM pages are performed with conventional stacked mapping. Upon receiving a read to location A (or B), the memory controller issues a read signal to sense the resistance value of each cell. Whe...
- (c) Interleaved bit mapping for the example of Figure 4.3(a).
- Figure 4.4: A simple example scenario of OD3P operation in a small PCM memory of 8 pages. (a) Fully-Selective OD3P Scheme, (b) Fixed OD3P Scheme, (c) Partially-Selective OD3P Scheme with Group Size of 4.
- Figure 4.5: Example for line level pairing where line i reaches its endurance limit and ECP-6 cannot protect it anymore. Thereafter, invoking the line level OD3P can select line j as the healthiest line (the one with maximum FreeECPs) and store them (...
- Figure 4.6: Comparison of the lifetime versus capacity provided by six evaluated systems with CoV of 0.25.
- Figure 4.7: Performance analysis of OD3P mechanisms compared to DRM+ECP-6 and ECP-6 baselines under different workloads. The values are normalized to ECP-6 baseline system.
- Figure 4.8: Speedup comparison for different group size (ranging from 4 to 64).
- Figure 4.9: Endurance analysis of OD3P mechanisms with different configurations and pairing algorithms compared to the baseline architecture. Again, the values are normalized to an ECP-6 baseline system.
- Figure 4.10: TPS accuracy and area overhead versus TPS size.
- Figure 4.11: Access time comparison of OD3P and DRM when number of failed bits varies.
- Figure 4.12: The PCM cell that supports N-bit to 2N-bit MLC operations. Dummy block refers to a group of fixed resistive elements used for comparison in a read circuit.
- Chapter 5
- Handling Hard Errors in PCM by Using Intra-line Level Schemes
- Figure 5.1: Non-uniformity in bit flips distribution of write operations for "Facesim".
- Figure 5.2: Shift mechanism: upon each write, the block content is shifted by one unit.
- Figure 5.3: (a) Probability of fault coverage (Eq. 5.1). (b) Average number of shifts to cover the fault(s) (Eq. 5.2).
- Figure 5.4: An example of BLESS operation on a 3-unit block.
- Figure 5.5: Lifetime of the BLESS and other techniques.
- Figure 5.6: Fault coverage distribution of different number of shifts in BLESS.
- Figure 5.7: Average number of recovered errors per page for different schemes.
- Figure 5.8: Comparing error recovery capability of BLESS and Aegis.
- Figure 5.9: Impact of unit size on IntraV and per line storage overhead.
- Figure 5.10: Memory capacity degradation as a function of lifetime under DRM, ECP, FREE-p, and SAFER schemes. The percentage of healthy cells in faulty lines is shown with "X" for each scheme when 50% of memory lines are corrupted.
- Figure 5.11: Pairing a faulty part ‘i’ with a healthy part ‘j’; after pairing part ‘i’ is marked as faulty and both ‘i’ and ‘j’ parts are stored in 2-bit MLC mode.
- Figure 5.12: Performance analysis of the ILP mechanism compared to ECP-6, FREE-p, and SAFER. The values are normalized to the ECP-6 baseline system.
- Figure 5.13: Lifetime analysis of the ILP mechanism compared to ECP-6, FREE-p, and SAFER. The values are normalized to the ECP-6 baseline system.
- Figure 5.14: Average number of recovered errors per line of the ILP mechanism compared to ECP-6, FREE-p, and SAFER.
- Figure 5.15: Partitioning effect on lifetime and storage overhead.
- Chapter 6
- Figure 6.1: Percentage of sixteen different values read/written from/into main memory for PARSEC-2 benchmark at 4-bit granularity (from most to least).
- Figure 6.2: VR-PCM binary resistance partitioning for 2-bit MLC (top) and Extended VR-PCM partitioning for 3-bit MLC (bottom).
- Figure 6.3: VR-PCM array structure and its controller.
- Figure 6.4: Degree of similarity between sets of frequent values extracted from the proposed hardware-based and the ideal software-based mechanisms.
- Figure 6.5: Read and write access latency of 2-bit VR-PCM normalized to the conventional MLC design in the evaluated 4-core system. We have 23% and 42% improvement in read and write access latency, respectively.
- Figure 6.6: Performance improvement of a system with 2-bit VR-PCM main memory normalized to conventional MLC design. This chart shows 13% improvement in system’s CPI on average.
- Figure 6.7: Normalized read, write, and total energy consumption of 2-bit VR-PCM main memory. The results reveal that we can expect considerable energy reduction in workloads with high value locality.
- Figure 6.8: Orders of lifetime improvement when using VR-PCM in a 2-bit MLC main memory. Here, we experienced an average of 1.71x improvement in lifetime (5.1x for raytrace). One can expect more lifetime improvement when the program has large frequent...
- Figure 6.9: Normalized memory access latency of 3-bit and 4-bit Extended/Reconfigurable VR-PCM. This experiment confirms considerable reduction in memory access latency, especially in the Reconfigurable design.
- Figure 6.10: Performance improvement of a system with 3-bit and 4-bit Extended/Reconfigurable VR-PCM main memory normalized to the conventional MLC main memory design. This chart shows 10% improvement in system’s CPI on average.
- Figure 6.11: Total memory access energy of 3-bit and 4-bit Extended/Reconfigurable VR-PCM main memory normalized to the conventional PCM.
- Figure 6.12: Orders of lifetime improvement when using Extended/Reconfigurable VR-PCM in a 3-bit and 4-bit MLC main memory. Here, we experienced an improvement of 1.04x to 3.5x in lifetime.
- Figure 6.13: Rate of soft error (in terms of bit error rate) for VR-PCM compared to conventional MLC PCM design. These charts along with Table 6.1 confirm that VR-PCM gives the same level of soft-error reliability with an average of 18% reduction in MLC PCM ...
- Chapter 7
- Conclusion and Future Direction
- References