
Reliability Improvement of On-chip Memories

Farbeh, Hamed | 2017

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 49481 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Miremadi, Ghasem
  7. Abstract:
  8. Reliability, performance, and energy consumption are among the most important constraints that must be satisfied in modern processor design. More than 60% of the chip area is occupied by on-chip SRAM memories, which not only account for a large fraction of the energy consumption but are also the most error-prone components. Radiation-induced soft errors in on-chip memories are a major concern in modern processor design. Although Single Event Upsets (SEUs) have been the main concern for SRAM reliability over the past decades, with the continued downscaling of technology the occurrence rate of Multiple-Bit Upsets (MBUs) is now comparable to that of SEUs in today's nanoscale memory cells. Because of the ever-increasing rate of MBUs, providing higher protection capability with affordable overhead is a necessity in processor design. Conventional Error Detecting/Correcting Codes (EDCs/ECCs) are only capable of protecting memories against SEUs. When they are configured to tolerate MBUs, their energy, area, and performance overheads increase significantly in L1 memories. Several studies have addressed the overhead and/or correction capability of protection schemes by presenting enhanced versions of conventional coding schemes or by modifying the cache configuration. These schemes either are not applicable to L1 memories or impose significant overhead to correct MBUs. On the other hand, recent developments in Spin-Transfer Torque Magnetic RAM (STT-MRAM) have made it a promising alternative to SRAM. STT-MRAM not only benefits from negligible leakage power and higher cell density but is also immune to radiation-induced particle strikes. However, its long write latency, high write energy, and high error rate challenge its applicability as on-chip memory. Therefore, SRAM will remain the dominant on-chip memory technology until the challenges of STT-MRAM are overcome. The main contributions of this thesis are twofold. 
First, we enhance the configuration of L1 memories to provide MBU correction with overhead equivalent to that of conventional SEU correction schemes. Second, we improve ECC efficiency in STT-MRAM memories by redesigning the ECC configuration according to STT-MRAM characteristics. To correct MBUs, the PSP-cache, RAW-Tag, and BCADS schemes are proposed. PSP-cache exploits the inherently available redundant cache block accesses to share the ECC bits among all blocks in a set. By sharing the ECC bits, PSP-cache improves the error detection/correction capability of the tag and data arrays of the cache fourfold. RAW-Tag provides MBU correction in the tag array by proposing a prefetching policy and an early write-back (EWB) policy and by exploiting the inherent similarity of the tags and the associativity of the cache. With MBUs detected by PSP-cache, RAW-Tag is capable of correcting all detectable errors. Building on the MBU-protected cache provided by PSP-cache and RAW-Tag, BCADS uses cache blocks to keep a replica of dirty scratchpad memory (SPM) lines. The evaluation results show that, with overhead equivalent to that of the conventional SEU correction scheme, the proposed schemes are capable of correcting up to 4-bit burst errors. To improve ECC efficiency in STT-MRAM memories, the Floating-ECC, A2PT, and PAReaD schemes are proposed. Floating-ECC first formulates the impact of ECCs on the lifetime of cache blocks and shows that ECCs, in the optimistic case, halve the lifetime of the cache because of the high write activity of the ECC bits. It then increases the lifetime of the cache to about 300% by evenly distributing the ECC write activity over all bits of the cache blocks. A2PT exploits the data-dependent and asymmetric error rates of STT-MRAM write operations to provide the required level of cache protection with significantly lower overhead. 
The evaluation results show that the ECC overhead of A2PT is less than half that of the conventional ECC configuration. Finally, PAReaD avoids the accumulation of read disturbance errors in cache blocks by swapping the order of the ECC checking unit and the way selection unit. Avoiding this accumulation reduces the occurrence probability of uncorrectable errors by more than three orders of magnitude.
  9. Keywords:
  10. Soft Error ; Error Correction Codes ; Spin-Transfer Torque Magnetic RAM (STT-MRAM) ; Static Random Access Memory (SRAM) Cell ; On-Chip Memories ; Memory Endurance
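
The abstract's remark that conventional ECCs protect only against SEUs can be illustrated with a minimal sketch of a textbook extended Hamming (8,4) SEC-DED code. This is not one of the schemes proposed in the dissertation; the function names `encode` and `decode` and the 4-bit word size are assumptions chosen for illustration. A single flipped bit is corrected, while a 2-bit upset is only detected.

```python
# Illustrative sketch only: a textbook extended Hamming (8,4) SEC-DED code,
# not a scheme from the dissertation. Names and word size are hypothetical.

def encode(nibble):
    """Encode 4 data bits into an 8-bit SEC-DED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 holds the overall parity bit
    code[3], code[5], code[6], code[7] = d     # data bits at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2                # make the total number of ones even
    return code

def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions holding a 1
    overall = sum(code) % 2                    # 1 means an odd number of bit flips
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1                    # single flip; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return "uncorrectable", None           # even number of flips: detected only
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble

if __name__ == "__main__":
    word = encode(0b1011)
    seu = list(word); seu[5] ^= 1                   # single event upset
    mbu = list(word); mbu[5] ^= 1; mbu[6] ^= 1      # 2-bit adjacent upset
    print(decode(seu))
    print(decode(mbu))
```

In this sketch, three or more flipped bits may be silently miscorrected, which is one reason stronger (and costlier) codes or architectural techniques are needed once multi-bit upsets become common.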
