Dependability analysis of data storage systems in presence of soft errors
Kishani, M ; Sharif University of Technology | 2019
- Type of Document: Article
- DOI: 10.1109/TR.2018.2888515
- Publisher: Institute of Electrical and Electronics Engineers Inc., 2019
- Abstract:
- In recent years, the high availability and reliability of data storage systems (DSS) have been significantly threatened by soft errors occurring in storage controllers. Due to their specific functionality and hardware-software stack, error propagation and manifestation in DSS are quite different from those in general-purpose computing architectures. To the best of our knowledge, no previous study has examined the system-level effects of soft errors on the availability and reliability of DSS. In this paper, we first analyze the effects of soft errors occurring in the server processors of storage controllers on the dependability of the entire storage system. To this end, we implement the major functions of a typical data storage system controller, running on a full storage system operating system stack, and develop a framework to perform fault injection experiments using a full system simulator. We then propose a new metric, storage system vulnerability factor (SSVF), to accurately capture the impact of soft errors in storage systems. Extensive experiments reveal that, depending on the controller configuration, up to 40% of cache memory contains end-user data, in which any unrecoverable soft error results in irreversible data loss (DL). In contrast, soft errors in the rest of the cache memory, which is filled by the operating system and storage applications, result in data unavailability (DU) at the storage system level. Our analysis also shows that detectable unrecoverable errors in the cache data field are the major cause of DU in storage systems, while silent data corruptions in the cache tag and data fields are the main cause of DL.
- Keywords:
- Architectural vulnerability factor (AVF) ; Cache memory ; Data loss (DL) ; Data storage system (DSS) ; Data unavailability (DU) ; Dependability ; Fault injection ; Soft error ; Storage system vulnerability factor (SSVF) ; Buffer storage ; Controllers ; Error correction ; Radiation hardening ; Software testing ; Architectural vulnerability factor ; Data loss ; Data storage systems ; Storage systems ; Computer architecture
- Source: IEEE Transactions on Reliability ; Volume 68, Issue 1, 2019, Pages 201-215 ; 00189529 (ISSN)
- URL: https://ieeexplore.ieee.org/document/8613038