
Energy-Efficient Permanent Fault Tolerance in Hard Real-Time Systems

Mireshghallah, F ; Sharif University of Technology | 2019

  1. Type of Document: Article
  2. DOI: 10.1109/TC.2019.2912164
  3. Publisher: IEEE Computer Society, 2019
  4. Abstract: Triple Modular Redundancy (TMR) is a long-established approach for masking various kinds of faults. By employing redundancy and comparing the results of three separate executions of the same program, TMR attains excellent levels of reliability. However, it suffers from the high power consumption of the redundant hardware, which is a severe detriment to its broad adoption. The energy consumption of TMR can be mitigated by dividing its operation into two stages and dropping one of them when no fault occurs. As we show in this paper, however, this approach, which has been evaluated in recent research, quickly fails in the presence of permanent faults. In this work, we introduce Reactive TMR, a novel energy-efficient approach for tolerating both transient and permanent faults. The key idea is to detect and deactivate faulty components and reassign their tasks to functioning ones. Using a combination of static scheduling and dynamic task management, our method decouples tasks from cores that are likely to produce faulty executions; hence, it inherently tolerates permanent faults and improves both reliability and energy efficiency. Through a detailed evaluation, we show that our proposal reduces the energy consumption of baseline TMR by 30% while preserving its reliability. Compared with the state-of-the-art TMR proposal, our method maintains the same energy consumption while adding tolerance of hard (permanent) faults to the system.
  5. Keywords: Low-Power Design ; Multicore Platforms ; Permanent Faults ; Real-Time Systems ; Triple Modular Redundancy ; Electric power supplies to apparatus ; Energy efficiency ; Energy utilization ; Fault tolerance ; Fault tolerant computer systems ; Interactive computer systems ; Redundancy ; Scheduling ; Hard real-time systems ; High power consumption ; Multi-core platforms ; Static scheduling ; Transient and permanent fault ; Real time systems
  6. Source: IEEE Transactions on Computers ; 2019 ; 0018-9340 (ISSN)
  7. URL: https://ieeexplore.ieee.org/document/8697092
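The abstract describes two-stage TMR and the reactive deactivation of faulty cores only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions that are not taken from the paper. Tasks are modelled as pure functions of an integer. Two copies run first, and a third copy runs only when they disagree. A core that is outvoted `threshold` times in a row is treated as permanently faulty and removed from the active pool. The class name `ReactiveTmrSketch`, the `threshold` rule, and the fixed core priority order are all hypothetical. The paper's actual combination of static scheduling and dynamic task management is not reproduced here.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical model: a "core" is just a function that runs the task on an input.
Core = Callable[[int], int]


class ReactiveTmrSketch:
    """Toy two-stage TMR with deactivation of repeatedly outvoted cores.

    Assumptions (not from the paper): tasks are pure functions of an int,
    spare cores exist, and a core outvoted `threshold` times in a row is
    treated as permanently faulty and removed from the active pool.
    """

    def __init__(self, cores: Dict[str, Core], threshold: int = 2) -> None:
        self.cores: Dict[str, Core] = dict(cores)  # active cores, in priority order
        self.threshold = threshold
        self.strikes: Dict[str, int] = {name: 0 for name in cores}
        self.disabled: List[str] = []
        self.third_runs = 0  # how often stage 2 ran; a rough proxy for extra energy

    def _pick(self, n: int) -> List[str]:
        # Simple fixed mapping: always use the first n active cores.
        names = list(self.cores)
        if len(names) < n:
            raise RuntimeError("not enough healthy cores left")
        return names[:n]

    def run(self, x: int) -> int:
        a, b = self._pick(2)
        results = {a: self.cores[a](x), b: self.cores[b](x)}
        if results[a] == results[b]:
            # Stage 1: the two copies agree, so the third copy is skipped.
            self.strikes[a] = self.strikes[b] = 0
            return results[a]
        # Stage 2: the copies disagree, so run a third copy and take the majority.
        c = self._pick(3)[2]
        results[c] = self.cores[c](x)
        self.third_runs += 1
        value, count = Counter(results.values()).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority: the fault cannot be masked")
        for name, result in results.items():
            if result == value:
                self.strikes[name] = 0  # agreed with the majority
            else:
                self.strikes[name] += 1
                if self.strikes[name] >= self.threshold:
                    # Reactive step: deactivate the core; later jobs use the remaining ones.
                    del self.cores[name]
                    self.disabled.append(name)
        return value


def healthy(x: int) -> int:
    return x * x


def stuck(x: int) -> int:
    # Simulated permanent fault: always off by one.
    return x * x + 1


tmr = ReactiveTmrSketch({"c0": stuck, "c1": healthy, "c2": healthy, "c3": healthy})
outputs = [tmr.run(i) for i in range(5)]
print(outputs)                       # [0, 1, 4, 9, 16]
print(tmr.disabled, tmr.third_runs)  # ['c0'] 2
```

In this toy run, core `c0` is permanently faulty. It triggers the third copy on the first two jobs and is then deactivated. After that, the remaining healthy cores agree in stage 1 and no third copy is needed. Without the deactivation step, the faulty core would force the third copy on every job, eroding the savings of the two-stage scheme. It would also leave no majority if a transient fault struck another core in the same job.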