Dynamic Task Replication with Imperfect Fault Detection in Multicore Cyber-Physical Systems
Hosseini Kasnavieh, Hossein | 2024
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 56886 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Ansari, Mohsen
- Abstract: Fault tolerance in computing systems often depends on the precision of the fault detection method, which significantly affects overall reliability. Classic fault tolerance methods, such as task replication, struggle to achieve certain reliability targets under imperfect fault detection, which, unlike perfect detection mechanisms, imposes minimal system overhead. To address this, this thesis introduces Dynamic Task Replication (DTR), a general fault tolerance technique that determines the number of replicas dynamically at runtime to overcome the limitations of classical task replication. The primary contribution, Optimal Dynamic Task Replication (ODTR), optimizes DTR for a given task, aiming to minimize the expected number of replicas while achieving the reliability target. The thesis further explores incorporating actual execution times into ODTR's reliability assessment. Additionally, it proposes the Energy-Aware Reliability-Guaranteeing (EARG) scheduling technique, which integrates ODTR into hard real-time systems. EARG leverages Dynamic Voltage and Frequency Scaling (DVFS) to minimize energy consumption while ensuring reliability targets and system schedulability. Experimental results show that ODTR requires, on average, 24% fewer replicas than the well-established N-Modular Redundancy (NMR) technique in general, and that this advantage increases to 58% for tasks with low base reliabilities. Moreover, evaluations across diverse system workloads show that EARG significantly conserves energy and enhances feasibility compared to existing scheduling techniques.
- Keywords:
- Reliability ; Task Replication ; Cyber-Physical Systems ; Fault Tolerance
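
The abstract's claim that replication struggles to reach some reliability targets under imperfect fault detection can be illustrated with a toy model. The sketch below is not the thesis's DTR or ODTR algorithm. It assumes a hypothetical sequential scheme in which each replica is correct with probability `r` (base reliability), a faulty result is detected with probability `c` (detection coverage, with no false alarms), and a new replica is started only when a fault is detected, up to `max_replicas`. The function names and the model itself are assumptions made for illustration.

```python
def detect_and_retry(r, c, max_replicas):
    """Toy model (an assumption, not the thesis's method).

    Returns (reliability, expected_replicas) for a scheme that runs
    replicas one at a time and stops at the first result in which no
    fault is detected, or after max_replicas executions.
    """
    if not (0.0 < r <= 1.0) or not (0.0 <= c <= 1.0) or max_replicas < 1:
        raise ValueError("need 0 < r <= 1, 0 <= c <= 1, max_replicas >= 1")

    detected_fault = (1.0 - r) * c  # replica is faulty AND the fault is caught
    reliability = 0.0               # probability the accepted result is correct
    expected_replicas = 0.0
    reach = 1.0                     # probability that the k-th replica is started

    for _ in range(max_replicas):
        expected_replicas += reach
        reliability += reach * r    # this replica runs and is correct
        reach *= detected_fault     # only a detected fault triggers another replica

    return reliability, expected_replicas


def reliability_ceiling(r, c):
    """Limit of the toy scheme's reliability as max_replicas grows."""
    return r / (r + (1.0 - r) * (1.0 - c))
```

In this toy model, any coverage `c` below 1 caps the achievable reliability at `reliability_ceiling(r, c)` no matter how many replicas are allowed, because an undetected fault ends the process with a wrong result. That is one simple way to see why a reliability target may be out of reach when detection is imperfect.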
