Characterization of spatial fault patterns in interconnection networks

Hoseiny Farahabady, M ; Sharif University of Technology | 2006

  1. Type of Document: Article
  2. DOI: 10.1016/j.parco.2006.09.004
  3. Publisher: Elsevier B.V., 2006
  4. Abstract:
  5. Parallel computers, such as multiprocessor systems-on-chip (Mp-SoCs), multicomputers and cluster computers, consist of hundreds or thousands of processing units and components (such as routers, channels and connectors) connected via an interconnection network, and collectively these may experience high failure rates. Such systems must therefore be equipped with fault-tolerance mechanisms to ensure that they keep running in a degraded mode. Normally, the faulty components are coalesced into fault regions, which are classified into two major categories: convex and concave regions. In this paper, we propose the first solution for calculating the probability of occurrence of common fault patterns in torus and mesh interconnection networks, including both convex (|-shaped, □-shaped) and concave (L-shaped, T-shaped, +-shaped, H-shaped) regions. These results play a key role in studying, in particular, the performance of routing algorithms proposed for interconnection networks under faulty conditions. © 2006 Elsevier B.V. All rights reserved.
  6. Keywords:
  7. Algorithms ; Fault tolerant computer systems ; Microprocessor chips ; Multiprocessing systems ; Parallel processing systems ; Routers ; Fault patterns ; Performance analysis ; Routing algorithms ; Torus ; Interconnection networks
  8. Source: Parallel Computing ; Volume 32, Issue 11-12 , 2006 , Pages 886-901 ; 01678191 (ISSN)
  9. URL: https://www.sciencedirect.com/science/article/abs/pii/S0167819106000676