Sharif Digital Repository / Sharif University of Technology / Search result

Fast data delivery for many-core processors

, Article IEEE Transactions on Computers ; Volume 67, Issue 10 , 2018 , Pages 1416-1429 ; 00189340 (ISSN) Bakhshalipour, M ; Lotfi Kamran, P ; Mazloumi, A ; Samandi, F ; Naderan Tahan, M ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology

Abstract

Server workloads operate on large volumes of data. As a result, processors executing these workloads encounter frequent L1-D misses. In a many-core processor, an L1-D miss causes a request packet to be sent to an LLC slice and a response packet to be sent back to the L1-D, which results in high overhead. While prior work targeted response packets, this work focuses on accelerating the request packets. Unlike aggressive OoO cores, simpler cores used in many-core processors cannot hide the latency of L1-D request packets. We observe that LLC slices that serve L1-D misses are strongly temporally correlated. Taking advantage of this observation, we design a simple and accurate predictor. Upon...

DsReliM: Power-constrained reliability management in Dark-Silicon many-core chips under process variations

, Article International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2015, 4 October 2015 through 9 October 2015 ; Oct , 2015 , Pages 75-82 ; 9781467383219 (ISBN) Salehi, M ; Shafique, M ; Kriebel, F ; Rehman, S ; Tavana, M. K ; Ejlali, A ; Henkel, J ; ACM; IEEE ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2015

Abstract

Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM...

An efficient hybrid-switched network-on-chip for chip multiprocessors

, Article IEEE Transactions on Computers ; Volume 65, Issue 5 , 2016 , Pages 1656-1662 ; 00189340 (ISSN) Lotfi Kamran, P ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology

IEEE Computer Society

Abstract

Chip multiprocessors (CMPs) require a low-latency interconnect fabric network-on-chip (NoC) to minimize processor stall time on instruction and data accesses that are serviced by the last-level cache (LLC). While packet-switched mesh interconnects sacrifice performance of many-core processors due to NoC-induced delays, existing circuit-switched interconnects do not offer lower network delays as they cannot hide the time it takes to set up a circuit. To address this problem, this work introduces CIMA - a hybrid circuit-switched and packet-switched mesh-based interconnection network that affords low LLC access delays at a small area cost. CIMA uses virtual cut-through (VCT) switching for short...

Near-Ideal networks-on-chip for servers

, Article 23rd IEEE Symposium on High Performance Computer Architecture, HPCA 2017, 4 February 2017 through 8 February 2017 ; 2017 , Pages 277-288 ; 15300897 (ISSN); 9781509049851 (ISBN) Lotfi Kamran, P ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology

IEEE Computer Society 2017

Abstract

Server workloads benefit from execution on many-core processors due to their massive request-level parallelism. A key characteristic of server workloads is the large instruction footprints. While a shared last-level cache (LLC) captures the footprints, it necessitates a low-latency network-on-chip (NOC) to minimize the core stall time on accesses serviced by the LLC. As strict quality-of-service requirements preclude the use of lean cores in server processors, we observe that even state-of-the-art single-cycle multi-hop NOCs are far from ideal because they impose significant NOC-induced delays on the LLC access latency, and diminish performance. Most of the NOC delay is due to per-hop...

AFRA: A low cost high performance reliable routing for 3D mesh NoCs

, Article Proceedings -Design, Automation and Test in Europe, DATE ; 2012 , Pages 332-337 ; 15301591 (ISSN) ; 9783981080186 (ISBN) Akbari, S ; Shafiee, A ; Fathy, M ; Berangi, R ; Sharif University of Technology

2012

Abstract

Three-dimensional network-on-chips are suitable communication fabrics for high-density 3D many-core ICs. Such networks have shorter communication hop count, compared to 2D NoCs, and enjoy fast and power efficient TSV wires in vertical links. Unfortunately, the fabrication process of TSV connections has not matured yet, which results in poor vertical links yield. In this work, we address this challenge and introduce AFRA, a deadlock-free routing algorithm for 3D mesh-based NoCs that tolerates faults on vertical links. AFRA is designed to be simple, high performance, and robust. The simplicity is achieved by applying ZXY and XZXY routings in the absence and presence of fault, respectively....