Energy-Efficient permanent fault tolerance in hard real-time systems

Article IEEE Transactions on Computers ; 2019 ; 00189340 (ISSN) Mireshghallah, F ; Bakhshalipour, M ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology

IEEE Computer Society 2019

Abstract

Triple Modular Redundancy (TMR) is a long-established approach for masking various kinds of faults. By employing redundancy and analyzing the results of three separate executions of the same program, TMR is able to attain excellent levels of reliability. While TMR provides a desirable level of reliability, it suffers from the high power consumption of the redundant hardware, a severe detriment to its broad adoption. The energy consumption of TMR can be mitigated if its operations are divided into two stages, and one stage is dropped in the absence of faults. Such an approach, which is evaluated in recent research, however, quickly fails in the presence of permanent faults, as we...
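
The two-stage idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `two_stage_tmr`, the use of plain Python callables as replicas, and the choice to run the third replica only when the first two disagree are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def two_stage_tmr(replicas: Sequence[Callable[[], T]]) -> Tuple[T, int]:
    """Hypothetical two-stage TMR: run two replicas, add the third on mismatch.

    Returns (voted result, number of replicas actually executed).
    Results must be hashable so they can be counted for the vote.
    """
    if len(replicas) != 3:
        raise ValueError("TMR needs exactly three replicas")

    # Stage 1: execute two replicas and compare their results.
    first, second = replicas[0](), replicas[1]()
    if first == second:
        # No fault observed, so the second stage is dropped.
        return first, 2

    # Stage 2: a mismatch was seen, so run the third replica and vote.
    third = replicas[2]()
    winner, votes = Counter([first, second, third]).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among the three results")
    return winner, 3


def healthy() -> int:
    return 42


def faulty() -> int:
    return 41  # stands in for a replica that produces a wrong value


print(two_stage_tmr([healthy, healthy, healthy]))  # (42, 2)
print(two_stage_tmr([faulty, healthy, healthy]))   # (42, 3)
```

In this sketch, a replica that is wrong on every call among the first two forces the third execution every time, so the saving from dropping a stage disappears. That is one plausible reading of why such a scheme struggles with permanent faults; the truncated abstract does not spell out the authors' argument.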

Traffic-load-aware virtual channel power-gating in network-on-chips

Article Advances in Computers ; 2021 ; 00652458 (ISSN) Sadrosadati, M ; Mirhosseini, A ; Akbarzadeh, N ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology

Academic Press Inc 2021

Abstract

Network-on-Chips (NoCs) employ several virtual channels per input port to mitigate the head-of-line blocking issue in transmitting network packets. Unfortunately, these virtual channels are power-hungry resources that significantly contribute to the total power consumption of NoCs. In particular, we make the key observation that even under high traffic load, a number of virtual channels are idle, imposing significant static power overhead. Prior works use the power-gating technique to switch off idle VCs and reduce the static power consumption. However, we observe that prior works are mostly suitable for low traffic loads and are ineffective in high traffic loads. In this chapter, we aim to propose a...

Traffic-load-aware virtual channel power-gating in network-on-chips

Article Advances in Computers ; Volume 124 , 2022 , Pages 1-19 ; 00652458 (ISSN); 9780323856881 (ISBN) Sadrosadati, M ; Mirhosseini, A ; Akbarzadeh, N ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology

Academic Press Inc 2022

Abstract

Network-on-Chips (NoCs) employ several virtual channels per input port to mitigate the head-of-line blocking issue in transmitting network packets. Unfortunately, these virtual channels are power-hungry resources that significantly contribute to the total power consumption of NoCs. In particular, we make the key observation that even under high traffic load, a number of virtual channels are idle, imposing significant static power overhead. Prior works use the power-gating technique to switch off idle VCs and reduce the static power consumption. However, we observe that prior works are mostly suitable for low traffic loads and are ineffective in high traffic loads. In this chapter, we aim to propose a...

Reducing Power of On-chip Networks by Exploiting Latency Asymmetry of Router’s Pipeline Stages

M.Sc. Thesis Sharif University of Technology Sadrosadati, Mohammad (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

NOCs account for a large portion of the power consumption of a many-core SOC. A significant fraction of this power is due to the buffers, the crossbar, and the links. This thesis therefore introduces a new method that substantially reduces the power consumption of NOCs. The method exploits the latency asymmetry of router pipeline stages and uses different voltage swings for the buffers, links, and crossbar in order to decrease the dynamic power consumption while maintaining performance. Moreover, since static power consumption has gained noticeable importance in recent years, a method for reducing this power component is also...

Proposing a Scalable and Energy-aware Architecture for Register File of GPUs

Ph.D. Dissertation Sharif University of Technology Sadrosadati, Mohammad (Author) ; Sarbazi-Azad, Hamid (Supervisor)

Abstract

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. In this thesis, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp’s aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register...

Data-Aware compression of neural networks

Article IEEE Computer Architecture Letters ; Volume 20, Issue 2 , 2021 , Pages 94-97 ; 15566056 (ISSN) Falahati, H ; Peyro, M ; Amini, H ; Taghian, M ; Sadrosadati, M ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2021

Abstract

Deep Neural Networks (DNNs) are getting deeper and larger, which intensifies their data movement and compute demands. Prior work focuses on reducing data movement and computation by exploiting sparsity and similarity. However, none of these works exploits input similarity; they focus only on sparsity and weight similarity. Synergistically analysing the similarity and sparsity of inputs and weights, we show that memory accesses and computations can be reduced by 5.7× and 4.1× more than what can be achieved by exploiting only sparsity, and by 3.9× and 2.1× more than what can be achieved by exploiting only weight similarity. We propose a new data-aware compression approach, called DANA, to effectively...

NURA: A framework for supporting non-uniform resource accesses in GPUs

Article Performance Evaluation Review ; Volume 50, Issue 1 , 2022 , Pages 39-40 ; 01635999 (ISSN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh, E ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology

Association for Computing Machinery 2022

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g. spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g. simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes the Cooperative Thread Arrays (CTAs) that belong to only one application (similar to...

NURA: A framework for supporting non-uniform resource accesses in GPUs

Article 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS/PERFORMANCE 2022, 6 June 2022 through 10 June 2022 ; 2022 , Pages 39-40 ; 9781450391412 (ISBN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh, E ; Sadrosadati, M ; Sarbazi Azad, H ; ACM SIGMETRICS ; Sharif University of Technology

Association for Computing Machinery, Inc 2022

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g. spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g. simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes the Cooperative Thread Arrays (CTAs) that belong to only one application (similar to...

NURA: A framework for supporting non-uniform resource accesses in gpus

Article Proceedings of the ACM on Measurement and Analysis of Computing Systems ; Volume 6, Issue 1 , 2022 ; 24761249 (ISSN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh Asl Miandoab, E ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology

Association for Computing Machinery 2022

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have limited opportunity to improve resource utilization, while other works, e.g., simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensures fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes Cooperative Thread Arrays (CTAs) that belong to only one application (similar to the...

An efficient DVS scheme for on-chip networks using reconfigurable Virtual Channel allocators

Article Proceedings of the International Symposium on Low Power Electronics and Design, 22 July 2015 through 24 July 2015 ; Volume 2015-September , July , 2015 , Pages 249-254 ; 15334678 (ISSN) ; 9781467380096 (ISBN) Sadrosadati, M ; Mirhosseini, A ; Aghilinasab, H ; Sarbazi Azad, H ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2015

Abstract

Network-on-Chip (NoC) is a key contributor to the total power consumption of a chip multiprocessor. Dynamic Voltage Scaling is a promising method for power saving in NoCs since it reduces both static and dynamic power consumption. In this paper, we propose a novel scheme to reduce on-chip network power consumption when the number of Virtual Channels (VCs) with active allocation requests per cycle is less than the total number of VCs. In our method, we introduce a reconfigurable arbitration logic which can be configured to have multiple latencies and hence, multiple slack times. The increased slack times are then used to reduce the supply voltage of the routers in order to...

A Method to Improve Adaptivity of Odd-Even Routing Algorithm in Mesh NoCs

Article 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016, 17 February 2016 through 19 February 2016 ; 2016 , Pages 755-758 ; 9781467387750 (ISBN) Sadrosadati, M ; Bashizade, R ; Roozkhosh, S ; Shafiee, A ; Sarbazi Azad, H ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2016

Abstract

Adaptive routing algorithms help balance resource utilization across different parts of the network and hence prevent one resource from becoming the performance bottleneck while other resources are still under-utilized. In this paper, we present a novel approach, called Preemptive Waiting, which is applied to the Odd-Even routing algorithm (PWOE). PWOE postpones the saturation traffic rate of the NoC by 13.4% compared to OE, under synthetic traffic loads. © 2016 IEEE

An efficient DVS scheme for on-chip networks

Article Advances in Computers ; 2021 ; 00652458 (ISSN) Sadrosadati, M ; Mirhosseini, A ; Akbarzadeh, N ; Aghilinasab, H ; Sarbazi Azad, H ; Sharif University of Technology

Academic Press Inc 2021

Abstract

Network-on-Chips (NoCs) consume a significant portion of multiprocessors' total power. Dynamic Voltage Scaling (DVS), which can reduce both static and dynamic power consumption, is widely applied to NoCs. However, prior DVS schemes usually impose significant performance overhead on NoCs, as NoCs need to work with lower clock frequencies when the supply voltage is scaled down. In this chapter, we propose a novel DVS scheme for NoCs with no performance overhead. We reduce power consumption when there are few Virtual Channels (VCs) that have active allocation requests at each cycle compared to the total number of available VCs. To enable multiple latencies with different slack times, we propose a...

An efficient DVS scheme for on-chip networks

Article Advances in Computers ; Volume 124 , 2022 , Pages 21-43 ; 00652458 (ISSN); 9780323856881 (ISBN) Sadrosadati, M ; Mirhosseini, A ; Akbarzadeh, N ; Aghilinasab, H ; Sarbazi Azad, H ; Sharif University of Technology

Academic Press Inc 2022

Abstract

Network-on-Chips (NoCs) consume a significant portion of multiprocessors' total power. Dynamic Voltage Scaling (DVS), which can reduce both static and dynamic power consumption, is widely applied to NoCs. However, prior DVS schemes usually impose significant performance overhead on NoCs, as NoCs need to work with lower clock frequencies when the supply voltage is scaled down. In this chapter, we propose a novel DVS scheme for NoCs with no performance overhead. We reduce power consumption when there are few Virtual Channels (VCs) that have active allocation requests at each cycle compared to the total number of available VCs. To enable multiple latencies with different slack times, we propose a...

LTRF: enabling high-capacity register files for GPUs via hardware/software cooperative register prefetching

Article 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018, 24 March 2018 through 28 March 2018 ; 2018 , Pages 489-502 ; 9781450349116 (ISBN) Sadrosadati, M ; Mirhosseini, A ; Ehsani, S. B ; Sarbazi Azad, H ; Drumond, M ; Falsafi, B ; Ausavarungnirun, R ; Mutlu, O ; Sharif University of Technology

Association for Computing Machinery 2018

Abstract

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce the register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical...

ITAP: Idle-time-aware power management for GPU execution units

Article ACM Transactions on Architecture and Code Optimization ; Volume 16, Issue 1 , 2019 ; 15443566 (ISSN) Sadrosadati, M ; Ehsani, S. B ; Falahati, H ; Ausavarungnirun, R ; Tavakkol, A ; Abaee, M ; Orosa, L ; Wang, Y ; Sarbazi Azad, H ; Mutlu, O ; Sharif University of Technology

Association for Computing Machinery 2019

Abstract

Graphics Processing Units (GPUs) are widely used as the accelerator of choice for applications with massively data-parallel tasks. However, recent studies show that GPUs suffer heavily from resource underutilization, which, combined with their large static power consumption, imposes a significant power overhead. The execution units, among the most power-hungry components of a GPU, frequently experience idleness when (1) an underutilized warp is issued to the execution units, leading to partial lane idleness, and (2) there is no active warp to be issued for execution due to warp stalls (e.g., waiting for memory access and synchronization). Although large in total, the idle time of...

Highly concurrent latency-tolerant register files for GPUs

Article ACM Transactions on Computer Systems ; Volume 37, Issue 1-4 , 2021 ; 07342071 (ISSN) Sadrosadati, M ; Mirhosseini, A ; Hajiabadi, A ; Ehsani, S. B ; Falahati, H ; Sarbazi Azad, H ; Drumond, M ; Falsafi, B ; Ausavarungnirun, R ; Mutlu, O ; Sharif University of Technology

Association for Computing Machinery 2021

Abstract

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce the register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this article, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical...

Effective cache bank placement for GPUs

Article 20th Design, Automation and Test in Europe, DATE 2017, 27 March 2017 through 31 March 2017 ; 2017 , Pages 31-36 ; 9783981537093 (ISBN) Sadrosadati, M ; Mirhosseini, A ; Roozkhosh, S ; Bakhishi, H ; Sarbazi Azad, H ; ACM Special Interest Group on Design Automation (ACM SIGDA); Electronic System Design Alliance (ESDA); et al.; European Design and Automation Association (EDAA); European Electronic Chips and Systems Design Initiative (ECSI); IEEE Council on Electronic Design Automation (CEDA) ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2017

Abstract

The placement of the Last Level Cache (LLC) banks in the GPU on-chip network can significantly affect the performance of memory-intensive workloads. In this paper, we attempt to offer a placement methodology for the LLC banks to maximize the performance of the on-chip network connecting the LLC banks to the streaming multiprocessors in GPUs. We argue that an efficient placement needs to be derived based on a novel metric that considers the latency hiding capability of the GPUs through thread level parallelism. To this end, we propose a throughput aware metric, called Effective Latency Impact (ELI). Moreover, we define an optimization problem to formulate our placement approach based on the...

Probability of missed detection as a criterion for receiver placement in MIMO PCL

Article IEEE National Radar Conference - Proceedings, 7 May 2012 through 11 May 2012, Atlanta, GA ; 2012 , Pages 0924-0927 ; 10975659 (ISSN) ; 9781467306584 (ISBN) Majd, M. N ; Chitgarha, M. M ; Radmard, M ; Nayebi, M. M ; Sharif University of Technology

IEEE 2012

Abstract

Using multiple antennas at the transmit and receive sides of a passive radar brings the benefits of both MIMO radar and passive radar. However, one of the obstacles that arises in such a configuration is placing the receive antennas in positions that improve the radar performance. Here we consider only the case of positioning one receiver among multiple illuminators of opportunity. This is a first step toward optimizing the geometry of multiple receivers in a passive radar

An efficient method for the ring opening of epoxides with aromatic amines by Sb(III) chloride under microwave irradiation

Article Journal of Chemical Research ; Issue 4 , 2008 , Pages 220-221 ; 03082342 (ISSN) Ghazanfari, D ; Hashemi, M. M ; Mottaghi, M. M ; Foroughi, M. M ; Sharif University of Technology

2008

Abstract

SbCl3 supported on montmorillonite K-10 is an efficient catalyst for the ring opening of epoxides with aromatic amines under solvent-free conditions and microwave irradiation to give the corresponding β-amino alcohols in high yields with high regioselectivity