
P2R2: Parallel Pseudo-Round-Robin arbiter for high performance NoCs

Article Integration, the VLSI Journal ; November 2014 ; ISSN: 01679260 ; Bashizade, R ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

Networks-on-Chip (NoCs) play an important role in the performance of Chip Multi-Processors (CMPs). Providing the desired performance under the heavy traffic imposed by some applications requires NoC routers to have a large number of Virtual Channels (VCs). Increasing the number of VCs, however, adds to the delay of the critical path of the arbitration logic and hence restricts the clock frequency of the router. To enjoy the benefits of having many VCs while keeping the clock frequency as high as that of a low-VC router, we propose the Parallel Pseudo-Round-Robin (P2R2) arbiter. Our proposal is based on processing multiple groups of requests in parallel. Our...
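
For orientation, the conventional round-robin policy that the abstract takes as its starting point can be sketched in software as below. This is not the P2R2 design, whose details are cut off in the abstract; the function name `round_robin_grant` and the list-of-booleans request format are assumptions made for this illustration. The sequential scan over all requesters hints at why arbitration delay grows with the number of VCs.

```python
def round_robin_grant(requests, last_granted):
    """Return the index of the granted requester, or None if nobody requests.

    `requests` is a list of booleans, one per virtual channel (hypothetical
    format). The search starts just after `last_granted`, so the most recent
    winner has the lowest priority in the next arbitration round.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


# Four virtual channels; VC 1 and VC 3 request in every round.
pointer = 0
grants = []
for _ in range(4):
    winner = round_robin_grant([False, True, False, True], pointer)
    grants.append(winner)
    pointer = winner
print(grants)  # the grant alternates between VC 1 and VC 3
```

With two persistent requesters the grant alternates between them, which is the fairness property a round-robin arbiter is meant to provide.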

Temperature control in three-network on chips using task migration

Article IET Computers and Digital Techniques ; Vol. 7, issue 6, November 2013, pp. 274-281 ; ISSN: 1751-861X (online) ; Hassanpour, N ; Hessabi, H ; Hamedani, P. K ; Sharif University of Technology

Abstract

The combination of three-dimensional (3D) IC technology and network on chip (NoC) is an effective way to increase system scalability and alleviate the interconnect problem in large-scale integrated circuits. However, because of the increased power density in 3D NoC systems and the destructive effect of high temperatures on chip reliability, applying thermal management solutions becomes crucial in such circuits. In this study, the authors propose a runtime distributed migration algorithm based on game theory to balance the heat dissipation among processing elements (PEs) in a 3D NoC chip multiprocessor. The objective of this algorithm is to minimise the 3D NoC system's peak temperature,...

Designing best effort networks-on-chip to meet hard latency constraints

Article Transactions on Embedded Computing Systems ; Vol. 12, issue 4, June 2013 ; ISSN: 15399087 ; Seiculescu, C ; Rahmati, D ; Murali, S ; Sarbazi-Azad, H ; Benini, L ; Micheli, G. D ; Sharif University of Technology

Abstract

Many classes of applications require Quality of Service (QoS) guarantees from the system interconnect. In Networks-on-Chip (NoCs), QoS guarantees usually translate into bandwidth and latency constraints for the traffic flows and require hardware support in the NoC fabric and its interfaces. In this article, we present a novel NoC synthesis framework to automatically build networks that meet hard latency constraints of end-to-end traffic streams without requiring specialized hardware for the network components. The hard latency constraints are met by carefully designing the NoC topology and selecting the appropriate routes for flows using lean best-effort network components. We perform...

Using task migration to improve non-contiguous processor allocation in NoC-based CMPs

Article Journal of Systems Architecture ; Vol. 59, issue 7, 2013, pp. 468-481 ; ISSN: 13837621 ; Modarressi, M ; Asadinia, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems and aims to allocate the processing nodes of a multiprocessor to different tasks of an input application at run time. The proposed mechanism targets optimizing the on-chip communication power/latency and relies on two procedures: processor allocation and task migration. Allocation is done by a fast heuristic algorithm to allocate the free processors to the tasks of an incoming application when a new application begins execution. The task-migration algorithm is activated when some application completes execution and frees up...

Network-on-SSD: A scalable and high-performance communication design paradigm for SSDs

Article IEEE Computer Architecture Letters ; Vol. 12, issue 1, Article number 6178186, 2013, pp. 5-8 ; ISSN: 15566056 ; Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In recent years, flash memory solid state disks (SSDs) have shown great potential to change storage infrastructure because of their advantages of high speed and high-throughput random access. This promising storage technology, however, suffers greatly from performance loss because of frequent "erase-before-write" and "garbage collection" operations. Thus, novel circuit-level, architectural, and algorithmic techniques are currently being explored to address these limitations. In parallel with other efforts, the current study investigates replacing the shared buses in the multi-channel architecture of SSDs with an interconnection network to achieve scalable, high-throughput, and reliable SSD storage systems. Roughly...

Supporting non-contiguous processor allocation in mesh-based chip multiprocessors using virtual point-to-point links

Article IET Computers and Digital Techniques ; Vol. 6, issue 5, September 2012, pp. 302-317 ; ISSN: 17518601 ; Asadinia, M ; Modarressi, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In this study, the authors propose a processor allocation mechanism for run-time assignment of a set of communicating tasks of input applications onto the processing nodes of a chip multiprocessor, when the arrival order and execution lifetime of the input applications are not known a priori. This mechanism targets the on-chip communication and aims to reduce the power and latency of the network-on-chip employed as the communication infrastructure. In this work, the authors benefit from the advantages of non-contiguous processor allocation mechanisms by allowing the tasks of the input application to be mapped onto disjoint regions (submeshes) and then virtually connecting them by bypassing the...

Application specific router architectures for NoCs: An efficiency and power consumption analysis

Article Mechatronics ; Vol. 22, issue 5, August 2012, pp. 531-537 ; ISSN: 09574158 ; Najjari, N ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

Networks on chip (NoCs) have been proposed as a solution to mitigate complex on-chip communication problems. NoCs are composed of intellectual property (IP) cores which are interconnected by on-chip switching fabrics. A step in the design process of NoCs is hardware virtualization, which maps the IP cores onto the tiles of a chip. The communication among the IP cores greatly affects the performance and power consumption of NoCs, and this communication in turn depends strongly on the placement of IPs onto the tiles of the network. Different mapping algorithms have been proposed for NoCs which allocate a set of IPs to given network topologies. In these mapping algorithms, there is a restriction which limits IPs...

Low-power arithmetic unit for DSP applications

Article International Symposium on System on Chip, SoC ; 31 October-2 November 2011, pp. 68-71 ; ISBN: 9781457706721 ; Modarressi, M ; Nikounia, S. H ; Jahangir, A. H ; Sharif University of Technology

Abstract

DSP algorithms are among the most important components of modern embedded computer systems. These applications generally include fixed-point and floating-point arithmetic operations and trigonometric functions, which have long latencies and high power consumption. Nonetheless, DSP applications have some interesting characteristics, such as tolerance of a slight loss of accuracy and a high degree of value locality, which can be exploited to improve their power consumption and performance. In this paper, we present an application-specific result-cache that aims to reduce the power consumption and latency of DSP algorithms by reusing the results of the arithmetic operations executed on the same...
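
The idea of reusing results under value locality can be illustrated with a small software cache. This is only a sketch under stated assumptions, not the hardware design of the paper: the class name `ResultCache`, the least-recently-used replacement policy, and the default capacity are all hypothetical choices made for the example.

```python
import math
from collections import OrderedDict


class ResultCache:
    """Small cache keyed by (operation name, operands).

    Hypothetical illustration: LRU replacement and the default capacity
    are assumptions of this sketch, not details taken from the paper.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def compute(self, op_name, func, *operands):
        key = (op_name, operands)
        if key in self.entries:
            # Same operation on the same operands: reuse the stored result.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1
        result = func(*operands)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return result


cache = ResultCache()
for x in [0.5, 0.25, 0.5, 0.5]:
    cache.compute("sin", math.sin, x)
print(cache.hits, cache.misses)
```

In this run the repeated operand 0.5 is served from the cache twice, so the long-latency function is evaluated only for the two distinct operands.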

Supporting non-contiguous processor allocation in mesh-based CMPs using virtual point-to-point links

Article Proceedings - Design, Automation and Test in Europe, DATE ; 2011, pp. 413-418 ; ISSN: 15301591 ; ISBN: 9783981080179 ; Asadinia, M ; Modarressi, M ; Tavakkol, A ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In this paper, we propose a processor allocation mechanism for run-time assignment of a set of communicating tasks of input applications onto the processing nodes of a Chip Multiprocessor (CMP), when the arrival order and execution lifetime of the input applications are not known a priori. This mechanism targets the on-chip communication and aims to reduce the power and latency of the NoC employed as the communication infrastructure. In this work, we benefit from the advantages of non-contiguous processor allocation mechanisms by allowing the tasks of the input application to be mapped onto disjoint regions (sub-meshes) and then virtually connecting them by bypassing the router pipeline stages of...

Multicast-aware mapping algorithm for on-chip networks

Article Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011 ; 2011, pp. 455-462 ; ISBN: 9780769543284 ; Habibi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

Networks-on-Chip (NoCs for short) are known as the most scalable and reliable on-chip communication architectures for multi-core SoCs with tens to hundreds of IP cores. Properly mapping the IP cores onto NoC tiles (or assigning threads to cores in chip multiprocessors) can reduce end-to-end delay and energy consumption. While almost all previous works on mapping give higher priority to the application's flows with higher required bandwidth, this paper introduces a mapping strategy that considers multicast communication flows in addition to the normal unicast flows. To this end, multicast and unicast traffic flows are first characterized in terms of some new metrics which are...

The 2D SEM: A novel high-performance and low-power mesh-based topology for networks-on-chip

Article International Journal of Parallel, Emergent and Distributed Systems ; Vol. 25, issue 4, 2010, pp. 331-344 ; ISSN: 17445760 ; Sabbaghi-Nadooshan, R ; Modarressi, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In this paper, a 2D shuffle-exchange based mesh topology, or 2D shuffle-exchange mesh (SEM) for short, is presented for networks-on-chip. The proposed 2D topology applies the conventional, well-known shuffle-exchange structure in each row and each column of the network. Compared to an equal-sized mesh, which is the most common topology in on-chip networks, the proposed shuffle-exchange based mesh network has a smaller diameter at an equal cost. Finally, a cross-shuffle variant is proposed for better performance. Simulation results show that the 2D SEM and 2D cross-shuffle effectively reduce the power consumption and improve performance metrics of the on-chip networks compared to the conventional mesh...
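
In the shuffle-exchange structure as it is usually defined, each node of a network with 2^k nodes has a shuffle link to the node whose k-bit address is its own address rotated left by one bit, and an exchange link to the node whose address differs only in the lowest bit. The sketch below illustrates that one-dimensional structure for eight nodes; how the 2D SEM wires these links within each row and column is not given in the abstract, and the function names are assumptions of this example.

```python
def shuffle(node, bits):
    """Perfect shuffle: rotate the `bits`-bit node address left by one bit."""
    mask = (1 << bits) - 1
    return ((node << 1) | (node >> (bits - 1))) & mask


def exchange(node):
    """Exchange: flip the least significant address bit."""
    return node ^ 1


# One-dimensional shuffle-exchange links for 8 nodes (3-bit addresses).
BITS = 3
for node in range(1 << BITS):
    print(node, "shuffle ->", shuffle(node, BITS), "exchange ->", exchange(node))
```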

Virtual point-to-point connections for NoCs

Article IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ; Vol. 29, issue 6, 2010, pp. 855-868 ; ISSN: 02780070 ; Modarressi, M ; Tavakkol, A ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In this paper, we aim to improve the performance and power metrics of packet-switched networks-on-chip (NoCs) and to benefit from both the scalability and resource utilization advantages of NoCs and the superior communication performance of dedicated point-to-point links. The proposed method sets up virtual point-to-point (VIP) connections over one virtual channel (which bypasses the entire router pipeline) at each physical channel of the NoC. We present two schemes for constructing such VIP circuits. In the first scheme, the circuits are constructed for an application based on its task-graph at design time. The second scheme addresses constructing the connections at run-time using a light-weight...

P2R2: Parallel Pseudo-Round-Robin arbiter for high performance NoCs

Article Integration, the VLSI Journal ; Volume 50, 2014, pp. 173–182 ; ISSN: 0167-9260 ; Bashizade, R ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

Networks-on-Chip (NoCs) play an important role in the performance of Chip Multi-Processors (CMPs). Providing the desired performance under the heavy traffic imposed by some applications requires NoC routers to have a large number of Virtual Channels (VCs). Increasing the number of VCs, however, adds to the delay of the critical path of the arbitration logic and hence restricts the clock frequency of the router. To enjoy the benefits of having many VCs while keeping the clock frequency as high as that of a low-VC router, we propose the Parallel Pseudo-Round-Robin (P2R2) arbiter. Our proposal is based on processing multiple groups of requests in parallel. Our...

A fast, flexible, and easy-to-develop FPGA-based fault injection technique

Article Microelectronics Reliability ; Volume 54, Issue 5, May 2014, Pages 1000-1008 ; ISSN: 00262714 ; Ebrahimi, M ; Mohammadi, A ; Ejlali, A ; Miremadi, S. G ; Sharif University of Technology

Abstract

With technology downscaling in today's digital circuits, their sensitivity to radiation effects increases, making the occurrence of soft errors more probable. As a consequence, soft error rate estimation of complex circuits such as processors is becoming an important issue in safety- and mission-critical applications. Fault injection is a well-known and widely used approach for soft error rate estimation. Developing previous FPGA-based fault injection techniques is very time-consuming, mainly because they do not adequately exploit supplementary FPGA tools. This paper proposes an easy-to-develop and flexible FPGA-based fault injection technique. This technique utilizes debugging facilities...

Comparison between optimal interconnection network in different 2D and 3D NoC structures

Article International System on Chip Conference ; 2014, pp. 171-176 ; Radfar, F ; Zabihi, M ; Sarvari, R ; Sharif University of Technology

Abstract

This article studies the optimal inter-core interconnect network in NoC structures for 2D and 3D mesh, torus, and hypercube topologies. The optimal wire width and spacing are calculated by numerically maximizing the product of bandwidth and reciprocal delay, which depends on the technology node and hop length. With 3D integration and an increasing number of tiers, the optimal interconnect width and spacing in torus and hypercube topologies decrease. The core-to-core channel width in all topologies is obtained by assigning 20% of the power consumption to the routers. As the number of cores increases, the channel width decreases due to the reduced power consumption of each core. This effect is stronger in the hypercube topology, due...

Emerging non-volatile memory technologies for future low power reconfigurable systems

Article 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC ; 26-28 May 2014, pp. 1-2 ; ISBN: 9781479958108 ; Ahari, A ; Asadi, H ; Tahoori, M. B ; Sharif University of Technology

Abstract

Non-volatile memory (NVM) technologies are promising alternatives to traditional CMOS memory technologies. While NVMs have primarily been studied for use in the memory hierarchy, they can also provide benefits in reconfigurable systems such as Field-Programmable Gate Arrays (FPGAs). In this paper, we investigate the applicability of different NVM technologies for the configuration bits of FPGAs and propose a power-efficient reconfigurable architecture based on Phase Change Memory (PCM). Quantitative analysis for various FPGA architectures using different memory technologies shows the benefits of the proposed scheme.

A comparative study of energy/power consumption in parallel decimal multipliers

Article Microelectronics Journal ; Vol. 45, Issue 6, June 2014, pp. 775-780 ; Malekpour, A ; Ejlali, A ; Gorgin, S ; Sharif University of Technology

Abstract

Decimal multiplication is a frequent operation that is inherently complex to implement. Commercial and financial applications require working with decimal numbers, and it has been shown that converting decimal numbers to binary ones negatively affects the precision required by these applications. Existing research on parallel decimal multipliers has mainly focused on latency and area as the two major factors to be improved. However, energy/power consumption is another prominent issue in today's digital systems. While the energy consumption of parallel decimal multipliers has not been addressed in previous works, in this paper we present a comparative study of...

A data recomputation approach for reliability improvement of scratchpad memory in embedded systems

Article Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems ; 2014, pp. 228-233 ; Sayadi, H ; Farbeh, H ; Monazzah, A. M. H ; Miremadi, S. G ; Sharif University of Technology

Abstract

Scratchpad memory (SPM) is extensively used as the on-chip memory in modern embedded processors, either alongside the cache memory or as its alternative. Soft errors in SPM are one of the major contributors to system failures, due to the ever-increasing susceptibility of SPM cells to energetic particle strikes. Since a large fraction of soft errors occur in the form of Multiple-Bit Upsets (MBUs), traditional memory protection techniques, i.e., Error Correcting Codes (ECCs), are not affordable for SPM protection, mainly because of their limited error coverage and/or their high overheads. This paper proposes a novel algorithm that efficiently protects SPM with high error correction capability and...

Soft error rate estimation for combinational logic in presence of single event multiple transients

Article Journal of Circuits, Systems and Computers ; Vol. 23, issue 6, 2014 ; Rajaei, R ; Tabandeh, M ; Fazeli, M ; Sharif University of Technology

Abstract

Fast and accurate estimation of the soft error rate in VLSI circuits is an essential step in soft-error-tolerant ASIC design. To provide cost-effective protection against radiation effects in combinational logic, an accurate and fast method for identifying the most susceptible gates and paths is needed. In this paper, an efficient, fast, and accurate method for soft error propagation probability (SEPP) estimation is presented and its performance is evaluated. This method takes into account all three masking factors over multiple cycles. It also considers multiple event transients as a new challenge in soft-error-tolerant VLSI circuit design. Compared with Monte Carlo (MC) simulation-based...

All-optical wavelength-routed architecture for a power-efficient network on chip

Article IEEE Transactions on Computers ; Vol. 63, issue 3, 2014, pp. 777-792 ; Koohi, S ; Hessabi, S ; Sharif University of Technology

Abstract

In this paper, we propose a new architecture for nanophotonic Networks on Chip (NoCs), named 2D-HERT, which consists of optical data and control planes. The proposed data plane is built upon a new topology and all-optical switches that passively route optical data streams based on their wavelengths. By utilizing the wavelength routing method, the proposed deterministic routing algorithm, and the Wavelength Division Multiplexing (WDM) technique, the proposed data plane eliminates the need for optical resource reservation at the intermediate nodes. For resolving end-point contention, we propose an all-optical request-grant arbitration architecture which reduces optical losses compared to the alternative...