DiskAccel: Accelerating disk-based experiments by representative sampling

, Article Performance Evaluation Review, 15 June 2015 through 19 June 2015 ; Volume 43, Issue 1 , 2015 , Pages 297-308 ; 01635999 (ISSN) Tarihi, M ; Asadi, H ; Sarbazi Azad, H ; Sharif University of Technology

Association for Computing Machinery 2015

Abstract

Disk traces are typically used to analyze real-life workloads and for replay-based evaluations. This approach benefits from capturing important details such as varying behavior patterns, bursty activity, and diurnal patterns of system activity, which are often missing from the behavior of workload synthesis tools. However, accurate capture of such details requires recording traces containing long durations of system activity, which are difficult to use for replay-based evaluation. One way of solving the problem of long storage trace duration is the use of disk simulators. While publicly available disk simulators can greatly accelerate experiments, they have not kept up with technological...

Optimizing pipelines of trigonometric functions for FPGAs

, Article 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM, Victoria, BC, 22 August 2007 through 24 August 2007 ; 2007 , Pages 105-108 ; 1424411904 (ISBN); 9781424411900 (ISBN) Ajorloo, H ; Ebrahimi, H ; Sarbazi Azad, H ; Sharif University of Technology

2007

Abstract

Trigonometric functions are among the most widely used functions in digital signal processing. In this paper, we propose two approaches for optimizing the pipelined implementation of the CORDIC algorithm and compare them with previous approaches. The proposed solutions are implemented on one of the Xilinx Virtex family's FPGAs. Our simulation results show that for large input bit widths, our approach is preferable to existing approaches. ©2007 IEEE
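For readers unfamiliar with CORDIC, the following is a minimal floating-point sketch of the standard rotation-mode algorithm, not the paper's optimized pipelines. Each loop iteration corresponds to what would be one shift-and-add stage in a pipelined hardware design. The function name `cordic_sin_cos` and the default of 16 iterations are assumptions made for illustration.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Approximate (sin, cos) of theta, for theta in [-pi/2, pi/2].

    Textbook rotation-mode CORDIC; illustrative only, not the
    optimized pipeline described in the abstract.
    """
    # Elementary rotation angles atan(2^-i); in hardware these are a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector; pre-scale by the inverse gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        # Rotate toward zero residual angle; one pipeline stage per iteration.
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(f"sin ~ {s:.5f}, cos ~ {c:.5f}")
```

In fixed-point hardware the multiplications by 2^-i become bit shifts, which is why the number of input bits directly drives the pipeline depth.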

Efficient genetic based topological mapping using analytical models for on-chip networks

, Article Journal of Computer and System Sciences ; Volume 79, Issue 4 , 2013 , Pages 492-513 ; 00220000 (ISSN) Arjomand, M ; Amiri, S. H ; Sarbazi Azad, H ; Sharif University of Technology

2013

Abstract

Networks-on-Chip (NoCs) are now a popular communication medium for supporting inter-IP communication in complex on-chip systems with tens to hundreds of IP cores. Higher scalability (compared to traditional shared-bus and point-to-point interconnects), throughput, and reliability are among the most important advantages of NoCs. Moreover, NoCs match current CAD methodologies well, as these rely mainly on modular, reusable structures with regular structural patterns. However, since NoCs are resource-limited, determining how to distribute application load over limited on-chip resources (e.g. switches, buffers, virtual channels, and wires) in order to improve the metrics of interest and satisfy...

ASHA: An adaptive shared-memory sharing architecture for multi-programmed GPUs

, Article Microprocessors and Microsystems ; Volume 46 , 2016 , Pages 264-273 ; 01419331 (ISSN) Abbasitabar, H ; Samavatian, M. H ; Sarbazi Azad, H ; Sharif University of Technology

Elsevier B.V 2016

Abstract

Spatial multi-programming is one of the most efficient multi-programming methods on Graphics Processing Units (GPUs). This multi-programming scheme creates variety in the resource requirements of streaming multiprocessors (SMs) and opens opportunities for sharing unused portions of each SM's resources with other SMs. Although this approach drastically improves GPU performance, in some cases it leads to performance degradation due to a shortage of resources allocated to each program. Considering shared memory as one of the main bottlenecks of thread-level parallelism (TLP), in this paper, we propose an adaptive shared-memory sharing architecture, called ASHA. ASHA enhances spatial...

Design for scalability in enterprise SSDs

, Article Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT ; 24-27 August , 2014 , p. 417-429 ; ISSN: 1089795X ; ISBN: 9781450328098 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2014

Abstract

Solid State Drives (SSDs) have recently emerged as a high-speed random-access alternative to classical magnetic disks. To date, SSD designs have been largely based on a multi-channel bus architecture that faces serious scalability problems in high-end enterprise SSDs with dozens of flash memory chips and a gigabyte host interface. This forces the community to rapidly change the bus-based inter-flash standards to respond to ever-increasing application demands. In this paper, we first take a close look at how different flash parameters and SSD internal designs affect the actual performance and scalability of the conventional architecture. Our experiments show that SSD performance improvement...

Prolonging lifetime of PCM-based main memories through on-demand page pairing

, Article ACM Transactions on Design Automation of Electronic Systems ; Vol. 20, issue. 2 , 1 February , 2015 ; ISSN: 10844309 Asadinia, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2015

Abstract

With current memory scalability challenges, Phase-Change Memory (PCM) is viewed as an attractive replacement for DRAM. The primary concern for PCM applicability is its limited write endurance, which results in fast wear-out of memory cells. Worse, process variation in the deep-nanometer regime increases the variation in cell lifetime, resulting in an early and sudden reduction in main memory capacity due to the wear-out of a few cells. Recent studies have proposed redirection or correction schemes to alleviate this problem, but all suffer from poor throughput or latency. In this article, we show that one of the inefficiency sources in current schemes, even when wear-leveling algorithms are used,...

Reducing access latency of MLC PCMs through line striping

, Article Proceedings - International Symposium on Computer Architecture ; Article number 6853228 , 14-18 June , 2014 , p. 277-288 ; ISSN: 10636897 ; ISBN: 9781479943968 Hoseinzadeh, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2014

Abstract

Although phase change memory with multi-bit storage capability (known as MLC PCM) offers a good combination of high bit-density and non-volatility, its performance is severely impacted by the increased read/write latency. Regarding read operation, access latency increases almost linearly with respect to cell density (the number of bits stored in a cell). Since reads are latency critical, they can seriously impact system performance. This paper alleviates the problem of slow reads in the MLC PCM by exploiting a fundamental property of MLC devices: the Most-Significant Bit (MSB) of MLC cells can be read as fast as SLC cells, while reading the Least-Significant Bits (LSBs) is slower. We propose...

Unleashing the potentials of dynamism for page allocation strategies in SSDs

, Article SIGMETRICS 2014 - Proceedings of the 2014 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems ; 16-20 June , 2014 , pp. 551-552 ; ISBN: 9781450327893 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2014

Abstract

In Solid-State Drives (SSDs) with tens of flash chips and a highly parallel architecture, we can speed up I/O operations by making good use of resources during page allocation. Proposals already exist for using static page allocation, which does not balance the I/O load and whose efficiency depends on access address patterns. To the best of our knowledge, there has been no research thus far to show what happens if one or more internal resources can be freely allocated regardless of the request address. This paper explores the possibility of using different degrees of dynamism in page allocation and identifies key design opportunities that they present to improve SSD characteristics.
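The contrast between static and dynamic allocation can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the paper's allocator: only the channel is treated as an allocatable resource, static allocation is assumed to be a modulo of the logical page address, and dynamic allocation is assumed to pick the least-loaded channel. All names (`static_channel`, `dynamic_channel`, `simulate`) are hypothetical.

```python
def static_channel(lpa, num_channels):
    # Assumed static policy: the channel is fixed by the logical page address.
    return lpa % num_channels


def dynamic_channel(queue_depths):
    # Assumed dynamic policy: ignore the address, pick the least-loaded channel.
    return min(range(len(queue_depths)), key=queue_depths.__getitem__)


def simulate(addresses, num_channels, dynamic):
    """Return the number of page writes that land on each channel."""
    depths = [0] * num_channels
    for lpa in addresses:
        if dynamic:
            ch = dynamic_channel(depths)
        else:
            ch = static_channel(lpa, num_channels)
        depths[ch] += 1
    return depths


if __name__ == "__main__":
    # A strided pattern that is adversarial for modulo-based static mapping.
    strided = list(range(0, 32, 4))
    print("static :", simulate(strided, 4, dynamic=False))
    print("dynamic:", simulate(strided, 4, dynamic=True))
```

With the strided pattern, every write maps to the same channel under the static policy, while the dynamic policy spreads the writes evenly. This is the address-pattern dependence the abstract refers to.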

Using task migration to improve non-contiguous processor allocation in NoC-based CMPs

, Article Journal of Systems Architecture ; Vol. 59, issue. 7 , 2013 , pp. 468-481 ; ISSN: 13837621 Modarressi, M ; Asadinia, M ; Sarbazi-Azad, H ; Sharif University of Technology

2013

Abstract

In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems and aims to allocate the processing nodes of a multiprocessor to different tasks of an input application at run time. The proposed mechanism targets optimizing the on-chip communication power/latency and relies on two procedures: processor allocation and task migration. Allocation is done by a fast heuristic algorithm to allocate the free processors to the tasks of an incoming application when a new application begins execution. The task-migration algorithm is activated when some application completes execution and frees up...

OD3P: On-demand page paired PCM

, Article Proceedings - Design Automation Conference ; 2-5 June , 2014 , pp. 1-6 ; ISSN: 0738100X ; ISBN: 9781450327305 Asadinia, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2014

Abstract

With current memory scalability challenges, Phase Change Memory (PCM) is viewed as an attractive replacement for DRAM. The primary concern for PCM applicability is its limited write endurance, which is highly affected by process variation in the nanometer regime. This increases the variation in cell lifetime, resulting in an early and sudden reduction in main memory capacity due to the wear-out of a few cells. When some memory pages reach their endurance limits, other pages may be far from their limits even when using perfect wear-leveling. Recent studies have proposed redirection or correction schemes to alleviate this problem, but all suffer from poor throughput or latency. On the contrary, we...

Network-on-SSD: A scalable and high-performance communication design paradigm for SSDs

, Article IEEE Computer Architecture Letters ; Vol. 12, issue 1, Article number 6178186 , 2013 , pp. 5-8 ; ISSN: 15566056 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2013

Abstract

In recent years, flash memory solid state disks (SSDs) have shown great potential to change storage infrastructure because of their advantages of high speed and high-throughput random access. This promising storage, however, suffers greatly from performance loss because of frequent "erase-before-write" and "garbage collection" operations. Thus, novel circuit-level, architectural, and algorithmic techniques are currently being explored to address these limitations. In parallel with others, the current study investigates replacing shared buses in the multi-channel architecture of SSDs with an interconnection network to achieve scalable, high-throughput, and reliable SSD storage systems. Roughly...

Multicast-aware mapping algorithm for on-chip networks

, Article Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011 ; 2011 , p. 455-462 ; ISBN: 9780769543284 Habibi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2011

Abstract

Networks-on-Chip (NoCs for short) are known as the most scalable and reliable on-chip communication architectures for multi-core SoCs with tens to hundreds of IP cores. Properly mapping the IP cores onto NoC tiles (or assigning threads to cores in chip multiprocessors) can reduce end-to-end delay and energy consumption. While almost all previous works on mapping give higher priority to the application's flows with higher required bandwidth, this paper introduces a mapping strategy that considers multicast communication flows in addition to the normal unicast flows. To this end, multicast and unicast traffic flows are first characterized in terms of some new metrics which are...

Evaluation and design of beaconing in mobile wireless networks

, Article Ad Hoc Networks ; Vol. 9, issue. 3 , 2011 , p. 368-386 ; ISSN: 15708705 Nayebi, A ; Karlsson, G ; Sarbazi-Azad, H ; Sharif University of Technology

2011

Abstract

One of the intrinsic problems of mobility in wireless networks is the discovery of mobile nodes. A widely used solution for this problem is to use different variations of beacons, such as hello packets. Although a poorly designed beaconing scheme may lead to unnecessary energy usage or poor throughput, a systematic approach to analyze and select beaconing parameters is not provided in the literature. Here, we propose a model to study the beaconing efficiency using some measures such as the link lifetime, the probability of link establishment, and the delay to discover a new neighbor. The model is general and does not adhere to any particular mobility model; the only input from the mobility...

Virtual point-to-point connections for NoCs

, Article IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ; Vol. 29, issue. 6 , 2010 , p. 855-868 ; ISSN: 02780070 Modarressi, M ; Tavakkol, A ; Sarbazi-Azad, H ; Sharif University of Technology

2010

Abstract

In this paper, we aim to improve the performance and power metrics of packet-switched networks-on-chip (NoCs) while benefiting from both the scalability and resource utilization advantages of NoCs and the superior communication performance of dedicated point-to-point links. The proposed method sets up virtual point-to-point (VIP) connections over one virtual channel (which bypasses the entire router pipeline) at each physical channel of the NoC. We present two schemes for constructing such VIP circuits. In the first scheme, the circuits are constructed for an application based on its task-graph at design time. The second scheme addresses constructing the connections at run-time using a light-weight...

Special issue on network-based high performance computing

, Article Journal of Supercomputing ; 2010 , p. 1-4 ; ISSN: 09208542 Sarbazi-Azad, H ; Shahrabi, A ; Beigy, H ; Sharif University of Technology

2010

Abstract

[No abstract available]

Supporting non-contiguous processor allocation in mesh-based chip multiprocessors using virtual point-to-point links

, Article IET Computers and Digital Techniques ; Vol. 6, issue. 5 , September , 2012 , pp. 302-317 ; ISSN: 17518601 Asadinia, M ; Modarressi, M ; Sarbazi-Azad, H ; Sharif University of Technology

2012

Abstract

In this study, the authors propose a processor allocation mechanism for run-time assignment of a set of communicating tasks of input applications onto the processing nodes of a chip multiprocessor, when the arrival order and execution lifetime of the input applications are not known a priori. This mechanism targets the on-chip communication and aims to reduce the power and latency of the network-on-chip employed as the communication infrastructure. In this work, the authors benefit from the advantages of non-contiguous processor allocation mechanisms by allowing the tasks of the input application to be mapped onto disjoint regions (submeshes) and then virtually connecting them by bypassing the...

High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

, Article Proceedings of the International Symposium on Low Power Electronics and Design ; 2011 , p. 79-84 ; ISSN: 15334678 ; ISBN: 9781612846590 Jadidi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2011

Abstract

In this paper, we propose a run-time strategy for managing writes to the last-level cache in chip multiprocessors where STT-RAM is used as the baseline technology. To this end, we assume that each cache set is decomposed into a limited number of SRAM lines and a large number of STT-RAM lines. SRAM lines are the target of frequently written data, while rarely written or read-only data are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of the working set and place them into SRAM lines, while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache...

Power and performance efficient partial circuits in packet-switched networks-on-chip

, Article Proceedings of the 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2013 ; 27 February - 1 March , 2013 , pp. 509-513 ; Print ISBN: 9781467353212 Teimouri, N ; Modarressi, M ; Sarbazi-Azad, H ; Sharif University of Technology

2013

Abstract

In this paper, we propose a hybrid packet-circuit switching for networks-on-chip to benefit from the advantages of both switching mechanisms. Integrating circuit and packet switching into a single NoC is achieved by partitioning the link bandwidth and router data-path and control-path elements into two parts and allocating each part to one of the switching methods. In this NoC, during injection in the source node, packets are initially forwarded on the packet-switched sub-network, but keep requesting a circuit towards the destination node. The circuit-switched part, at each cycle, collects the circuit construction requests, performs arbitration among the conflicting requests, and constructs...
