Loading...
Search for: arjomand--m
0.006 seconds
Total 27 records

    High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

    , Article Proceedings of the International Symposium on Low Power Electronics and Design, 1 August 2011 through 3 August 2011 ; August , 2011 , Pages 79-84 ; 15334678 (ISSN) ; 9781612846590 (ISBN) Jadidi, A ; Arjomand, M ; SarbaziAzad, H ; Sharif University of Technology
    2011
    Abstract
    In this paper, we propose a run-time strategy for managing writes onto last level cache in chip multiprocessors where STT-RAM memory is used as baseline technology. To this end, we assume that each cache set is decomposed into limited SRAM lines and large number of STT-RAM lines. SRAM lines are target of frequently-written data and rarely-written or read-only ones are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of working set and place them into SRAM lines while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache... 

    Prolonging lifetime of PCM-based main memories through on-demand page pairing

    , Article ACM Transactions on Design Automation of Electronic Systems ; Vol. 20, issue. 2 , 1 February , 2015 ; ISSN: 10844309 Asadinia, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    With current memory scalability challenges, Phase-Change Memory (PCM) is viewed as an attractive replacement to DRAM. The preliminary concern for PCM applicability is its limited write endurance that results in fast wear-out of memory cells. Worse, process variation in the deep-nanometer regime increases the variation in cell lifetime, resulting in an early and sudden reduction in main memory capacity due to the wear-out of a few cells. Recent studies have proposed redirection or correction schemes to alleviate this problem, but all suffer poor throughput or latency. In this article, we show that one of the inefficiency sources in current schemes, even when wear-leveling algorithms are used,... 

    Design for scalability in enterprise SSDs

    , Article Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT ; 24-27 August , 2014 , p. 417-429 ; ISSN: 1089795X ; ISBN: 9781450328098 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    Solid State Drives (SSDs) have recently emerged as a high speed random access alternative to classical magnetic disks. To date, SSD designs have been largely based on multi-channel bus architecture that confronts serious scalability problems in high-end enterprise SSDs with dozens of flash memory chips and a gigabyte host interface. This forces the community to rapidly change the bus-based inter-flash standards to respond to ever increasing application demands. In this paper, we first give a deep look at how different flash parameters and SSD internal designs affect the actual performance and scalability of the conventional architecture. Our experiments show that SSD performance improvement... 

    Reducing access latency of MLC PCMs through line striping

    , Article Proceedings - International Symposium on Computer Architecture ; Article number 6853228 , 14-18 June , 2014 , p. 277-288 ; ISSN: 10636897 ; ISBN: 9781479943968 Hoseinzadeh, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    Although phase change memory with multi-bit storage capability (known as MLC PCM) offers a good combination of high bit-density and non-volatility, its performance is severely impacted by the increased read/write latency. Regarding read operation, access latency increases almost linearly with respect to cell density (the number of bits stored in a cell). Since reads are latency critical, they can seriously impact system performance. This paper alleviates the problem of slow reads in the MLC PCM by exploiting a fundamental property of MLC devices: the Most-Significant Bit (MSB) of MLC cells can be read as fast as SLC cells, while reading the Least-Significant Bits (LSBs) is slower. We propose... 

    Unleashing the potentials of dynamism for page allocation strategies in SSDs

    , Article SIGMETRICS 2014 - Proceedings of the 2014 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems ; 16-20 June , 2014 , pp. 551-552 ; ISBN: 9781450327893 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    In Solid-State Drives (SSDs) with tens of ash chips and highly parallel architecture, we can speed up I/O operations by well-utilizing resources during page allocation. Propos- als already exist for using static page allocation which does not balance the IO load and its efficiency depends on access address patterns. To our best knowledge, there have been no research thus far to show what happens if one or more internal resources can be freely allocated regardless of the request address. This paper explores the possibility of using different degrees of dynamism in page allocation and iden- tifies key design opportunities that they present to improve SSD's characteristics  

    OD3P: On-demand page paired PCM

    , Article Proceedings - Design Automation Conference ; 2-5 June , 2014 , pp. 1-6 ; ISSN: 0738100X ; ISBN: 9781450327305 Asadinia, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    With current memory scalability challenges, Phase Change Memory (PCM) is viewed as an attractive replacement to DRAM. The preliminary concern for PCM applicability is its limited write endurance that is highly affected by pro-cess variation in nanometer regime. This increases the vari- ation in cell lifetime resulting in early and sudden reduc- tion in main memory capacity due to wear-out of few cells. When some memory pages reach their endurance limits, other pages may be far from their limits even when using a perfect wear-leveling. Recent studies have proposed redi- rection or correction schemes to alleviate this problem, but all suffer from poor throughput or latency. On contrary, we... 

    Network-on-SSD: A scalable and high-performance communication design paradigm for SSDs

    , Article IEEE Computer Architecture Letters ; Vol. 12, issue 1, Article number 6178186 , 2013 , pp. 5-8 ; ISSN: 15566056 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    In recent years, flash memory solid state disks (SSDs) have shown a great potential to change storage infrastructure because of its advantages of high speed and high throughput random access. This promising storage, however, greatly suffers from performance loss because of frequent ''erase-before-write'' and ''garbage collection'' operations. Thus, novel circuit-level, architectural, and algorithmic techniques are currently explored to address these limitations. In parallel with others, current study investigates replacing shared buses in multi-channel architecture of SSDs with an interconnection network to achieve scalable, high throughput, and reliable SSD storage systems. Roughly... 

    High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

    , Article Proceedings of the International Symposium on Low Power Electronics and Design ; 2011 , p. 79-84 ; ISSN: 15334678 ; ISBN: 9781612846590 Jadidi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    In this paper, we propose a run-time strategy for managing writes onto last level cache in chip multiprocessors where STT-RAM memory is used as baseline technology. To this end, we assume that each cache set is decomposed into limited SRAM lines and large number of STT-RAM lines. SRAM lines are target of frequently-written data and rarely-written or read-only ones are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of working set and place them into SRAM lines while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache... 

    Multicast-aware mapping algorithm for on-chip networks

    , Article Proceedings - 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011 ; 2011 , p. 455-462 ; ISBN: 9780769543284 Habibi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    Networks-on-Chip (NoCs for short) are known as the most scalable and reliable on-chip communication architectures for multi-core SoCs with tens to hundreds IP cores. Proper mapping the IP cores on NoC tiles (or assigning threads to cores in chip multiprocessors) can reduce end-to-end delay and energy consumption. While almost all previous works on mapping consider higher priority for the application's flows with higher required bandwidth, a mapping strategy, presented in this paper, is introduced that considers multicast communication flows in addition to the normal unicast flows. To this end, multicast and unicast traffic flows are first characterized in terms of some new metrics which are... 

    Unleashing the potentials of dynamism for page allocation strategies in SSDs

    , Article SIGMETRICS 2014 - Proceedings of the 2014 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems ; 2014 , pp. 551-552 ; ISBN: 9781450327893 Tavakkol, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    In Solid-State Drives (SSDs) with tens of ash chips and highly parallel architecture, we can speed up I/O operations by well-utilizing resources during page allocation. Propos- als already exist for using static page allocation which does not balance the IO load and its efficiency depends on access address patterns. To our best knowledge, there have been no research thus far to show what happens if one or more internal resources can be freely allocated regardless of the request address. This paper explores the possibility of using different degrees of dynamism in page allocation and iden- tifies key design opportunities that they present to improve SSD's characteristics  

    A reliable 3D MLC PCM architecture with resistance drift predictor

    , Article Proceedings of the International Conference on Dependable Systems and Networks ; 23- 26 June , 2014 , pp. 204-215 ; ISBN: 9781479922338 Jalili, M ; Arjomand, M ; Azad, H. S ; Sharif University of Technology
    Abstract
    In this paper, we study the problem of resistance drift in an MLC Phase Change Memory (PCM) and propose a solution to circumvent its thermally-affected accelerated rate in 3D CMPs. Our scheme is based on the observation that instead of alleviating the problem of resistance drift by using large margins or error correction codes, the PCM read circuit can be reconfigured for tolerating most of the resistance drift errors in a dynamic manner. Through detailed characterization of memory access patterns for 22 applications, we propose an efficient mechanism to facilitate such reliable read scheme via tolerating (a) early-cycle resistance drifts by using narrow margins so that considerably saving... 

    OD3P: On-demand page paired PCM

    , Article Proceedings - Design Automation Conference ; 2014 Asadinia, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    With current memory scalability challenges, Phase Change Memory (PCM) is viewed as an attractive replacement to DRAM. The preliminary concern for PCM applicability is its limited write endurance that is highly affected by pro-cess variation in nanometer regime. This increases the vari- ation in cell lifetime resulting in early and sudden reduc- tion in main memory capacity due to wear-out of few cells. When some memory pages reach their endurance limits, other pages may be far from their limits even when using a perfect wear-leveling. Recent studies have proposed redi- rection or correction schemes to alleviate this problem, but all suffer from poor throughput or latency. On contrary, we... 

    Network-on-SSD: A scalable and high-performance communication design paradigm for SSDs

    , Article IEEE Computer Architecture Letters ; Volume 12, Issue 1 , January-June , 2013 , Pages 5-8 ; 15566056 (ISSN) Tavakkol, A ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology
    2013
    Abstract
    In recent years, flash memory olid state disks (SSDs) have shown a great potential to change storage infrastructure because of its advantages of high speed and high throughput random access. This promising storage, however, greatly suffers from performance loss because of frequent ''erase-before-write'' and ''garbage collection'' operations. Thus, novel circuit-level, architectural, and algorithmic techniques are currently explored to address these limitations. In parallel with others, current study investigates replacing shared buses in multi-channel architecture of SSDs with an interconnection network to achieve scalable, high throughput, and reliable SSD storage systems. Roughly speaking,... 

    Multicast-aware mapping algorithm for on-chip networks

    , Article 19th International Euromicro Conference on Parallel, Distributed, and Network-Based Processing, PDP 2011, Ayia Napa, 9 February 2011 through 11 February 2011 ; 2011 , Pages 455-462 ; 9780769543284 (ISBN) Habibi, A ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology
    Abstract
    Networks-on-Chip (NoCs for short) are known as the most scalable and reliable on-chip communication architectures for multi-core SoCs with tens to hundreds IP cores. Proper mapping the IP cores on NoC tiles (or assigning threads to cores in chip multiprocessors) can reduce end-to-end delay and energy consumption. While almost all previous works on mapping consider higher priority for the application's flows with higher required bandwidth, a mapping strategy, presented in this paper, is introduced that considers multicast communication flows in addition to the normal unicast flows. To this end, multicast and unicast traffic flows are first characterized in terms of some new metrics which are... 

    SPCM: The striped phase change memory

    , Article ACM Transactions on Architecture and Code Optimization ; Volume 12, Issue 4 , January , 2015 ; 15443566 (ISSN) Hoseinzadeh, M ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology
    Association for Computing Machinery  2015
    Abstract
    Phase Change Memory (PCM) devices are one of the known promising technologies to take the place of DRAM devices with the aim of overcoming the obstacles of reducing feature size and stopping ever growing amounts of leakage power. In exchange for providing high capacity, high density, and nonvolatility, PCM Multilevel Cells (MLCs) impose high write energy and long latency. Many techniques have been proposed to resolve these side effects. However, read performance issues are usually left behind the great importance of write latency, energy, and lifetime. In this article, we focus on read performance and improve the critical path latency of the main memory system. To this end, we exploit... 

    Prolonging lifetime of PCM-based main memories through on-demand page pairing

    , Article ACM Transactions on Design Automation of Electronic Systems ; Volume 20, Issue 2 , 2015 ; 10844309 (ISSN) Asadinia, M ; Arjomand, M ; Azad, H. S ; Sharif University of Technology
    Association for Computing Machinery  2015
    Abstract
    With current memory scalability challenges, Phase-Change Memory (PCM) is viewed as an attractive replacement to DRAM. The preliminary concern for PCM applicability is its limited write endurance that results in fast wear-out of memory cells. Worse, process variation in the deep-nanometer regime increases the variation in cell lifetime, resulting in an early and sudden reduction in main memory capacity due to the wear-out of a few cells. Recent studies have proposed redirection or correction schemes to alleviate this problem, but all suffer poor throughput or latency. In this article, we show that one of the inefficiency sources in current schemes, even when wear-leveling algorithms are used,... 

    An opto-electrical NoC with traffic flow prediction in chip multiprocessors

    , Article Proceedings - 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2014 ; 2014 , Pages 440-443 Ghane, M ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology
    Abstract
    Network-on-Chip (NoC) paradigm has emerged as a revolutionary methodology to integrate numerous IP blocks on a single chip. The achievable performance of adopting NoCs is constrained by the performance limitation mainly imposed by the metal wires that are the physical realization of communication channels. According to the International Technology Roadmap for Semiconductors (ITRS) report, new interconnect paradigms providing huge bandwidth is in need for future products. The current wired channels have limited bandwidth, and consequently, they limit the performance enhancements that NoC architectures can provide. Optical interconnects are capable of achieving better performance via... 

    Sequoia: A high-endurance NVM-Based cache architecture

    , Article IEEE Transactions on Very Large Scale Integration (VLSI) Systems ; Volume 24, Issue 3 , 2016 , Pages 954-967 ; 10638210 (ISSN) Jokar, M. R ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2016
    Abstract
    Emerging nonvolatile memory technologies, such as spin-transfer torque RAM or resistive RAM, can increase the capacity of the last-level cache (LLC) in a latency and power-efficient manner. These technologies endure 109 - 1012 writes per cell, making a nonvolatile cache (NV-cache) with a lifetime of dozens of years under ideal working conditions. However, nonuniformity in writes to different cache lines considerably reduces the NV-cache lifetime to a few months. Writes to cache lines can be made uniformly by wear-leveling. A suitable wear-leveling for NV-cache should not incur high storage and performance overheads. We propose a novel, simple, and effective wear-leveling technique with... 

    Application-aware deadlock-free oblivious routing based on extended turn-model

    , Article IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, 7 November 2011 through 10 November 2011, San Jose, CA ; 2011 , Pages 213-218 ; 10923152 (ISSN) ; 9781457713989 (ISBN) Shafiee, A ; Zolghadr, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    Programmable hardware is gaining popularity as it can keep pace with growing performance demand in tight power budget, design and test cost, and serious reliability concerns of future multiprocessor embedded systems. Compatible with this trend, Network-on-Chip, as a potential bottleneck of future multi-cores, should also support pro-grammability. Here, we address this issue in design and implementation of routing algorithm for two-dimensional mesh. To this end, we allocate paths based on input traffic pattern and in parallel with customizing routing restriction for deadlock freedom. To achieve this, we propose extended turn model (ETM), a novel parametric deadlock-free routing for 2D meshes... 

    Architecting the last-level cache for GPUs using STT-RAM technology

    , Article Transactions on Design Automation of Electronic Systems ; Volume 20, Issue 4 , 2015 ; 10844309 (ISSN) Samavatian, M. H ; Arjomand, M ; Bashizade, R ; Sarbazi Azad, H ; Sharif University of Technology
    Abstract
    Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. They have, however, two important issues, high energy and latency of write operations, that have to be addressed. Low retention time STT-RAMs can...