Reducing access latency of MLC PCMs through line striping

Article Proceedings - International Symposium on Computer Architecture ; Article number 6853228 , 14-18 June , 2014 , p. 277-288 ; ISSN: 10636897 ; ISBN: 9781479943968 Hoseinzadeh, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

Although phase change memory with multi-bit storage capability (known as MLC PCM) offers a good combination of high bit-density and non-volatility, its performance is severely impacted by the increased read/write latency. For read operations, access latency increases almost linearly with cell density (the number of bits stored in a cell). Since reads are latency critical, they can seriously impact system performance. This paper alleviates the problem of slow reads in MLC PCM by exploiting a fundamental property of MLC devices: the Most-Significant Bit (MSB) of MLC cells can be read as fast as SLC cells, while reading the Least-Significant Bits (LSBs) is slower. We propose...
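As a rough illustration of the MSB/LSB asymmetry described in this abstract, the sketch below models a 2-bit cell that is sensed by successive comparisons against reference levels. The level encoding, the reference values, and the function names are assumptions made for illustration only; they are not taken from the paper, whose proposed scheme is not described in the truncated abstract.

```python
# Hypothetical model of a 2-bit MLC cell: four resistance levels 0..3,
# encoded so that level = 2*MSB + LSB (an assumed encoding).
REF_MID = 1.5                   # separates levels {0, 1} from {2, 3}
REF_LOW, REF_HIGH = 0.5, 2.5    # references used in the second step


def read_msb(level):
    """One comparison, as for an SLC cell. Returns (bit, comparisons)."""
    return (1 if level > REF_MID else 0), 1


def read_lsb(level):
    """The second reference depends on the MSB, so two steps are needed."""
    msb, steps = read_msb(level)
    ref = REF_HIGH if msb else REF_LOW
    return (1 if level > ref else 0), steps + 1


for level in range(4):
    msb, msb_steps = read_msb(level)
    lsb, lsb_steps = read_lsb(level)
    assert level == 2 * msb + lsb
    print(level, msb, msb_steps, lsb, lsb_steps)
```

In this toy model every MSB read costs one comparison and every LSB read costs two, which is the kind of gap the abstract refers to.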

High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

Article Proceedings of the International Symposium on Low Power Electronics and Design ; 2011 , p. 79-84 ; ISSN: 15334678 ; ISBN: 9781612846590 Jadidi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

Abstract

In this paper, we propose a run-time strategy for managing writes to the last-level cache in chip multiprocessors where STT-RAM memory is used as the baseline technology. To this end, we assume that each cache set is decomposed into a limited number of SRAM lines and a large number of STT-RAM lines. SRAM lines are the target of frequently written data, while rarely written or read-only data are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of the working set and place them into SRAM lines, while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache...
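The placement idea in this abstract can be sketched as a small software model of one cache set. This is only an illustration under stated assumptions: the paper describes a fully-hardware technique, and the per-block write counter, the `write_threshold` parameter, the LRU policy, and the class name `HybridSet` are all hypothetical choices, not the authors' mechanism.

```python
from collections import OrderedDict


class HybridSet:
    """One cache set with a few SRAM ways and many STT-RAM ways (sizes assumed)."""

    def __init__(self, sram_ways=2, stt_ways=6, write_threshold=3):
        self.sram = OrderedDict()   # tag -> write count, kept in LRU order
        self.stt = OrderedDict()
        self.sram_ways = sram_ways
        self.stt_ways = stt_ways
        self.write_threshold = write_threshold  # hypothetical tuning knob

    def _insert_stt(self, tag, writes):
        if len(self.stt) >= self.stt_ways:
            self.stt.popitem(last=False)        # evict the LRU STT-RAM line
        self.stt[tag] = writes

    def _insert_sram(self, tag, writes):
        if len(self.sram) >= self.sram_ways:
            # Demote the LRU SRAM line to STT-RAM and restart its counter.
            old_tag, _ = self.sram.popitem(last=False)
            self._insert_stt(old_tag, 0)
        self.sram[tag] = writes

    def access(self, tag, is_write):
        """Return the region ('SRAM' or 'STT-RAM') holding the block afterwards."""
        if tag in self.sram:
            self.sram[tag] += int(is_write)
            self.sram.move_to_end(tag)
            return "SRAM"
        # Miss or STT-RAM hit: carry over the block's write count, if any.
        writes = self.stt.pop(tag, 0) + int(is_write)
        if writes >= self.write_threshold:
            self._insert_sram(tag, writes)      # write-intensive block
            return "SRAM"
        self._insert_stt(tag, writes)
        return "STT-RAM"


s = HybridSet()
print([s.access("A", True) for _ in range(3)])  # block A is written repeatedly
print(s.access("B", False))                     # block B is only read
```

With the default threshold of 3, block A moves to SRAM on its third write, while the read-only block B stays in STT-RAM.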

PSP-Cache: A low-cost fault-tolerant cache memory architecture

Article Proceedings - Design, Automation and Test in Europe, DATE ; 2014 ; ISSN: 15301591 ; ISBN: 9783981537024 Farbeh, H ; Miremadi, S. G ; Sharif University of Technology

Abstract

Cache memories constitute a large fraction of processor chip area and are highly vulnerable to soft errors caused by energetic particles. To protect these memories, most modern processors employ Error Detection Codes (EDCs) or Error Correction Codes (ECCs). EDCs/ECCs impose significant overheads in terms of area and energy; these overheads increase as a function of interleaving EDCs/ECCs to detect/correct multiple errors. This paper proposes a new cache architecture to minimize the area and energy overheads of EDCs/ECCs in set-associative L1-caches. Simulation results for a 4-way set-associative cache show that the proposed architecture reduces both the area and static power overheads...

Emerging non-volatile memory technologies for future low power reconfigurable systems

Article 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC ; 26-28 May , 2014 , pp. 1-2 ; 9781479958108 Ahari, A ; Asadi, H ; Tahoori, M. B ; Sharif University of Technology

Abstract

Non-volatile memory (NVM) technologies are promising alternatives to traditional CMOS memory technologies. While NVMs were primarily studied for use in the memory hierarchy, they can also provide benefits in reconfigurable systems such as Field-Programmable Gate Arrays (FPGAs). In this paper, we investigate the applicability of different NVM technologies for the configuration bits of FPGAs and propose a power-efficient reconfigurable architecture based on Phase Change Memory (PCM). Quantitative analysis for various FPGA architectures using different memory technologies shows the benefits of the proposed scheme.

High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

Article Proceedings of the International Symposium on Low Power Electronics and Design, 1 August 2011 through 3 August 2011 ; August , 2011 , Pages 79-84 ; 15334678 (ISSN) ; 9781612846590 (ISBN) Jadidi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology

2011

Abstract

In this paper, we propose a run-time strategy for managing writes to the last-level cache in chip multiprocessors where STT-RAM memory is used as the baseline technology. To this end, we assume that each cache set is decomposed into a limited number of SRAM lines and a large number of STT-RAM lines. SRAM lines are the target of frequently written data, while rarely written or read-only data are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of the working set and place them into SRAM lines, while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache...

Adaptive characterisation of a human hand model during interactions with a telemanipulation system

Article International Conference on Robotics and Mechatronics, ICROM 2015, 7 October 2015 through 9 October 2015 ; 2015 , Pages 688-693 ; 9781467372343 (ISBN) Esfandiari, M ; Sadeghnejad, S ; Farahmand, F ; Vosoughi, G ; Sharif University of Technology

2015

Abstract

Proper modeling of the human arm dynamics, as the arm interacts with telemanipulation and haptic systems, is important in enhancing the transparency of these systems. In this article, we introduce an adaptive identifier to estimate the impedance characteristic of a human operator as the operator interacts with a single translational degree-of-freedom mechanism. A five-parameter model, including an extra spring and damper for a better approximation of the dynamic behavior of the human arm, has been used. Since the impedance characteristic of the human arm differs from one individual to another, it is important to estimate these parameters for each individual and update the controller to enhance the transparency...

Cluster-based approach for improving graphics processing unit performance by inter streaming multiprocessors locality

Article IET Computers and Digital Techniques ; Volume 9, Issue 5 , August , 2015 , Pages 275-282 ; 17518601 (ISSN) Keshtegar, M. M ; Falahati, H ; Hessabi, S ; Sharif University of Technology

Institution of Engineering and Technology 2015

Abstract

As a new platform for high-performance, general-purpose computing, the graphics processing unit (GPU) is one of the most promising candidates for faster improvement in peak processing speed, low latency and high performance. As GPUs employ multithreading to hide latency, there is only a small private data cache in each single instruction multiple thread (SIMT) core. Hence, in many applications these cores communicate through the global memory. Access to this public memory takes a long time and consumes a large amount of power. Moreover, the memory bandwidth is limited, which is quite challenging in parallel processing. The missed memory requests in last level cache that are followed by accesses...

In-scratchpad memory replication: Protecting scratchpad memories in multicore embedded systems against soft errors

Article ACM Transactions on Design Automation of Electronic Systems ; Volume 20, Issue 4 , 2015 ; 10844309 (ISSN) Delshadtehrani, L ; Farbeh, H ; Miremadi, S. G ; Sharif University of Technology

Association for Computing Machinery 2015

Abstract

Scratchpad memories (SPMs) are widely employed in multicore embedded processors. Reliability is one of the major constraints in embedded processor design, and it is threatened by the increasing susceptibility of memory cells to multiple-bit upsets (MBUs) due to continuous technology down-scaling. This article proposes a low-cost and efficient data replication mechanism, called In-Scratchpad Memory Replication (ISMR), to correct MBUs in SPMs of multicore embedded processors. The main feature of ISMR is a smart controller, called Replication Management Unit (RMU), which is responsible for dynamically analyzing the activity of the SPM blocks at runtime and efficiently replicating the...

Dynamic shared SPM reuse for real-time multicore embedded systems

Article ACM Transactions on Architecture and Code Optimization ; Volume 12, Issue 2 , 2015 ; 15443566 (ISSN) Mohajjel Kafshdooz, M ; Ejlali, A ; Sharif University of Technology

Association for Computing Machinery 2015

Abstract

Allocating the scratchpad memory (SPM) space to tasks is a challenging problem in real-time multicore embedded systems that use shared SPM. Proper SPM space allocation is important, as it considerably influences the application worst-case execution time (WCET), which is of great importance in real-time applications. To address this problem, in this article we present a dynamic SPM reuse scheme, where SPM space can be reused by other tasks during runtime without requiring any static SPM partitioning. Although the proposed scheme is applied dynamically at runtime, the required decision making is fairly complex and hence cannot be performed at runtime. We have developed techniques to perform...

Application-based dynamic reconfiguration in optical network-on-chip

Article Computers and Electrical Engineering ; Volume 45 , July , 2015 , Pages 417-429 ; 00457906 (ISSN) Falahati, H ; Koohi, S ; Hessabi, S ; Sharif University of Technology

Elsevier Ltd 2015

Abstract

We propose a new optical reconfigurable Network-on-Chip (NoC), named ReFaT ONoC (Reconfigurable Flat and Tree Optical NoC). ReFaT is a dynamically reconfigurable architecture, which customizes the topology and routing paths based on the application characteristics. ReFaT, as an all-optical NoC, routes optical packets based on their wavelengths. For this purpose, we propose a novel architecture for the optical switch, which eliminates the need for optical resource reservation, and thus avoids the corresponding latency and area overheads. As a key idea for dynamic reconfiguration, each application is mapped to a specific set of wavelengths and utilizes its dedicated routing algorithm. We...

Architecting the last-level cache for GPUs using STT-RAM technology

Article ACM Transactions on Design Automation of Electronic Systems ; Volume 20, Issue 4 , 2015 ; 10844309 (ISSN) Samavatian, M. H ; Arjomand, M ; Bashizade, R ; Sarbazi Azad, H ; Sharif University of Technology

Abstract

Future GPUs should have larger L2 caches, based on the current trends in VLSI technology and GPU architectures toward increasing processing core counts. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. STT-RAMs have, however, two important issues, high energy and latency of write operations, that have to be addressed. Low retention time STT-RAMs can...

A novel pipeline architecture of replacing ink drop spread

Article Proceedings - 2010 2nd World Congress on Nature and Biologically Inspired Computing, NaBIC 2010, 15 December 2010 through 17 December 2010, Kitakyushu ; 2010 , Pages 127-133 ; 9781424473762 (ISBN) Firouzi, M ; Bagheri Shouraki, S ; Tabandeh, M ; Mousavi, H. R ; Sharif University of Technology

2010

Abstract

The human brain is one of the most remarkable and complex systems ever designed: a huge, complex network of neurons acting as tiny biological and chemical processors, which are distributed and work together as a massively parallel system to carry out the control and vital activities of the human body. Simulating brain learning and implementing it in hardware is one of the most interesting research areas in the effort to build an artificial brain. One line of research in this area is the Active Learning Method (ALM). ALM is an adaptive recursive fuzzy learning algorithm based on brain functionality and specification, which models a complex Multi Input Multi Output System as a fuzzy combination of Single Input...

A morphable phase change memory architecture considering frequent zero values

Article Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors ; 2011 , Pages 373-380 ; 10636404 (ISSN) ; 9781457719523 (ISBN) Arjomand, M ; Jadidi, A ; Shafiee, A ; Sarbazi Azad, H ; Sharif University of Technology

Abstract

Phase Change Memory (PCM) is emerging as a high-density and power-efficient choice for future main memory systems. While PCM cell size is marching towards the minimum achievable feature size, recent prototypes effectively improve device scalability by storing multiple bits per cell. Unfortunately, Multi-Level Cell (MLC) PCM devices incur higher access time and energy than their Single-Level Cell (SLC) counterparts, making it difficult to incorporate MLC in main memory. To address this challenge, we propose Zero-value-based Morphable PCM, ZM-PCM for short, a novel MLC-PCM main memory architecture which tries to incorporate the benefits of both MLC and SLC devices within the same structure....

Floating-ECC: dynamic repositioning of error correcting code bits for extending the lifetime of STT-RAM caches

Article IEEE Transactions on Computers ; Volume 65, Issue 12 , 2016 , Pages 3661-3675 ; 00189340 (ISSN) Farbeh, H ; Kim, H ; Miremadi, S. G ; Kim, S ; Sharif University of Technology

IEEE Computer Society 2016

Abstract

Spin-Transfer Torque RAM (STT-RAM) is a promising alternative to SRAM for implementing on-chip L2 and L3 caches. One of the most critical challenges in STT-RAM is reliability due to limited write endurance, which results in insufficient lifetime, as well as various types of errors. Previous studies have focused on either presenting various cache architectures/management techniques to improve the lifetime of STT-RAM caches or utilizing different Error Correcting Codes (ECCs) to protect against the permanent and transient errors. However, there is no quantitative analysis in the literature to determine the impact of ECCs on the lifetime of the STT-RAM caches. This paper formulates this impact...

A cache-assisted scratchpad memory for multiple-bit-error correction

Article IEEE Transactions on Very Large Scale Integration (VLSI) Systems ; Volume 24, Issue 11 , 2016 , Pages 3296-3309 ; 10638210 (ISSN) Farbeh, H ; Sadat Mirzadeh, N ; Farhady Ghalaty, N ; Miremadi, S. G ; Fazeli, M ; Asadi, H ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2016

Abstract

Scratchpad memory (SPM) is widely used in modern embedded processors to overcome the limitations of cache memory. The high vulnerability of SPM to soft errors, however, limits its usage in safety-critical applications. This paper proposes an efficient fault-tolerant scheme, called cache-assisted duplicated SPM (CADS), to protect SPM against soft errors. The main aim of CADS is to utilize cache memory to provide a replica for SPM lines. Using cache memory, CADS is able to guarantee a full duplication of all SPM lines. We also further enhance the proposed scheme by presenting buffered CADS (BCADS) that significantly improves the CADS energy efficiency. BCADS is compared with two well-known...

A fine-grained configurable cache architecture for soft processors

Article 18th CSI International Symposium on Computer Architecture and Digital Systems, 7 October 2015 through 8 October 2015 ; 2015 ; 9781467380232 (ISBN) Biglari, M ; Mirzazad Barijough, K ; Goudarzi, M ; Pourmohseni, B ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc

Abstract

The ever-increasing density and performance of FPGAs has increased the importance and popularity of soft processors. The growing gap between the speed of processors and memories can partly be compensated through the memory hierarchy. Since memory accesses follow a non-uniform distribution, and vary from one application to another, variable set-associative cache architectures have emerged. In this paper, a novel cache architecture, primarily aimed at soft processors, is proposed to address the variable access demands of applications, through dynamically configurable line-associativity, with no memory overhead. The FPGA implementation of the proposed architecture achieves an average miss count...

ASHA: An adaptive shared-memory sharing architecture for multi-programmed GPUs

Article Microprocessors and Microsystems ; Volume 46 , 2016 , Pages 264-273 ; 01419331 (ISSN) Abbasitabar, H ; Samavatian, M. H ; Sarbazi Azad, H ; Sharif University of Technology

Elsevier B.V 2016

Abstract

Spatial multi-programming is one of the most efficient multi-programming methods on Graphics Processing Units (GPUs). This multi-programming scheme generates variety in the resource requirements of streaming multiprocessors (SMs) and creates opportunities for sharing unused portions of each SM resource with other SMs. Although this approach drastically improves GPU performance, in some cases it leads to performance degradation due to a shortage of the resources allocated to each program. Considering shared-memory as one of the main bottlenecks of thread-level parallelism (TLP), in this paper, we propose an adaptive shared-memory sharing architecture, called ASHA. ASHA enhances spatial...

A hybrid Non-Volatile Cache Design for Solid-State Drives using comprehensive I/O characterization

Article IEEE Transactions on Computers ; Volume 65, Issue 6 , 2016 , Pages 1678-1691 ; 00189340 (ISSN) Tarihi, M ; Asadi, H ; Haghdoost, A ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology

IEEE Computer Society

Abstract

The emergence of new memory technologies provides us with an opportunity to enhance the properties of existing memory architectures. One such technology is Phase Change Memory (PCM), which boasts superior scalability, power savings, non-volatility, and performance competitive with Dynamic Random Access Memory (DRAM). In this paper, we propose a write buffer architecture for Solid-State Drives (SSDs) which attempts to exploit PCM as a DRAM alternative while alleviating its issues such as long write latency, high write energy, and finite endurance. To this end and based on thorough I/O characterization of desktop and enterprise applications, we propose a hybrid DRAM-PCM SSD cache design with an...

Reconfigurable multicast routing for Networks on Chip

Article Microprocessors and Microsystems ; Volume 42 , 2016 , Pages 180-189 ; 01419331 (ISSN) Nasiri, F ; Sarbazi Azad, H ; Khademzadeh, A ; Sharif University of Technology

Elsevier

Abstract

Several unicast and multicast routing protocols have been presented for MPSoCs. Multicast protocols in NoCs are used for cache coherency in distributed shared memory systems, replication, barrier synchronization, or clock synchronization. Unicast routing algorithms are not suitable for multicast, as they increase traffic, congestion and deadlock probability. Well-known multicast schemes such as tree-based and path-based schemes were originally proposed for multicomputers and have recently been adapted to NoCs. In this paper, we propose a switch tree-based multicast scheme, called STBA. This method supports tree construction with a minimum number of routers. Our evaluation results reveal that, for...

An operating system level data migration scheme in hybrid DRAM-NVM memory architecture

Article Proceedings of the 2016 Design, Automation and Test in Europe Conference and Exhibition, DATE 2016, 14 March 2016 through 18 March 2016 ; 2016 , Pages 936-941 ; 9783981537062 (ISBN) Salkhordeh, R ; Asadi, H ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2016

Abstract

With the emergence of Non-Volatile Memories (NVMs) and their shortcomings such as limited endurance and high power consumption in write requests, several studies have suggested hybrid memory architectures employing both Dynamic Random Access Memory (DRAM) and NVM in a memory system. By conducting comprehensive experiments, we have observed that such studies fail to consider very important aspects of hybrid memories, including the effect of: a) data migrations on performance, b) data migrations on power, and c) the granularity of data migration. This paper presents an efficient data migration scheme at the Operating System level in a hybrid DRAM-NVM memory architecture. In the proposed...