Search for: memory-architecture
Total 51 records

    Reducing access latency of MLC PCMs through line striping

    , Article Proceedings - International Symposium on Computer Architecture ; Article number 6853228 , 14-18 June , 2014 , p. 277-288 ; ISSN: 10636897 ; ISBN: 9781479943968 Hoseinzadeh, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    Although phase change memory with multi-bit storage capability (known as MLC PCM) offers a good combination of high bit-density and non-volatility, its performance is severely impacted by the increased read/write latency. Regarding read operation, access latency increases almost linearly with respect to cell density (the number of bits stored in a cell). Since reads are latency critical, they can seriously impact system performance. This paper alleviates the problem of slow reads in the MLC PCM by exploiting a fundamental property of MLC devices: the Most-Significant Bit (MSB) of MLC cells can be read as fast as SLC cells, while reading the Least-Significant Bits (LSBs) is slower. We propose... 

    ASHA: An adaptive shared-memory sharing architecture for multi-programmed GPUs

    , Article Microprocessors and Microsystems ; Volume 46 , 2016 , Pages 264-273 ; 01419331 (ISSN) Abbasitabar, H ; Samavatian, M. H ; Sarbazi Azad, H ; Sharif University of Technology
    Elsevier B.V  2016
    Abstract
    Spatial multi-programming is one of the most efficient multi-programming methods on Graphics Processing Units (GPUs). This multi-programming scheme generates variety in resource requirements of stream multiprocessors (SMs) and creates opportunities for sharing unused portions of each SM resource with other SMs. Although this approach drastically improves GPU performance, in some cases it leads to performance degradation due to the shortage of allocated resource to each program. Considering shared-memory as one of the main bottlenecks of thread-level parallelism (TLP), in this paper, we propose an adaptive shared-memory sharing architecture, called ASHA. ASHA enhances spatial... 

    PF-DRAM: A precharge-free DRAM structure

    , Article 48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021, 14 June 2021 through 19 June 2021 ; Volume 2021-June , 2021 , Pages 126-138 ; 10636897 (ISSN); 9781665433334 (ISBN) Rohbani, N ; Darabi, S ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2021
    Abstract
    Although DRAM capacity and bandwidth have increased sharply by the advances in technology and standards, its latency and energy per access have remained almost constant in recent generations. The main portion of DRAM power/energy is dissipated by Read, Write, and Refresh operations, all initiated by a Precharge phase. Precharge phase not only imposes a large amount of energy consumption, but also increases the delay of closing a row in a memory block to open another one. By reduction of row-hit rate in recent workloads, especially in multi-core systems, precharge rate increases which exacerbates DRAM power dissipation and access latency. This work proposes a novel DRAM structure, called... 

    A case for PIM support in general-purpose compilers

    , Article IEEE Design and Test ; 2021 ; 21682356 (ISSN) Sadeghi, P ; Ejlali, A ; Sharif University of Technology
    IEEE Computer Society  2021
    Abstract
    Newly developed 3D die stacking technologies afford us the possibility to revisit the idea of Processing-in-Memory (PIM) as implementation hurdles are overcome. We now have the opportunity to offload the data intensive parts of our program to the PIM in the form of kernels to be able to take advantage of the high internal bandwidth of the memory modules. Memory access latency and bandwidth are two major bottlenecks in today’s high-performance computers and new use-cases are moving faster than ever before towards this mode of computing. With new graph processing and neural network applications being developed every day, having a performance model of such systems helps in predicting the behavior... 

    High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

    , Article Proceedings of the International Symposium on Low Power Electronics and Design ; 2011 , p. 79-84 ; ISSN: 15334678 ; ISBN: 9781612846590 Jadidi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    In this paper, we propose a run-time strategy for managing writes onto last level cache in chip multiprocessors where STT-RAM memory is used as baseline technology. To this end, we assume that each cache set is decomposed into limited SRAM lines and large number of STT-RAM lines. SRAM lines are target of frequently-written data and rarely-written or read-only ones are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of working set and place them into SRAM lines while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache... 

    PSP-Cache: A low-cost fault-tolerant cache memory architecture

    , Article Proceedings -Design, Automation and Test in Europe, DATE ; 2014 ; ISSN: 15301591 ; ISBN: 9783981537024 Farbeh, H ; Miremadi, S. G ; Sharif University of Technology
    Abstract
    Cache memories constitute a large fraction of processor chip area and are highly vulnerable to soft errors caused by energetic particles. To protect these memories, most of the modern processors employ Error Detection Codes (EDCs) or Error Correction Codes (ECCs). EDCs/ECCs impose significant overheads in terms of area and energy; these overheads increase as a function of interleaving EDCs/ECCs to detect/correct multiple errors. This paper proposes a new cache architecture to minimize the area and energy overheads of EDCs/ECCs in set-associative L1-caches. Simulation results for a 4-way set-associative cache show that the proposed architecture reduces both the area and static power overheads... 

    Emerging non-volatile memory technologies for future low power reconfigurable systems

    , Article 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC ; 26-28 May , 2014 , pp. 1-2 ; 9781479958108 Ahari, A ; Asadi, H ; Tahoori, M. B ; Sharif University of Technology
    Abstract
    Non-volatile memory (NVM) technologies are promising alternatives to traditional CMOS memory technologies. While NVMs were primarily studied to be used in the memory hierarchy, they can also provide benefits in reconfigurable systems such as Field-Programmable Gate Arrays (FPGAs). In this paper, we investigate the applicability of different NVM technologies for the configuration bits of FPGAs and propose a power-efficient reconfigurable architecture based on Phase Change Memory (PCM). Quantitative analysis for various FPGA architectures using different memory technologies shows the benefits of the proposed scheme.

    High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

    , Article Proceedings of the International Symposium on Low Power Electronics and Design, 1 August 2011 through 3 August 2011 ; August , 2011 , Pages 79-84 ; 15334678 (ISSN) ; 9781612846590 (ISBN) Jadidi, A ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    2011
    Abstract
    In this paper, we propose a run-time strategy for managing writes onto last level cache in chip multiprocessors where STT-RAM memory is used as baseline technology. To this end, we assume that each cache set is decomposed into limited SRAM lines and large number of STT-RAM lines. SRAM lines are target of frequently-written data and rarely-written or read-only ones are pushed into STT-RAM. As a novel contribution, a low-overhead, fully-hardware technique is utilized to detect write-intensive data blocks of working set and place them into SRAM lines while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache... 

    In-scratchpad memory replication: Protecting scratchpad memories in multicore embedded systems against soft errors

    , Article ACM Transactions on Design Automation of Electronic Systems ; Volume 20, Issue 4 , 2015 ; 10844309 (ISSN) Delshadtehrani, L ; Farbeh, H ; Miremadi, S. G ; Sharif University of Technology
    Association for Computing Machinery  2015
    Abstract
    Scratchpad memories (SPMs) are widely employed in multicore embedded processors. Reliability is one of the major constraints in the embedded processor design, which is threatened by the increasing susceptibility of memory cells to multiple-bit upsets (MBUs) due to continuous technology down-scaling. This article proposes a low-cost and efficient data replication mechanism, called In-Scratchpad Memory Replication (ISMR), to correct MBUs in SPMs of multicore embedded processors. The main feature of ISMR is a smart controller, called Replication Management Unit (RMU), which is responsible for dynamically analyzing the activity of the SPM blocks at runtime and efficiently replicating the... 

    Architecting the last-level cache for GPUs using STT-RAM technology

    , Article ACM Transactions on Design Automation of Electronic Systems ; Volume 20, Issue 4 , 2015 ; 10844309 (ISSN) Samavatian, M. H ; Arjomand, M ; Bashizade, R ; Sarbazi Azad, H ; Sharif University of Technology
    Abstract
    Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. They have, however, two important issues, high energy and latency of write operations, that have to be addressed. Low retention time STT-RAMs can... 

    Floating-ECC: dynamic repositioning of error correcting code bits for extending the lifetime of STT-RAM caches

    , Article IEEE Transactions on Computers ; Volume 65, Issue 12 , 2016 , Pages 3661-3675 ; 00189340 (ISSN) Farbeh, H ; Kim, H ; Miremadi, S. G ; Kim, S ; Sharif University of Technology
    IEEE Computer Society  2016
    Abstract
    Spin-Transfer Torque RAM (STT-RAM) is a promising alternative to SRAM for implementing on-chip L2 and L3 caches. One of the most critical challenges in STT-RAM is reliability due to limited write endurance, which results in insufficient lifetime, as well as various types of errors. Previous studies have focused on either presenting various cache architectures/management techniques to improve the lifetime of STT-RAM caches or utilizing different Error Correcting Codes (ECCs) to protect against the permanent and transient errors. However, there is no quantitative analysis in the literature to determine the impact of ECCs on the lifetime of the STT-RAM caches. This paper formulates this impact... 

    A fine-grained configurable cache architecture for soft processors

    , Article 18th CSI International Symposium on Computer Architecture and Digital Systems, 7 October 2015 through 8 October 2015 ; 2015 ; 9781467380232 (ISBN) Biglari, M ; Mirzazad Barijough, K ; Goudarzi, M ; Pourmohseni, B ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc 
    Abstract
    The ever-increasing density and performance of FPGAs has increased the importance and popularity of soft processors. The growing gap between the speed of processors and memories can partly be compensated through memory hierarchy. Since memory accesses follow a non-uniform distribution, and vary from one application to another, variable set-associative cache architectures have emerged. In this paper, a novel cache architecture, primarily aimed at soft processors, is proposed to address the variable access demands of applications, through dynamically configurable line-associativity, with no memory overhead. The FPGA implementation of the proposed architecture achieves an average miss count... 

    Fast write operations in non-volatile memories using latency masking

    , Article CSI International Symposium on Real-Time and Embedded Systems and Technologies, RTEST 2018, 9 May 2018 through 10 May 2018 ; 2018 , Pages 1-7 ; 9781538614754 (ISBN) Hoseinghorban, A ; Bazzaz, M ; Ejlali, A ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2018
    Abstract
    Energy consumption is an important issue in designing embedded systems and the emerging Internet of Things (IoT). The use of non-volatile memories instead of SRAM in these systems improves their energy consumption since non-volatile memories consume much less leakage power and provide better capacity given the same die area as SRAM. However, this can impose significant performance overhead because the write operation latency of non-volatile memories is higher than that of SRAM. In this paper we present an NVM-based data memory architecture for embedded systems which improves the performance of the system at the cost of a slight energy consumption overhead. The architecture employs... 

    ReCA: An efficient reconfigurable cache architecture for storage systems with online workload characterization

    , Article IEEE Transactions on Parallel and Distributed Systems ; Volume 29, Issue 7 , 2018 , Pages 1605-1620 ; 10459219 (ISSN) Salkhordeh, R ; Ebrahimi, S ; Asadi, H ; Sharif University of Technology
    IEEE Computer Society  2018
    Abstract
    In recent years, Solid-State Drives (SSDs) have gained tremendous attention in computing and storage systems due to significant performance improvement over Hard Disk Drives (HDDs). The cost per capacity of SSDs, however, prevents them from entirely replacing HDDs in such systems. One approach to effectively take advantage of SSDs is to use them as a caching layer to store performance critical data blocks in order to reduce the number of accesses to HDD-based disk subsystem. Due to characteristics of Flash-based SSDs such as limited write endurance and long latency on write operations, employing caching algorithms at the Operating System (OS) level necessitates to take such characteristics... 

    AdAM: adaptive approximation management for the non-volatile memory hierarchies

    , Article 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018; International Congress Center Dresden ; Volume 2018-January , April , 2018 , Pages 785-790 ; 9783981926316 (ISBN) Teimoori, M. T ; Hanif, M. A ; Ejlali, A ; Shafique, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2018
    Abstract
    Existing memory approximation techniques focus on employing approximations at an individual level of the memory hierarchy (e.g., cache, scratchpad, or main memory). However, to exploit the full potential of approximations, there is a need to manage different approximation knobs across the complete memory hierarchy. Towards this, we model a system including STT-RAM scratchpad and PCM main memory with different approximation knobs (e.g., read/write pulse magnitude/duration) in order to synergistically trade data accuracy for both STT-RAM access delay and PCM lifetime by means of an integer linear programming (ILP) problem at design-time. Furthermore, a runtime algorithm is proposed to... 

    Improved MPC algorithms for edit distance and Ulam distance

    , Article 31st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, 22 June 2019 through 24 June 2019 ; 2019 , Pages 31-40 ; 9781450361842 (ISBN) Boroujeni, M ; Seddighin, S ; Sharif University of Technology
    Association for Computing Machinery  2019
    Abstract
    Edit distance is one of the most fundamental problems in combinatorial optimization. Ulam distance is a special case of edit distance where no character is allowed to appear more than once in a string. Recent developments have been very fruitful for obtaining fast and parallel algorithms for both edit distance and Ulam distance. In this work, we present an almost optimal MPC algorithm for Ulam distance and improve MPC algorithms for edit distance. Our algorithm for Ulam distance is optimal in the sense that (1) the approximation factor of our algorithm is 1 + ϵ, (2) the round complexity of our algorithm is constant, (3) the total memory of our algorithm is almost linear (Õ(n)), and (4) the... 

    Bingo spatial data prefetcher

    , Article 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, 16 February 2019 through 20 February 2019 ; 2019 , Pages 399-411 ; 9781728114446 (ISBN) Bakhshalipour, M ; Shakerinava, M ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    Applications extensively use data objects with a regular and fixed layout, which leads to the recurrence of access patterns over memory regions. Spatial data prefetching techniques exploit this phenomenon to prefetch future memory references and hide the long latency of DRAM accesses. While state-of-the-art spatial data prefetchers are effective at reducing the number of data misses, we observe that there is still significant room for improvement. To select an access pattern for prefetching, existing spatial prefetchers associate observed access patterns to either a short event with a high probability of recurrence or a long event with a low probability of recurrence. Consequently, the... 

    A table-based application-specific prefetch engine for object-oriented embedded systems

    , Article 2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2006, Samos, 17 July 2006 through 20 July 2006 ; 2006 , Pages 7-13 ; 1424401550 (ISBN); 9781424401550 (ISBN) Hessabi, S ; Modarressi, M ; Goudarzi, M ; Javanhemmat, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2006
    Abstract
    A table-based application-specific data prefetching mechanism is presented in this paper. This mechanism is proposed to improve the performance of the application specific instruction-set processors (ASIP) we develop customized to an object-oriented application. In this approach, we divide the data accesses of a class method into two conditional and unconditional parts. We supply the prefetch engine with the static information about each part to prefetch all data fields of an object required by a class method when the class method is invoked. Effective management of memory access patterns by dividing them based on the method to which they belong and storing the access information of nested... 

    Dynamic Iranian sign language recognition using an optimized deep neural network: An implementation via a robotic-based architecture

    , Article International Journal of Social Robotics ; 2021 ; 18754791 (ISSN) Basiri, S ; Taheri, A ; Meghdari, A. F ; Boroushaki, M ; Alemi, M ; Sharif University of Technology
    Springer Science and Business Media B.V  2021
    Abstract
    Sign language is a non-verbal communication tool used by the deaf. A robust sign language recognition framework is needed to develop Human–Robot Interaction (HRI) platforms that are able to interact with humans via sign language. Iranian sign language (ISL) is composed of both static postures and dynamic gestures of the hand and fingers. In this paper, we present a robust framework using a Deep Neural Network (DNN) to recognize dynamic ISL gestures captured by motion capture gloves in real time. To this end, first, a dataset of fifteen ISL classes was collected in time series; then, this dataset was virtually augmented and pre-processed using the “state-image” method to produce a unique... 

    High-Performance predictable NVM-Based instruction memory for real-time embedded systems

    , Article IEEE Transactions on Emerging Topics in Computing ; Volume 9, Issue 1 , 2021 , Pages 441-455 ; 21686750 (ISSN) Bazzaz, M ; Hoseinghorban, A ; Poursafaei, F ; Ejlali, A ; Sharif University of Technology
    IEEE Computer Society  2021
    Abstract
    Worst case execution time and energy consumption are two of the most important design constraints of real-time embedded systems and memory subsystem has a major impact on both of them. Therefore, many recent studies have tried to improve the memory subsystem of embedded systems by using emerging non-volatile memories instead of conventional memories such as SRAM and DRAM. Indeed, the low leakage power dissipation and improved density of emerging non-volatile memories make them prime candidates for replacing the conventional memories. However, accessing these memories imposes performance and energy overhead and using them as the instruction memory could increase the worst case execution time,...