    Reducing access latency of MLC PCMs through line striping

    Article, Proceedings - International Symposium on Computer Architecture ; Article number 6853228 , 14-18 June 2014 , p. 277-288 ; ISSN: 10636897 ; ISBN: 9781479943968 Hoseinzadeh, M ; Arjomand, M ; Sarbazi-Azad, H ; Sharif University of Technology
    Although phase change memory with multi-bit storage capability (known as MLC PCM) offers a good combination of high bit density and non-volatility, its performance is severely impacted by increased read/write latency. For read operations, access latency increases almost linearly with cell density (the number of bits stored in a cell). Since reads are latency-critical, this can seriously degrade system performance. This paper alleviates the problem of slow reads in MLC PCM by exploiting a fundamental property of MLC devices: the Most-Significant Bit (MSB) of an MLC cell can be read as fast as an SLC cell, while reading the Least-Significant Bits (LSBs) is slower. We propose... 
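The excerpt does not describe the paper's sensing circuit or its line-striping scheme. The sketch below is only an illustration, assuming a successive-approximation (binary-search) read in which each comparison against a reference resistance resolves one more bit, MSB first. Under that assumption the MSB is known after a single comparison, as for an SLC cell, while a full n-bit read needs n comparisons, which is consistent with read latency growing roughly linearly with cell density. The function name `sense_mlc` and the natural-binary mapping from resistance level to bits are hypothetical.

```python
def sense_mlc(level: int, bits: int) -> tuple[list[int], int]:
    """Hypothetical successive-approximation read of one MLC cell.

    `level` is the cell's resistance level index in [0, 2**bits), assumed
    to map to bits in natural binary order. Each loop iteration models one
    comparison against a reference resistance, so bit k (MSB first) is
    known after k + 1 comparisons.
    Returns the bits read (MSB first) and the number of sensing steps.
    """
    if not 0 <= level < 2 ** bits:
        raise ValueError("level out of range")
    lo, hi = 0, 2 ** bits  # candidate level interval [lo, hi)
    out: list[int] = []
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2  # reference placed between the two halves
        steps += 1
        if level >= mid:
            out.append(1)
            lo = mid
        else:
            out.append(0)
            hi = mid
    return out, steps


# A 2-bit cell at level 2 reads as "10": the MSB is resolved by the first
# comparison, the LSB only after the second.
print(sense_mlc(2, 2))
```

In this model the number of comparisons equals the number of bits per cell, so a 1-bit (SLC) read and the MSB of any MLC read both cost one step.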

    A case for PIM support in general-purpose compilers

    Article, IEEE Design and Test ; 2021 ; 21682356 (ISSN) Sadeghi, P ; Ejlali, A ; Sharif University of Technology
    IEEE Computer Society  2021
    Newly developed 3D die-stacking technologies afford us the possibility to revisit the idea of Processing-in-Memory (PIM) as implementation hurdles are overcome. We now have the opportunity to offload the data-intensive parts of our programs to the PIM in the form of kernels, taking advantage of the high internal bandwidth of the memory modules. Memory access latency and bandwidth are two major bottlenecks in today’s high-performance computers, and new use cases are moving toward this mode of computing faster than ever before. With new graph processing and neural network applications being developed every day, having a performance model of such systems helps in predicting the behavior... 

    Highly concurrent latency-tolerant register files for GPUs

    Article, ACM Transactions on Computer Systems ; Volume 37, Issue 1-4 , 2021 ; 07342071 (ISSN) Sadrosadati, M ; Mirhosseini, A ; Hajiabadi, A ; Ehsani, S. B ; Falahati, H ; Sarbazi Azad, H ; Drumond, M ; Falsafi, B ; Ausavarungnirun, R ; Mutlu, O ; Sharif University of Technology
    Association for Computing Machinery  2021
    Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file that reduces register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency because of the low hit rate in the register file cache. In this article, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical... 

    Exploiting Imprecise Non-volatile Memories for Soft Real-time Embedded Systems to Achieve Low Energy Consumption

    M.Sc. Thesis, Sharif University of Technology Bahrami, Fahimeh (Author) ; Ejlali, Alireza (Supervisor)
    Spin-Transfer-Torque-RAM (STT-RAM) has recently gained wide acceptance as a promising replacement for SRAM as technology scales, owing to its high density, zero standby power, and read latency comparable to SRAM. However, two major obstacles hinder the use of STT-RAM: high write latency and high write energy. In this study, we propose two approaches to address these challenges in embedded systems. The conventional latency of an STT-RAM write operation is 10 ns, and lowering the write latency causes the required write current to increase exponentially, leading to a larger memory cell area and a shorter memory lifetime. As the first proposed method, we have assessed the...