Search for: cycle-accurate-simulation

    Implementation-aware model analysis: The case of buffer-throughput tradeoff in streaming applications

    Article Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 18 June 2015 through 19 June 2015 ; Volume 2015-June , 2015 , Pages 108-117 ; 9781450332576 (ISBN) Mirzazad Barijough, K ; Hashemi, M ; Khibin, V ; Ghiasi, S ; Sharif University of Technology
    Association for Computing Machinery  2015
    Abstract
    Models of computation abstract away a number of implementation details in favor of well-defined semantics. While this has unquestionable benefits, we argue that analysis of models solely based on operational semantics (implementation-oblivious analysis) is unfit to drive implementation design space exploration. Specifically, we study the tradeoff between buffer size and streaming throughput in applications modeled as synchronous data flow (SDF) graphs. We demonstrate the inherent inaccuracy of the implementation-oblivious approach, which only considers SDF operational semantics. We propose a rigorous transformation, which equips the state-of-the-art buffer-throughput tradeoff analysis technique... 

    Implementation-aware model analysis: The case of buffer-throughput tradeoff in streaming applications

    Article ACM SIGPLAN Notices ; Volume 50, Issue 5 , May , 2015 , Pages 103-112 ; 15232867 (ISSN) Barijough, K. M ; Hashemi, M ; Khibin, V ; Ghiasi, S ; Sharif University of Technology
    Association for Computing Machinery  2015
    Abstract
    Models of computation abstract away a number of implementation details in favor of well-defined semantics. While this has unquestionable benefits, we argue that analysis of models solely based on operational semantics (implementation-oblivious analysis) is unfit to drive implementation design space exploration. Specifically, we study the tradeoff between buffer size and streaming throughput in applications modeled as synchronous data flow (SDF) graphs. We demonstrate the inherent inaccuracy of the implementation-oblivious approach, which only considers SDF operational semantics. We propose a rigorous transformation, which equips the state-of-the-art buffer-throughput tradeoff analysis technique... 

    A loss aware scalable topology for photonic on chip interconnection networks

    Article Journal of Supercomputing ; Vol. 68, Issue. 1 , April , 2014 , pp. 106-135 ; ISSN: 1573-0484 (online) Reza, A ; Sarbazi Azad, H ; Khademzadeh, A ; Shabani, H ; Niazmand, B ; Sharif University of Technology
    Abstract
    The demand for robust computation systems has led to an increase in the number of processing cores in current chips. As the number of processing cores increases, current electrical communication means can introduce serious challenges in system performance due to the restrictions in power consumption and communication bandwidth. Recent progress in silicon nano-photonic technology has provided a suitable platform for constructing photonic communication links as an alternative for overcoming such problems. Topology is one of the most significant characteristics of photonic interconnection networks. In this paper, we have introduced a novel topology, aiming to reduce insertion loss in... 

    A markovian performance model for networks-on-chip

    Article Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2008, 13 February 2008 through 15 February 2008, Toulouse ; 2008 , Pages 157-164 ; 0769530893 (ISBN); 9780769530895 (ISBN) Kiasari, A. E ; Rahmati, D ; Sarbazi Azad, H ; Hessabi, S ; Sharif University of Technology
    2008
    Abstract
    Network-on-Chip (NoC) has been proposed as a solution for addressing the design challenges of future high-performance nanoscale architectures. Thus, it is of crucial importance for a designer to have access to fast methods for evaluating the performance of on-chip networks. To this end, we present a Markovian model for evaluating the latency and energy consumption of on-chip networks. We compute the average delay due to path contention, virtual channel and crossbar switch arbitration using a queuing-based approach, which can capture the blocking phenomena of wormhole switching quite accurately. The model is then used to estimate the power consumption of all routers in NoCs. The performance... 

    Efficient nearest-neighbor data sharing in GPUs

    Article ACM Transactions on Architecture and Code Optimization ; Volume 18, Issue 1 , 2021 ; 15443566 (ISSN) Nematollahi, N ; Sadrosadati, M ; Falahati, H ; Barkhordar, M ; Drumond, M. P ; Sarbazi Azad, H ; Falsafi, B ; Sharif University of Technology
    Association for Computing Machinery  2021
    Abstract
    Stencil codes (a.k.a. nearest-neighbor computations) are widely used in image processing, machine learning, and scientific applications. Stencil codes incur nearest-neighbor data exchange because the value of each point in the structured grid is calculated as a function of its value and the values of a subset of its nearest-neighbor points. When running on Graphics Processing Units (GPUs), stencil codes exhibit a high degree of data sharing between nearest-neighbor threads. Sharing is typically implemented through shared memories, shuffle instructions, and on-chip caches and often incurs performance overheads due to the redundancy in memory accesses. In this article, we propose Neighbor Data...