Search for: shared-memory

    ASHA: An adaptive shared-memory sharing architecture for multi-programmed GPUs

    Article, Microprocessors and Microsystems; Volume 46, 2016, Pages 264-273; 01419331 (ISSN); Abbasitabar, H; Samavatian, M. H; Sarbazi Azad, H; Sharif University of Technology
    Elsevier B.V  2016
    Abstract
    Spatial multi-programming is one of the most efficient multi-programming methods on Graphics Processing Units (GPUs). This multi-programming scheme generates variety in resource requirements of streaming multiprocessors (SMs) and creates opportunities for sharing unused portions of each SM resource with other SMs. Although this approach drastically improves GPU performance, in some cases it leads to performance degradation due to a shortage of resources allocated to each program. Considering shared memory as one of the main bottlenecks of thread-level parallelism (TLP), in this paper, we propose an adaptive shared-memory sharing architecture, called ASHA. ASHA enhances spatial... 

    A Reconfigurable and Adaptive Shared-memory Architecture for GPUs

    M.Sc. Thesis, Sharif University of Technology; Abbasitabar, Hamed (Author); Sarbazi Azad, Hamid (Supervisor)
    Abstract
    The importance of shared memory (scratchpad memory) in GPGPU programming, the memory size limits of GPGPUs, and the influence of shared memory on overall GPGPU performance have motivated efforts to optimize it. Moreover, the trend in new GPGPU designs shows that the ratio of shared memory to processing elements is shrinking. As a result, the limited capacity of shared memory becomes a bottleneck for a GPU to host a high number of thread blocks, limiting the otherwise available thread-level parallelism (TLP). In this thesis we introduce a reconfigurable and adaptive shared memory architecture for GPGPUs based on resource sharing which can be exploited for throughput improvement... 

    Unifying L1 Data Cache and Shared Memory in GPUs

    M.Sc. Thesis, Sharif University of Technology; Yousefzadeh-Asl-Miandoab, Ehsan (Author); Sarbazi Azad, Hamid (Supervisor)
    Abstract
    Graphics Processing Units (GPUs) employ a scratch-pad memory (a.k.a. shared memory) in each streaming multiprocessor to accelerate data sharing among the threads in a thread block and provide a software-managed cache for programmers. However, we observe that about 60% of GPU workloads of several well-known benchmark suites do not use shared memory. Moreover, among those workloads that use shared memory, about 42% of shared memory is not utilized, on average. On the other hand, we observe that many general-purpose GPU applications suffer from the low hit rate and limited bandwidth of the L1 data cache. We aim to use shared memory space and its corresponding bandwidth for improving L1 data cache,... 

    Evaluating Energy Efficiency and Scalability of Timing Channel-protection Techniques Exploited for Single Chip Cloud Computer

    M.Sc. Thesis, Sharif University of Technology; Asgharzadeh Donighi, Ashkan (Author); Hesabi, Shahin (Supervisor)
    Abstract
    Although cloud processors have many benefits, they have brought new challenges for designers; one of these is information leakage through timing channel attacks on shared hardware resources. Among these shared resources, the main memory controller is less understood. Also, applying timing channel protection techniques to the shared memory controller, in comparison with other parts such as the NoC, caches, etc., can impose a high performance overhead on system throughput. Temporal Partitioning (TP) is the baseline secure scheduling algorithm that was proposed to cope with timing channel attacks in the shared memory controller; but besides this protection, TP incurs high performance degradation. In this... 

    Dynamic shared SPM reuse for real-time multicore embedded systems

    Article, ACM Transactions on Architecture and Code Optimization; Volume 12, Issue 2, 2015; 15443566 (ISSN); Mohajjel Kafshdooz, M; Ejlali, A; Sharif University of Technology
    Association for Computing Machinery  2015
    Abstract
    Allocating the scratchpad memory (SPM) space to tasks is a challenging problem in real-time multicore embedded systems that use shared SPM. Proper SPM space allocation is important, as it considerably influences the application worst-case execution time (WCET), which is of great importance in real-time applications. To address this problem, in this article we present a dynamic SPM reuse scheme, where SPM space can be reused by other tasks during runtime without requiring any static SPM partitioning. Although the proposed scheme is applied dynamically at runtime, the required decision making is fairly complex and hence cannot be performed at runtime. We have developed techniques to perform... 

    Reconfigurable multicast routing for Networks on Chip

    Article, Microprocessors and Microsystems; Volume 42, 2016, Pages 180-189; 01419331 (ISSN); Nasiri, F; Sarbazi Azad, H; Khademzadeh, A; Sharif University of Technology
    Elsevier 
    Abstract
    Several unicast and multicast routing protocols have been presented for MPSoCs. Multicast protocols in NoCs are used for cache coherency in distributed shared memory systems, replication, barrier synchronization, or clock synchronization. Unicast routing algorithms are not suitable for multicast, as they increase traffic, congestion, and deadlock probability. Well-known multicast schemes such as tree-based and path-based schemes were originally proposed for multicomputers and have recently been adapted to NoCs. In this paper, we propose a switch tree-based multicast scheme, called STBA. This method supports tree construction with a minimum number of routers. Our evaluation results reveal that, for... 

    Assessment of a parallel evolutionary optimization approach for efficient management of coastal aquifers

    Article, Environmental Modelling and Software; Volume 74, December 2015, Pages 21-38; 13648152 (ISSN); Ketabchi, H; Ataie Ashtiani, B; Sharif University of Technology
    Elsevier Ltd  2015
    Abstract
    This study presents a parallel evolutionary optimization approach to determine optimal management strategies of large-scale coastal groundwater problems. The population loops of evolutionary algorithms (EA) are parallelized using shared memory parallelism to address the high computational demands of such applications. This methodology is applied to solve the management problems in an aquifer system in Kish Island, Iran using a three-dimensional density-dependent groundwater numerical model. EAs of continuous ant colony optimization (CACO), particle swarm optimization, and genetic algorithm are utilized to solve the optimization problems. By implementing the parallelization strategy, a...