ASHA: An adaptive shared-memory sharing architecture for multi-programmed GPUs

Article, Microprocessors and Microsystems; Volume 46, 2016, Pages 264-273; 01419331 (ISSN); Abbasitabar, H; Samavatian, M. H; Sarbazi Azad, H; Sharif University of Technology

Elsevier B.V., 2016

Abstract

Spatial multi-programming is one of the most efficient multi-programming methods on Graphics Processing Units (GPUs). This multi-programming scheme creates variation in the resource requirements of streaming multiprocessors (SMs) and creates opportunities for sharing unused portions of each SM's resources with other SMs. Although this approach drastically improves GPU performance, in some cases it leads to performance degradation due to a shortage of the resources allocated to each program. Considering shared memory as one of the main bottlenecks of thread-level parallelism (TLP), in this paper we propose an adaptive shared-memory sharing architecture, called ASHA. ASHA enhances spatial...

A Reconfigurable and Adaptive Shared-memory Architecture for GPUs

M.Sc. Thesis, Sharif University of Technology; Abbasitabar, Hamed (Author); Sarbazi Azad, Hamid (Supervisor)

Abstract

The importance of shared memory (scratchpad memory) in GPGPU programming, the memory size limits of GPGPUs, and the influence of shared memory on the overall performance of the GPGPU have led to efforts to optimize its performance. Moreover, the trend in new GPGPU designs shows that the ratio of shared memory to processing elements is shrinking. As a result, the limited capacity of shared memory becomes a bottleneck for a GPU to host a high number of thread blocks, limiting the otherwise available thread-level parallelism (TLP). In this thesis we introduce a reconfigurable and adaptive shared memory architecture for GPGPUs based on resource sharing, which can be exploited for throughput improvement...

Unifying L1 Data Cache and Shared Memory in GPUs

M.Sc. Thesis, Sharif University of Technology; Yousefzadeh-Asl-Miandoab, Ehsan (Author); Sarbazi Azad, Hamid (Supervisor)

Abstract

Graphics Processing Units (GPUs) employ a scratch-pad memory (a.k.a. shared memory) in each streaming multiprocessor to accelerate data sharing among the threads in a thread block and to provide a software-managed cache for programmers. However, we observe that about 60% of the GPU workloads in several well-known benchmark suites do not use shared memory. Moreover, among those workloads that use shared memory, about 42% of shared memory is not utilized, on average. On the other hand, we observe that many general-purpose GPU applications suffer from the low hit rate and limited bandwidth of the L1 data cache. We aim to use shared memory space and its corresponding bandwidth for improving L1 data cache,...

Evaluating Energy Efficiency and Scalability of Timing Channel-Protection Techniques Exploited for Single Chip Cloud Computer

M.Sc. Thesis, Sharif University of Technology; Asgharzadeh Donighi, Ashkan (Author); Hesabi, Shahin (Supervisor)

Abstract

Although cloud processors have many benefits, they have brought new challenges for designers; one of these is information leakage through timing channel attacks on shared hardware resources. Among these shared resources, the main memory controller is less understood. Also, applying timing channel protection techniques to the shared memory controller, in comparison with other parts such as the NoC, caches, etc., can impose a high overhead on system throughput. Temporal Partitioning (TP) is the baseline secure scheduling algorithm that was proposed to cope with timing channel attacks in the shared memory controller; but alongside this protection, TP causes high performance degradation. In this...

Dynamic shared SPM reuse for real-time multicore embedded systems

Article, ACM Transactions on Architecture and Code Optimization; Volume 12, Issue 2, 2015; 15443566 (ISSN); Mohajjel Kafshdooz, M; Ejlali, A; Sharif University of Technology

Association for Computing Machinery 2015

Abstract

Allocating the scratchpad memory (SPM) space to tasks is a challenging problem in real-time multicore embedded systems that use shared SPM. Proper SPM space allocation is important, as it considerably influences the application worst-case execution time (WCET), which is of great importance in real-time applications. To address this problem, in this article we present a dynamic SPM reuse scheme, where SPM space can be reused by other tasks during runtime without requiring any static SPM partitioning. Although the proposed scheme is applied dynamically at runtime, the required decision making is fairly complex and hence cannot be performed at runtime. We have developed techniques to perform...

Reconfigurable multicast routing for Networks on Chip

Article, Microprocessors and Microsystems; Volume 42, 2016, Pages 180-189; 01419331 (ISSN); Nasiri, F; Sarbazi Azad, H; Khademzadeh, A; Sharif University of Technology

Elsevier

Abstract

Several unicast and multicast routing protocols have been presented for MPSoCs. Multicast protocols in NoCs are used for cache coherency in distributed shared memory systems, replication, barrier synchronization, or clock synchronization. Unicast routing algorithms are not suitable for multicast, as they increase traffic, congestion, and deadlock probability. Well-known multicast schemes such as tree-based and path-based schemes were originally proposed for multicomputers and have recently been adapted to NoCs. In this paper, we propose a switch tree-based multicast scheme, called STBA. This method supports tree construction with a minimum number of routers. Our evaluation results reveal that, for...

Assessment of a parallel evolutionary optimization approach for efficient management of coastal aquifers

Article, Environmental Modelling and Software; Volume 74, December 2015, Pages 21-38; 13648152 (ISSN); Ketabchi, H; Ataie Ashtiani, B; Sharif University of Technology

Elsevier Ltd 2015

Abstract

This study presents a parallel evolutionary optimization approach to determine optimal management strategies for large-scale coastal groundwater problems. The population loops of evolutionary algorithms (EAs) are parallelized using shared memory parallelism to address the high computational demands of such applications. This methodology is applied to solve the management problems of an aquifer system on Kish Island, Iran, using a three-dimensional density-dependent groundwater numerical model. Three EAs (continuous ant colony optimization (CACO), particle swarm optimization, and a genetic algorithm) are utilized to solve the optimization problems. By implementing the parallelization strategy, a...