
ISP: Using idle SMs in hardware-based prefetching

Article; Proceedings - 17th CSI International Symposium on Computer Architecture and Digital Systems, CADS 2013; October 2013; Pages 3-8; 9781479905621 (ISBN); Falahati, H; Abdi, M; Baniasadi, A; Hessabi, S; Computer Society of Iran; IPM; Sharif University of Technology

IEEE Computer Society 2013

Abstract

The Graphics Processing Unit (GPU) is the most promising candidate platform for a faster rate of improvement in peak processing speed, low latency and high performance. The highly programmable and multithreaded nature of GPUs makes them a remarkable candidate for general purpose computing. However, supporting non-graphics computing on graphics processors requires addressing several architectural challenges. In this paper, we focus on improving performance by better hiding the long waiting time for transferring data from the slow global memory. To that end, we study an effective low-overhead prefetching mechanism that utilizes idle processing elements. Our results show that we can potentially improve...

Power-efficient prefetching on GPGPUs

Article; Journal of Supercomputing; Volume 71, Issue 8, August 2015, Pages 2808-2829; 09208542 (ISSN); Falahati, H; Hessabi, S; Abdi, M; Baniasadi, A; Sharif University of Technology

Abstract

The graphics processing unit (GPU) is the most promising candidate platform for achieving faster improvements in peak processing speed, low latency and high performance. The highly programmable and multithreaded nature of GPUs makes them a remarkable candidate for general purpose computing. However, supporting non-graphics computing on graphics processors requires addressing several architectural challenges. In this paper, we focus on improving performance by better hiding the long waiting time for transferring data from the slow global memory. Furthermore, we show that the proposed method can reduce power and energy. Reducing the access time to off-chip data plays a noticeable role in reducing...

A method for real-time safe navigation in noisy environments

Article; 2013 18th International Conference on Methods and Models in Automation and Robotics, MMAR 2013, Miedzyzdroje; 2013; Pages 329-333; 9781467355063 (ISBN); Neyshabouri, S. A. S; Kamali, E; Niknezhad, M. R; Monfared, S. S. M. S; Sharif University of Technology

2013

Abstract

The challenge of finding an optimized and reliable path dates back to the emergence of mobile robots. Several approaches have been developed that have partially answered this need. Satisfying results in previous implementations have led to increased use of sampling-based motion planning algorithms in recent years, especially in high-degree-of-freedom (DOF), fast-evolving environments. Another advantage of these algorithms is their probabilistic completeness, which guarantees that a path is found given sufficient time, if one exists. On the other hand, sampling-based motion planners say nothing about the safety of the planned path. This paper suggests biasing the Rapidly-exploring Random Trees...

GPU implementation of split-field finite difference time-domain method for Drude-Lorentz dispersive media

Article; Progress in Electromagnetics Research; Volume 125, 2012, Pages 55-77; 10704698 (ISSN); Shahmansouri, A; Rashidian, B; Sharif University of Technology

2012

Abstract

The split-field finite-difference time-domain (SF-FDTD) method can overcome the limitation of ordinary FDTD in analyzing periodic structures under oblique incidence. On the other hand, the huge run times of 3D SF-FDTD are a major practical burden in its use for the analysis and design of nanostructures, particularly with dispersive media. Here, details of a parallel implementation of the 3D SF-FDTD method for dispersive media, combined with the total-field/scattered-field (TF/SF) method for injecting an oblique plane wave, are discussed. A graphics processing unit (GPU) has been used for this purpose, and very large speed-up factors have been achieved. Also a previously reported formulation of SF-FDTD based...

Cluster-based approach for improving graphics processing unit performance by inter streaming multiprocessors locality

Article; IET Computers and Digital Techniques; Volume 9, Issue 5, August 2015, Pages 275-282; 17518601 (ISSN); Keshtegar, M. M; Falahati, H; Hessabi, S; Sharif University of Technology

Institution of Engineering and Technology 2015

Abstract

As a new platform for high-performance and general-purpose computing, the graphics processing unit (GPU) is one of the most promising candidates for faster improvement in peak processing speed, low latency and high performance. As GPUs employ multithreading to hide latency, there is a small private data cache in each single instruction multiple thread (SIMT) core. Hence, these cores communicate in many applications through the global memory. Access to this public memory takes a long time and consumes a large amount of power. Moreover, the memory bandwidth is limited, which is quite challenging in parallel processing. The missed memory requests in last level cache that are followed by accesses...

Efficient nearest-neighbor data sharing in GPUs

Article; ACM Transactions on Architecture and Code Optimization; Volume 18, Issue 1, 2021; 15443566 (ISSN); Nematollahi, N; Sadrosadati, M; Falahati, H; Barkhordar, M; Drumond, M. P; Sarbazi Azad, H; Falsafi, B; Sharif University of Technology

Association for Computing Machinery 2021

Abstract

Stencil codes (a.k.a. nearest-neighbor computations) are widely used in image processing, machine learning, and scientific applications. Stencil codes incur nearest-neighbor data exchange because the value of each point in the structured grid is calculated as a function of its value and the values of a subset of its nearest-neighbor points. When running on Graphics Processing Units (GPUs), stencil codes exhibit a high degree of data sharing between nearest-neighbor threads. Sharing is typically implemented through shared memories, shuffle instructions, and on-chip caches and often incurs performance overheads due to the redundancy in memory accesses. In this article, we propose Neighbor Data...
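
To illustrate the stencil pattern this abstract describes, here is a minimal sequential sketch in Python. It is not the method proposed in the article. The function name `five_point_stencil`, the choice of a 5-point averaging stencil, and the sample grid are assumptions made only for illustration.

```python
def five_point_stencil(grid):
    """Apply one sweep of a 5-point averaging stencil (illustrative only).

    Each interior point becomes the mean of its own value and the values
    of its four nearest neighbors (up, down, left, right).
    """
    rows, cols = len(grid), len(grid[0])
    # Copy the input so boundary points are carried over unchanged.
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (
                grid[i][j]
                + grid[i - 1][j]
                + grid[i + 1][j]
                + grid[i][j - 1]
                + grid[i][j + 1]
            ) / 5.0
    return out


# Hypothetical 3x3 input with a single non-zero interior point.
grid = [
    [0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0],
]
result = five_point_stencil(grid)
```

In this sketch each input value `grid[i][j]` is read once for its own output point and again for each interior neighbor's output point. On a GPU, where one thread typically computes one output point, those repeated reads are the data sharing between nearest-neighbor threads that the abstract refers to.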

Simulation of the multi-purpose gamma irradiator dose distribution based on the GEANT4 and GPU system

Article; Journal of Instrumentation; Volume 16, Issue 7, 2021; 17480221 (ISSN); Razimanesh, M; Hosseini, S. A; Sharif University of Technology

IOP Publishing Ltd 2021

Abstract

Gamma irradiation systems are used extensively in industry to sterilize medical devices, disinfect hygienic products and increase the shelf life of agricultural products. The method of gamma irradiation is superior to the older methods of heat or chemical treatment because it is by far a simpler operation. In this method, only one parameter, the exposure time, is controlled, whereas in the other mentioned methods five or six different parameters need to be controlled. The design of irradiation systems generally includes the size and the location of products, and the arrangement of source rack pencils. In order to optimize the design of the gamma irradiation systems, it is needed...