Search for: gpu
0.007 seconds
Total 38 records

    GPU-based parallel algorithm for computing point visibility inside simple polygons

    , Article Computers and Graphics (Pergamon) ; Volume 49 , 2015 , Pages 1-9 ; 00978493 (ISSN) Shoja, E ; Ghodsi, M ; Sharif University of Technology
    Elsevier Ltd  2015
    Abstract
    Given a simple polygon P in the plane, we present a parallel algorithm for computing the visibility polygon of an observer point q inside P. We use the chain visibility concept and a bottom-up merge method for constructing the visibility polygon of point q. The algorithm is simple and mainly designed for GPU architectures, where it runs in O(log n) time using O(n) processors. This is the first work on designing a GPU-based parallel algorithm for the visibility problem. To the best of our knowledge, the presented algorithm is also the first suboptimal parallel algorithm for the visibility problem that can be implemented on existing parallel architectures. We evaluated a sample implementation of... 
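
    The bottom-up construction sketched in this abstract can be pictured as a log-depth reduction tree: each round merges pairs of partial visibility chains in parallel until a single chain remains. Below is a minimal Python sketch of that reduction pattern only; the chain representation and the merge_chains function are hypothetical placeholders, and the paper's geometric merge step is not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor

def merge_chains(a, b):
    # Hypothetical placeholder: the paper merges two partial visibility
    # chains geometrically; here we simply concatenate to show the pattern.
    return a + b

def bottom_up_merge(chains):
    """Reduce a list of partial chains in O(log n) parallel rounds."""
    with ThreadPoolExecutor() as pool:
        while len(chains) > 1:
            pairs = list(zip(chains[0::2], chains[1::2]))
            merged = list(pool.map(lambda p: merge_chains(*p), pairs))
            if len(chains) % 2:          # carry an unpaired chain forward
                merged.append(chains[-1])
            chains = merged
    return chains[0]

# Toy usage: 8 single-element "chains" merged in 3 parallel rounds.
print(bottom_up_merge([[i] for i in range(8)]))
```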

    Efficient Implementation of Compressed Deep Convolutional Neural Networks

    , M.Sc. Thesis Sharif University of Technology Afshar, Mohammad (Author) ; Hashemi, Matin (Supervisor)
    Abstract
    Many mobile applications running on smartphones, wearable devices, tiny autonomous robots and IoT devices would potentially benefit from the accuracy and scalability of deep CNN-based machine learning algorithms. However, performance and energy consumption limitations make the execution of such computationally intensive algorithms on embedded mobile devices prohibitive. We present a GPU-accelerated engine, dubbed mCNN, for execution of trained deep CNNs on mobile platforms. The proposed solution takes the trained model as input and automatically optimizes its parallel implementation on the target mobile platform for efficient use of hardware resources such as mobile GPU threads and SIMD units.... 
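
    Compressing a trained CNN often amounts to storing weights in a lower-precision format and dequantizing them on the fly; the abstract does not state the thesis's exact scheme, so the snippet below is only a generic 8-bit linear quantization sketch in NumPy, with all names invented for illustration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of float32 weights to int8."""
    scale = max(np.abs(w).max() / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)   # a conv filter bank
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```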

    NURA: A framework for supporting non-uniform resource accesses in GPUs

    , Article Performance Evaluation Review ; Volume 50, Issue 1 , 2022 , Pages 39-40 ; 01635999 (ISSN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh, E ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology
    Association for Computing Machinery  2022
    Abstract
    Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g., simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes the Cooperative Thread Arrays (CTAs) that belong to only one application (similar to... 
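
    The key idea quoted above (each SM runs CTAs of a single application, while applications receive different numbers of SMs) can be illustrated with a small scheduling sketch. This is not NURA's actual policy; the proportional split below is a hypothetical stand-in.

```python
def assign_sms(num_sms, cta_demand):
    """Each SM serves exactly one application; applications receive a
    number of SMs roughly proportional to their CTA demand."""
    total = sum(cta_demand.values())
    shares = {app: max(1, round(num_sms * d / total))
              for app, d in cta_demand.items()}
    # Fix rounding so the shares sum exactly to num_sms.
    while sum(shares.values()) > num_sms:
        shares[max(shares, key=shares.get)] -= 1
    while sum(shares.values()) < num_sms:
        shares[min(shares, key=shares.get)] += 1
    return shares

print(assign_sms(16, {"appA": 120, "appB": 40, "appC": 20}))
```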

    Architecture-aware Implementation of Graph Algorithms based on Linear Algebra in GPUs

    , M.Sc. Thesis Sharif University of Technology Barkhordar, Marzieh (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    Processing of large graphs is the key component in many data analytics applications. We model the relationships of entities in different applications, such as web page ranking, social networks, and tracking drug interactions with cells, using graphs. The graphics processing unit (GPU) is a well-known accelerator used for graph processing. Unfortunately, there are many challenges in mapping graph applications to GPUs efficiently. As graph applications have more kernel invocations and data transfers, using caches in these applications would be ineffective. Since the vertex degrees of a graph differ, load distribution in many graph applications is not well balanced. Recently, matrix algebra has... 
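
    The linear-algebra formulation referred to above typically expresses graph traversal as matrix operations, e.g. one BFS level per matrix-vector product, which maps naturally onto GPU kernels. A minimal NumPy sketch of that idea (dense adjacency for brevity; this is not the thesis's implementation):

```python
import numpy as np

def bfs_levels(adj, source):
    """BFS expressed as repeated matrix-vector products, one per level."""
    n = adj.shape[0]
    frontier = np.zeros(n, dtype=np.int64)
    frontier[source] = 1
    visited = frontier.astype(bool)
    levels = np.full(n, -1)
    levels[source] = 0
    level = 0
    while frontier.any():
        reached = adj.T @ frontier                  # vertices reachable from the frontier
        frontier = ((reached > 0) & ~visited).astype(np.int64)
        visited |= frontier.astype(bool)
        level += 1
        levels[frontier.astype(bool)] = level
    return levels

adj = np.array([[0, 1, 1, 0],     # edge i -> j when adj[i, j] == 1
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])
print(bfs_levels(adj, 0))          # [0 1 1 2]
```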

    Improving the Efficiency of GPUs by Reducing Register File Accesses

    , M.Sc. Thesis Sharif University of Technology Mohammadpur Fard, Ali (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    Graphics Processing Units (GPUs) use a large register file to support a large number of parallel threads, which is responsible for a large fraction of the device's total power consumption and die area. Due to the conventional RISC-like instruction set architecture, a reasonably large fraction of all accesses to the register file are performed to accommodate the memory accesses performed by the threads, which limits the available register file bandwidth for other concurrent accesses, and also keeps at least one register per thread busy for storing loaded values. In this thesis, we propose moving away from the conventional RISC-like architecture and allowing memory operands for some... 
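
    The bandwidth argument in this abstract can be made concrete with a back-of-the-envelope count: a RISC-like load-then-use sequence touches the register file more often, and holds more registers, than an instruction that reads one operand directly from memory. The numbers below are an illustrative model only, not figures from the thesis.

```python
# Register-file (RF) accesses for computing r2 = r2 + value_at(addr).
#
# RISC-like ISA:
#   LD  r1, [addr]     -> 1 RF write (r1), and r1 stays allocated
#   ADD r2, r2, r1     -> 2 RF reads + 1 RF write
risc_reads, risc_writes, risc_regs_held = 2, 2, 2

# Memory-operand form (the direction proposed in the abstract):
#   ADD r2, r2, [addr] -> 1 RF read + 1 RF write, no temporary register
memop_reads, memop_writes, memop_regs_held = 1, 1, 1

print("RF accesses saved per memory-using operation:",
      (risc_reads + risc_writes) - (memop_reads + memop_writes))
print("registers freed per thread:", risc_regs_held - memop_regs_held)
```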

    NoC design methodologies for heterogeneous architecture

    , Article Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020, 11 March 2020 through 13 March 2020 ; 2020 , Pages 299-306 Alhubail, L ; Jasemi, M ; Bagherzadeh, N ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2020
    Abstract
    Fused CPU-GPU architectures that utilize the powerful features of both processors are common nowadays. Using a homogeneous interconnect for such heterogeneous processors can result in performance degradation and increased power. This paper explores the optimization of heterogeneous NoC design to connect a heterogeneous CPU-GPU architecture in terms of NoC performance and power. This involves solving four different NoC design sub-problems simultaneously: processing element (PE) mapping, buffer size assignment, virtual channel assignment, and link bandwidth determination. Heuristic-based optimization methods were proposed to obtain a near-optimal heterogeneous NoC design, and formal models were used... 
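
    One of the heuristic searches mentioned above could look like the following simulated-annealing sketch for the PE-mapping sub-problem alone; the mesh size, cost model (traffic volume times hop count), and annealing schedule are all hypothetical, not the paper's formulation.

```python
import math, random

def hops(a, b, cols=4):
    """Manhattan distance between two tile positions on a mesh NoC."""
    return abs(a % cols - b % cols) + abs(a // cols - b // cols)

def cost(mapping, traffic):
    return sum(vol * hops(mapping[s], mapping[d]) for (s, d), vol in traffic.items())

def anneal(traffic, n_tiles=16, steps=5000, t0=10.0):
    mapping = list(range(n_tiles))              # PE i starts on tile i
    cur = best = cost(mapping, traffic)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-3
        i, j = random.sample(range(n_tiles), 2)
        mapping[i], mapping[j] = mapping[j], mapping[i]
        new = cost(mapping, traffic)
        if new < cur or random.random() < math.exp((cur - new) / t):
            cur = new
            best = min(best, cur)
        else:                                   # undo the rejected swap
            mapping[i], mapping[j] = mapping[j], mapping[i]
    return best

traffic = {(0, 5): 100, (5, 10): 80, (10, 15): 60, (0, 15): 20}
print("near-optimal communication cost:", anneal(traffic))
```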

    NURA: A framework for supporting non-uniform resource accesses in GPUs

    , Article Proceedings of the ACM on Measurement and Analysis of Computing Systems ; Volume 6, Issue 1 , 2022 ; 24761249 (ISSN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh Asl Miandoab, E ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology
    Association for Computing Machinery  2022
    Abstract
    Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have limited opportunity to improve resource utilization, while other works, e.g., simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensures fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes Cooperative Thread Arrays (CTAs) that belong to only one application (similar to the... 

    Real-time Implementation of Vision-aided Navigation on GPU

    , M.Sc. Thesis Sharif University of Technology Kamran, Danial (Author) ; Manzuri Shalmani, Mohammad Taghi (Supervisor)
    Abstract
    Knowing the exact position of the robot in the real world is one of the crucial aspects of its navigation process. For this purpose, several inertial sensors such as the gyroscope, accelerometer, and compass have been used; however, each of these sensors has its own drawbacks, which cause inaccuracies in specific situations. Moreover, the Global Positioning System (GPS) is not available in indoor environments and is also not accurate in outdoor places. All of these reasons have persuaded researchers to use camera frames captured from the top of the robot as new information for estimating the motion parameters of the robot. The main challenge for vision-aided localization algorithms is... 

    Improving GPGPU Performance Through Efficient use of Memory Controllers

    , M.Sc. Thesis Sharif University of Technology Bakhishi, Mohammad Hazhir (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    The advent of the CUDA architecture established the GPU as a suitable platform for parallel processing. The massive use of GPGPUs in various applications forces manufacturers to produce these processors in different configurations, based on application demands. However, the design approach always maintains a fixed ratio between the processing capability and the memory bandwidth of GPGPUs: in nearly all GPGPU series, bandwidth increases with the growth of processing power. At first glance this ratio seems reasonable, because more computation needs more data. However, this approach ignores the behavior of a wide range of workloads that do not need such bandwidth. Mentioned... 

    Design and Implementation of Hardware Accelerator for Domain Name Service

    , M.Sc. Thesis Sharif University of Technology Jahandar, Ebrahim (Author) ; Jahangir, Amir Hossein (Supervisor)
    Abstract
    In this project we have designed and implemented a hardware accelerator for the domain name service. This hardware accelerator is compatible with existing designs and can be used either standalone as an authoritative DNS server or as a hardware accelerator placed in series with an existing DNS server facility. Two goals are achieved in this thesis: increasing total DNS throughput and decreasing its response time. We have surveyed the domain name service, its measurement studies, the theory of caching and its effectiveness, name lookup methods, and finally some similar designs. Domain name lookup in memory is one of the most challenging operations in every DNS server. We have researched... 
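
    Name lookup, identified above as a central operation, is commonly implemented in software as a label-by-label trie walk over the reversed domain name. A small Python sketch of that data structure (illustrative only; the thesis's hardware lookup scheme may differ):

```python
class DnsTrie:
    """Trie keyed on domain labels in reverse order (com -> example -> www)."""

    def __init__(self):
        self.root = {}

    def insert(self, name, record):
        node = self.root
        for label in reversed(name.lower().split(".")):
            node = node.setdefault(label, {})
        node["__record__"] = record

    def lookup(self, name):
        node = self.root
        for label in reversed(name.lower().split(".")):
            if label not in node:
                return None
            node = node[label]
        return node.get("__record__")

trie = DnsTrie()
trie.insert("www.example.com", "93.184.216.34")
print(trie.lookup("www.example.com"))    # 93.184.216.34
print(trie.lookup("mail.example.com"))   # None
```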

    Numerical Simulation of Bubble Cluster Dynamics, Using Lattice Boltzmann Method

    , Ph.D. Dissertation Sharif University of Technology Daemi, Mahdi (Author) ; Tayyebi Rahni, Mohammad (Supervisor) ; Massah, Hamid Reza (Co-Advisor)
    Abstract
    Bubble clusters have attracted the interest of many researchers since the early twentieth century. Despite their easy generation and numerous occurrences, their study is extremely complex. Describing the dynamical behavior of bubble clusters is possible only when quite a few simplifying assumptions are adopted. In other words, with current approaches the relevant theoretical studies are of limited value. In this research, however, the lattice Boltzmann method, a rather recent mesoscopic approach, was used to study the behavior of bubbles in a bubble cluster. Of course, this is only the beginning and there is a long way to go before getting close to experimental results. However, there... 

    Design of Multi-purpose Gamma Irradiator System Based on Combined Monte Carlo Geant4 and GPU Hardware

    , M.Sc. Thesis Sharif University of Technology Razimanesh, Masoud (Author) ; Sohrabpour, Mostafa (Supervisor)
    Abstract
    Gamma irradiation systems are used extensively in industry to sterilize medical devices, disinfect hygienic materials, or increase the shelf life of agricultural produce. Gamma irradiation is superior to the older methods of heat or chemical treatment because it is by far a simpler operation: only a single parameter, time, is controlled, whereas the other methods require controlling five or six different parameters. The design of irradiation systems includes the size and location of the products and the arrangement of the source rack pencils. In order to optimize the design, it is necessary to study the product dose distribution in a wide range of... 

    Parallel Implementation of Telecommunication Decodings in Real-time

    , M.Sc. Thesis Sharif University of Technology Jafarzadeh, Ali (Author) ; Hashemi, Matin (Supervisor)
    Abstract
    Many chip manufacturers have recently introduced high-performance deep-learning hardware accelerators. In modern GPUs, programmable tensor cores accelerate the heavy operations involved in deep neural networks. This paper presents a novel solution to re-purpose tensor cores in modern GPUs for high-throughput implementation of turbo decoders. Turbo codes closely approach Shannon’s limit on channel capacity and are widely used in many state-of-the-art wireless systems, including satellite and mobile communications. Experimental evaluations show that the proposed solution achieves about 1.2 Gbps throughput, which is higher than previous GPU-accelerated solutions. 
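
    The reason a trellis decoder maps well onto matrix hardware is that its forward recursion is itself a matrix-style operation: in the max-log-MAP approximation, alpha_k(s) = max_{s'} [alpha_{k-1}(s') + gamma_k(s', s)], i.e. a (max,+) matrix-vector product per trellis step. A NumPy illustration of that recursion with synthetic branch metrics (not the paper's tensor-core kernel):

```python
import numpy as np

def maxplus_step(alpha, gamma):
    """One forward step of max-log-MAP: a (max,+) matrix-vector product."""
    # alpha: (S,) path metrics; gamma: (S, S) branch metrics from s' to s
    return (alpha[:, None] + gamma).max(axis=0)

S, K = 8, 100                              # 8 trellis states, 100 trellis steps
rng = np.random.default_rng(0)
alpha = np.zeros(S)
for k in range(K):
    gamma = rng.normal(size=(S, S))        # stand-in branch metrics
    alpha = maxplus_step(alpha, gamma)
    alpha -= alpha.max()                   # normalize to avoid numerical drift
print("final path metrics:", np.round(alpha, 2))
```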

    A Machine Learning Approach to Minimize Power Consumption of Smartphones While Satisfying the Gaming Performance

    , M.Sc. Thesis Sharif University of Technology Aghapour, Ehsan (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    Today's smartphone devices include several processing cores, such as the CPU, the GPU, and various accelerators, in order to maximize user experience. However, to stay within the power budget and the limited battery capacity, the power and energy of these cores must be managed using dynamic power management methods such as dynamic voltage and frequency scaling (DVFS). For this purpose, we must find the optimal frequency and voltage settings of the processing cores at each point in time, to minimize energy consumption while retaining user experience. Finding these optimal frequency and voltage settings is a challenging problem that depends on many parameters. We propose to use a deep reinforcement learning (DRL) method to... 
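
    As a toy illustration of the learning formulation described above, the sketch below runs tabular, bandit-style Q-learning over a few discrete GPU frequency levels, with a reward that trades energy against a frame-rate target. The environment model, frequency levels, and reward weights are invented for the example; the thesis uses a deep RL agent on a real device.

```python
import random

FREQS = [300, 600, 900, 1200]            # hypothetical GPU clocks in MHz
TARGET_FPS = 30

def step(freq):
    """Toy environment: a higher clock gives more FPS and costs more power."""
    fps = min(60, freq / 20)             # e.g. 600 MHz -> 30 FPS
    power = (freq / 1200) ** 2           # normalized dynamic power
    reward = -power - (2.0 if fps < TARGET_FPS else 0.0)
    return fps, reward

Q = {f: 0.0 for f in FREQS}
eps, lr = 0.2, 0.1
for episode in range(5000):
    f = random.choice(FREQS) if random.random() < eps else max(Q, key=Q.get)
    _, r = step(f)
    Q[f] += lr * (r - Q[f])              # stateless (bandit-style) update

print("learned preference:", max(Q, key=Q.get), "MHz")   # expect 600 MHz
```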

    A Study of the Phase Diagram of the Hubbard Model Using Modern Numerical Methods

    , M.Sc. Thesis Sharif University of Technology Manavi, Alireza (Author) ; Vaezi, Mir Abolhassan (Supervisor)
    Abstract
    The Hubbard model is one of the simplest interacting models in theoretical physics, especially condensed matter physics; despite its simplicity, its solutions are highly nontrivial and intractable. Five decades after its introduction, its phase diagram is still not fully understood, nor is it known whether it has a high-temperature superconducting phase. In this thesis, we aim to employ recent advances in machine learning and GPU programming to accelerate the QMC method. With accelerated QMC methods, we can explore the Hubbard model's phase diagram more efficiently. Massive parallelization on GPUs can speed up the measurement process severalfold. The self-learning quantum... 
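
    The parallelization point above can be illustrated with vectorized Monte Carlo: instead of advancing one Markov chain, many chains are updated simultaneously with array operations, which is the style of parallelism a GPU exploits. The toy target below is a one-dimensional double-well potential, not the Hubbard-model QMC of the thesis.

```python
import numpy as np

def energy(x):
    return (x**2 - 1.0) ** 2                 # toy double-well potential

rng = np.random.default_rng(1)
n_chains, n_steps, beta = 10_000, 2_000, 3.0
x = rng.normal(size=n_chains)                # every chain has its own walker

for _ in range(n_steps):
    prop = x + rng.normal(scale=0.3, size=n_chains)
    log_acc = np.minimum(0.0, -beta * (energy(prop) - energy(x)))
    accept = rng.random(n_chains) < np.exp(log_acc)
    x = np.where(accept, prop, x)            # all chains updated in one vector op

print("<x^2> averaged over all chains:", np.mean(x**2))
```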

    Computational Investigation of Primary Atomization of an Unsteady and 3D Laminar Liquid Jet, Using LBM and GPU

    , M.Sc. Thesis Sharif University of Technology Shadkhah, Mehdi (Author) ; Taeibi, Mohammad (Supervisor) ; Kebriaee, Azadeh (Co-Supervisor) ; Salimi, Mohammad Reza (Co-Supervisor)
    Abstract
    In computational fluid dynamics, choosing a proper method for the three-dimensional investigation of two-phase flows has always been challenging. In this research, the atomization of a liquid jet was investigated. Using the GPU made our computations about 40 times faster. The numerical results are in good agreement with available numerical and experimental data. Based on our results, the jet flow can reach different regimes at different Weber and Reynolds numbers. The jet was found in the dripping and Rayleigh instability regimes when the Weber number was set to 1.79 and 3.10, respectively. The transition between dripping and jetting was estimated at Weber numbers between 2 and 3. In... 
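
    For readers unfamiliar with the method named above, the core of a lattice Boltzmann solver is a local collision step followed by streaming of distribution functions. A minimal single-phase D2Q9 BGK sketch in NumPy follows; the two-phase atomization model and the GPU kernels of the thesis are not reproduced.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities and their weights.
e = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
tau, nx, ny = 0.6, 64, 64

f = np.ones((9, nx, ny)) * w[:, None, None]        # fluid at rest, rho = 1

def equilibrium(rho, ux, uy):
    cu = 3 * (e[:, 0, None, None] * ux + e[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return w[:, None, None] * rho * (1 + cu + 0.5 * cu**2 - usq)

for _ in range(100):
    rho = f.sum(axis=0)
    ux = (f * e[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * e[:, 1, None, None]).sum(axis=0) / rho
    f += (equilibrium(rho, ux, uy) - f) / tau       # BGK collision
    for i in range(9):                              # streaming on a periodic box
        f[i] = np.roll(np.roll(f[i], e[i, 0], axis=0), e[i, 1], axis=1)

print("mass conserved:", bool(np.isclose(f.sum(), nx * ny)))
```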

    Packet Processing Acceleration for Virtualized Network Functions

    , M.Sc. Thesis Sharif University of Technology Barati Sedeh, Amir Reza (Author) ; Jahangir, Amir Hossein (Supervisor)
    Abstract
    Network Function Virtualization (NFV) is a new paradigm in computer networks for implementing network functions on general-purpose commercial off-the-shelf processors. The NFV architecture has been shown to be a suitable replacement for hardware middleboxes. NFV is in high demand because of its lower cost, increased flexibility, and greater scalability. However, the ever-growing volume of network traffic and the increased data transfer speed in computer networks have made achieving high performance the major issue in this area. With thousands of processing cores and high memory bandwidth, Graphics Processing Units (GPUs) are potentially an appropriate solution to accelerate network... 
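
    The batch-parallel style alluded to above can be sketched with a vectorized per-packet operation: below, NumPy computes the IPv4 header checksum for a whole batch of headers at once, the way a GPU kernel would process one packet per thread. The packet contents are synthetic and this is not the thesis's processing pipeline.

```python
import numpy as np

def ipv4_checksums(headers):
    """headers: (batch, 10) uint16 words of 20-byte IPv4 headers,
    with the checksum field (word 5) already zeroed."""
    total = headers.astype(np.uint32).sum(axis=1)
    total = (total & 0xFFFF) + (total >> 16)    # fold the carries once
    total = (total & 0xFFFF) + (total >> 16)    # ... and any remaining carry
    return (~total) & 0xFFFF                    # one's complement

rng = np.random.default_rng(0)
batch = rng.integers(0, 2**16, size=(4, 10), dtype=np.uint16)
batch[:, 5] = 0                                 # clear the checksum field
print(ipv4_checksums(batch))
```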

    Improving the Utilization Rate of the GPU Memory Resources by Exploiting Application's Heterogeneity

    , Ph.D. Dissertation Sharif University of Technology Darabi Moghaddam, Sina (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    In recent years, GPUs have become a popular choice for high-performance general-purpose systems. However, since general-purpose applications do not always utilize processing and memory resources efficiently, energy consumption in these systems has not been managed effectively. As the variety of resources within GPUs continues to increase, the problem of low utilization has become more critical. Past solutions have focused on integrating resources and using multi-tasking, but these methods have limitations in terms of performance and security. Therefore, this research proposes new methods for improving the resource utilization and energy efficiency of GPUs. This dissertation first examines... 

    A Hardware-Software Partitioner for Deep Learning Algorithms

    , M.Sc. Thesis Sharif University of Technology Haghighi, Sepand (Author) ; Hessabi, Shahin (Supervisor)
    Abstract
    Deep learning, as a subfield of machine learning, attempts to model high-level concepts by using a deep graph consisting of several layers of linear and nonlinear transformations. Implementing these algorithms on hardware is a big challenge. This project offers a system in which various hardware methodologies can be used to implement deep learning algorithms side by side. The overall structure of the system consists of high-level programming interfaces for the implementation and expression of machine learning algorithms by the user, which will be available as libraries in a high-level programming language such as Python, Ruby, or Julia. These interfaces allow the user to evaluate their... 
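
    A partitioner of the kind described could expose an interface roughly like the following: the user lists the layers of a model and the tool assigns each to a device using a simple cost estimate. The device list, cost numbers, and function names below are hypothetical and only illustrate the idea.

```python
# Hypothetical per-layer latency estimates, in milliseconds, per device.
COST = {
    "conv":  {"cpu": 9.0, "gpu": 1.5, "fpga": 2.0},
    "pool":  {"cpu": 0.5, "gpu": 0.4, "fpga": 0.3},
    "dense": {"cpu": 4.0, "gpu": 0.8, "fpga": 1.2},
}
TRANSFER_MS = 0.6          # penalty for moving data between devices

def partition(layers):
    """Greedy assignment: pick the cheapest device per layer, charging a
    transfer penalty whenever the chosen device changes."""
    plan, prev = [], None
    for kind in layers:
        best = min(COST[kind], key=lambda d: COST[kind][d]
                   + (TRANSFER_MS if prev and d != prev else 0.0))
        plan.append((kind, best))
        prev = best
    return plan

print(partition(["conv", "conv", "pool", "dense"]))
```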

    MHC-Peptide Binding Prediction Using a Deep Learning Method with Efficient GPU Implementation Approach

    , M.Sc. Thesis Sharif University of Technology Darvishi, Saeed (Author) ; Koohi, Somayyeh (Supervisor)
    Abstract
    The Major Histocompatibility Complex (MHC) binds to peptides derived from pathogens and presents them to killer T cells on the cell surface. Developing computational methods for accurate, fast, and explainable peptide-MHC binding prediction can facilitate immunotherapies and vaccine development. Various deep learning-based methods rely on feature extraction from the peptide and MHC sequences separately and ignore their valuable binding information. This paper develops a capsule neural network-based method to efficiently capture and model the peptide-MHC complex features to predict peptide-MHC class I binding. Various evaluations over multiple datasets using popular performance metrics...