Search for: hardware-accelerator

    Time-scalable mapping for circuit-switched GALS chip multiprocessor platforms

    Article, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 33, Issue 5, May 2014, pp. 752-762. Foroozannejad, M. H.; Hashemi, M.; Mahini, A.; Baas, B. M.; Ghiasi, S.; Sharif University of Technology
    Abstract
    We study the problem of mapping concurrent tasks of an application to cores of a chip multiprocessor that utilizes circuit-switched interconnect and globally asynchronous, locally synchronous (GALS) clocking domains. We develop a configurable algorithm that naturally handles a number of practical requirements, such as architectural features of the target platform, core failures, and hardware accelerators, and that, in addition, is scalable to a large number of tasks and cores. Experiments with several real-life applications show that our algorithm outperforms manual mapping, integer linear programming-based mapping after ten days of solver run time, and a recent packet-switched network-on-chip-based... 
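    A minimal, hedged sketch of the kind of problem the abstract describes follows: a greedy mapper that places communicating tasks on nearby cores of a mesh, skips failed cores, and restricts accelerator-bound tasks to accelerator-capable cores. This is an illustrative baseline only, not the paper's configurable algorithm; every name and parameter below is an assumption made for the example.

        from itertools import product

        def manhattan(a, b):
            # Hop distance between two mesh coordinates.
            return abs(a[0] - b[0]) + abs(a[1] - b[1])

        def greedy_map(tasks, edges, grid, failed=frozenset(),
                       accel_cores=frozenset(), accel_tasks=frozenset()):
            """tasks: task ids; edges: {(t1, t2): volume}; grid: (rows, cols)."""
            free = [c for c in product(range(grid[0]), range(grid[1])) if c not in failed]
            placement = {}
            # Place tasks in decreasing order of total communication volume.
            weight = {t: sum(v for (a, b), v in edges.items() if t in (a, b)) for t in tasks}
            for t in sorted(tasks, key=lambda t: -weight[t]):
                candidates = [c for c in free if c in accel_cores] if t in accel_tasks else free
                def cost(core):
                    # Distance-weighted traffic to already-placed partners of t.
                    total = 0
                    for (a, b), v in edges.items():
                        other = b if a == t else a if b == t else None
                        if other in placement:
                            total += v * manhattan(core, placement[other])
                    return total
                best = min(candidates, key=cost)
                placement[t] = best
                free.remove(best)
            return placement

        if __name__ == "__main__":
            tasks = ["src", "fft", "acc", "sink"]
            edges = {("src", "fft"): 8, ("fft", "acc"): 4, ("acc", "sink"): 2}
            print(greedy_map(tasks, edges, grid=(2, 3),
                             failed={(1, 2)}, accel_cores={(0, 2)}, accel_tasks={"acc"}))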

    Dynamic FPGA-accelerator sharing among concurrently running virtual machines

    Article, Proceedings of the 2016 IEEE East-West Design and Test Symposium (EWDTS 2016), 14-17 October 2016; published 2017; ISBN 9781509006939. Nasiri, H.; Goudarzi, M.; Sharif University of Technology
    Abstract
    Using an FPGA as a hardware accelerator has become prevalent as a way to speed up compute-intensive workloads. However, employing an accelerator in a virtualized environment adds complexity, because accessing the accelerator from virtual machines incurs significant overhead and sharing it requires careful consideration. We have implemented the infrastructure needed to share an FPGA-based accelerator among multiple virtual machines with negligible access overhead, dynamically implementing each virtual machine's accelerator. In our architecture, each user process in a virtual machine can directly access the FPGA over a PCIe link and reconfigure its accelerator in a specified part of the FPGA at run time. The... 
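    The following toy sketch illustrates the sharing problem in software terms only; it is not the paper's implementation. It models a manager that hands out a fixed number of partially reconfigurable FPGA regions to virtual-machine processes, with the PCIe and reconfiguration steps stubbed out. All class and method names are assumptions made for the illustration.

        import threading

        class FpgaRegionManager:
            """Toy allocator for partially reconfigurable accelerator regions."""

            def __init__(self, num_regions):
                self._lock = threading.Lock()
                self._regions = {r: None for r in range(num_regions)}  # region -> (vm, bitstream)

            def acquire(self, vm_id, bitstream):
                # Give vm_id a free region and load that VM's accelerator into it.
                with self._lock:
                    for region, owner in self._regions.items():
                        if owner is None:
                            self._regions[region] = (vm_id, bitstream)
                            self._reconfigure(region, bitstream)  # stub for partial reconfiguration
                            return region
                    raise RuntimeError("no free accelerator region; caller must queue or wait")

            def release(self, vm_id, region):
                with self._lock:
                    owner = self._regions.get(region)
                    if owner is not None and owner[0] == vm_id:
                        self._regions[region] = None

            def _reconfigure(self, region, bitstream):
                # A real system would drive the FPGA's reconfiguration port over PCIe here.
                print(f"[stub] loading {bitstream} into region {region}")

        if __name__ == "__main__":
            mgr = FpgaRegionManager(num_regions=2)
            r = mgr.acquire("vm-1", "aes_accel.bit")
            mgr.release("vm-1", r)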

    Coordinated DVFS and Precision Control for Deep Neural Networks

    Article, IEEE Computer Architecture Letters, Vol. 18, Issue 2, 2019, pp. 136-140; ISSN 1556-6056. Nabavinejad, S. M.; Hafez Kolahi, H.; Reda, S.; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc., 2019
    Abstract
    Traditionally, DVFS has been the main mechanism to trade off performance and power. We observe that Deep Neural Network (DNN) applications offer the possibility to trade off performance, power, and accuracy using both DVFS and numerical precision levels. Our proposed approach, Power-Inference accuracy Trading (PIT), monitors the server's load and accordingly adjusts the precision of the DNN model and the DVFS setting of the GPU to trade off accuracy and power consumption against response time. At high loads with tight request arrivals, PIT leverages the GPU's INT8-precision instructions to dynamically change the precision of deployed DNN models and boosts the GPU frequency to execute the requests... 
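    A minimal sketch of the kind of load-driven decision the abstract describes is shown below. The thresholds, frequency values, and precision names are illustrative assumptions, not the actual PIT policy.

        def choose_precision_and_frequency(load_ratio, base_freq_mhz=1100, max_freq_mhz=1530):
            """Pick a DNN precision and GPU clock from the observed load.

            load_ratio: current arrival rate divided by the rate sustainable at FP32
            and the base clock (thresholds and frequencies are assumptions for this sketch).
            """
            if load_ratio > 1.0:
                # Overloaded: sacrifice some accuracy (INT8) and boost the clock.
                return "int8", max_freq_mhz
            elif load_ratio > 0.7:
                # Moderate load: reduced precision, base clock to limit power.
                return "fp16", base_freq_mhz
            else:
                # Light load: full accuracy at the base clock.
                return "fp32", base_freq_mhz

        if __name__ == "__main__":
            for load in (0.3, 0.8, 1.4):
                print(load, choose_precision_and_frequency(load))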

    Design and Implementation of Hardware Accelerator for Domain Name Service

    M.Sc. Thesis, Sharif University of Technology. Jahandar, Ebrahim (Author); Jahangir, Amir Hossein (Supervisor)
    Abstract
    In this project, we have designed and implemented a hardware accelerator for the Domain Name Service. This hardware accelerator is compatible with existing designs, and it can be used standalone as an authoritative DNS server or as a hardware accelerator in series with an existing DNS server facility. Two goals are achieved in this thesis: increasing total DNS throughput and decreasing its response time. In this project, we have surveyed the Domain Name Service, its measurement studies, the theory of caching and its effectiveness, name-lookup methods, and finally some similar designs. Domain name lookup in memory is one of the most challenging operations in every DNS server. We have researched... 
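    As a point of reference for the lookup operation discussed above, the sketch below shows a plain in-memory hash-table lookup with TTL expiry, one common software baseline; it is not the thesis' hardware design, and the record format is an assumption for the example.

        import time

        class DnsCache:
            """Toy in-memory cache mapping fully qualified domain names to records."""

            def __init__(self):
                self._table = {}  # normalized name -> (rdata, expiry timestamp)

            @staticmethod
            def _key(name):
                return name.lower().rstrip(".")

            def insert(self, name, rdata, ttl):
                self._table[self._key(name)] = (rdata, time.time() + ttl)

            def lookup(self, name):
                entry = self._table.get(self._key(name))
                if entry is None:
                    return None                    # miss: would be forwarded upstream
                rdata, expiry = entry
                if time.time() > expiry:
                    del self._table[self._key(name)]
                    return None                    # expired: treat as a miss
                return rdata

        if __name__ == "__main__":
            cache = DnsCache()
            cache.insert("example.com.", "93.184.216.34", ttl=300)
            print(cache.lookup("EXAMPLE.com"))     # case-insensitive hit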

    FPGA-Based Implementation of Deep Learning Accelerator with Concentration on Intrusion Detection Systems

    M.Sc. Thesis, Sharif University of Technology. Fard, Ebrahim (Author); Jahangir, Amir Hossein (Supervisor)
    Abstract
    An Intrusion Detection System (IDS) is equipment intended to provide computer network security. In recent years, Machine Learning and Deep Neural Network (DNN) methods have been considered as a way to detect new network attacks. Due to the huge amount of computation these methods need, high-performance, parallel, or application-specific processors are required, such as the Application-Specific Integrated Circuit (ASIC), the Graphics Processing Unit (GPU), and the Field-Programmable Gate Array (FPGA). The latter seems more suitable than the others due to its higher configurability and lower power consumption. The goal of this study is the acceleration of a DNN-based IDS on an FPGA. In this study, which is... 

    Hardware Acceleration of Deep Learning based Firewalls Using FPGA

    M.Sc. Thesis, Sharif University of Technology. Fotovat, Amin (Author); Jahangir, Amir Hossein (Supervisor)
    Abstract
    In recent years, due to the shortcomings of rule-based firewalls in detecting unknown attacks, the use of neural networks in firewalls has attracted more attention. Since the computational load of neural networks is very high, and firewalls are under load 24/7, there is a need to reduce their processing time and power consumption. Although graphics processing units (which contain numerous processing cores) have achieved great success in neural network workloads, their high power consumption has led researchers to look for an alternative platform for implementing neural networks. The Field-Programmable Gate Array (FPGA) is one of the most serious candidates for implementing neural networks. The goal... 

    Fast architecture for decimal digit multiplication

    Article, Microprocessors and Microsystems, Vol. 39, Issue 4-5, June–July 2015, pp. 296-301; ISSN 0141-9331. Fazlali, M.; Valikhani, H.; Timarchi, S.; Tabatabaee Malazi, T.; Sharif University of Technology
    Elsevier, 2015
    Abstract
    The BCD digit multiplication module (BDM) is widely used in BCD arithmetic, especially in Decimal Floating-Point (DFP) units. In this paper, we present a new BCD digit multiplication scheme to accelerate this module. Similar to previous work, our multiplier consists of two parts: a binary multiplier and a binary-to-BCD converter. Our contributions to these modules successfully outperform previous BCD digit multipliers. The results indicate a 19% speed-up for the proposed multiplier architecture relative to the best previous techniques, implemented with the UMC 65 nm CMOS standard-cell library. Therefore, the proposed BCD digit multiplier is an... 
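    The two-stage structure mentioned in the abstract (binary multiplication followed by binary-to-BCD conversion) can be sketched behaviorally in plain Python as below. This is only an illustration of the scheme, not the paper's hardware architecture; the shift-and-add-3 converter shown is one common hardware-style method and is an assumption, not necessarily the converter the authors use.

        def binary_to_bcd_double_dabble(value, digits=2):
            # Shift-and-add-3 ("double dabble") binary-to-BCD conversion.
            bcd = 0
            for i in range(value.bit_length() - 1, -1, -1):
                # Add 3 to any BCD digit that is 5 or more before shifting.
                for d in range(digits):
                    if ((bcd >> (4 * d)) & 0xF) >= 5:
                        bcd += 3 << (4 * d)
                bcd = (bcd << 1) | ((value >> i) & 1)
            # Return digits most-significant first.
            return [(bcd >> (4 * d)) & 0xF for d in range(digits - 1, -1, -1)]

        def bcd_digit_multiply(a, b):
            """Multiply two BCD digits (0-9) and return (tens, units) in BCD."""
            assert 0 <= a <= 9 and 0 <= b <= 9, "inputs must be single BCD digits"
            product = a * b                                      # stage 1: 4x4-bit binary multiply
            return tuple(binary_to_bcd_double_dabble(product))   # stage 2: binary to BCD

        if __name__ == "__main__":
            for a, b in ((9, 9), (7, 8), (3, 0)):
                print(f"{a} x {b} = {bcd_digit_multiply(a, b)}")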

    Design and optimization of reliable hardware accelerators: leveraging the advantages of high-level synthesis

    Article, 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design (IOLTS 2018), 2-4 July 2018, pp. 232-235; ISBN 9781538659922. Naz Taher, F.; Kishani, M.; Carrion Schafer, B.; Sharif University of Technology
    Abstract
    This work proposes an automatic method to generate optimized redundant hardware accelerators with maximum reliability given a single behavioral description for High-Level Synthesis (HLS). For this purpose, this work exploits one of the main advantages of C-based VLSI design over traditional RT-level design: the ability to generate micro-architectures with unique characteristics from the same behavioral description. This is typically done by setting different synthesis options to determine how to synthesize loops, arrays, and functions and to specify the number and type of Functional Units (FUs) to be instantiated. The proposed method is composed of two main phases. The first phase... 
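    As a hedged, toy illustration of why micro-architectural diversity matters for redundancy (this is not the paper's two-phase method), the sketch below picks the three most mutually dissimilar candidate micro-architectures from a set of HLS configurations and majority-votes their outputs. The resource signatures and configuration names are invented for the example.

        from itertools import combinations
        from collections import Counter

        def signature_distance(a, b):
            # Crude diversity metric: L1 distance between resource-usage signatures.
            keys = set(a) | set(b)
            return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)

        def pick_diverse_triple(configs):
            """configs: {name: resource signature}; return the most mutually diverse triple."""
            return max(combinations(configs, 3),
                       key=lambda trio: sum(signature_distance(configs[x], configs[y])
                                            for x, y in combinations(trio, 2)))

        def majority_vote(outputs):
            value, count = Counter(outputs).most_common(1)[0]
            return value if count >= 2 else None   # None signals a detected mismatch

        if __name__ == "__main__":
            configs = {
                "loop_unrolled":  {"mul": 8, "add": 8, "bram": 2},
                "loop_pipelined": {"mul": 2, "add": 4, "bram": 4},
                "small_fu":       {"mul": 1, "add": 1, "bram": 1},
                "array_part":     {"mul": 4, "add": 4, "bram": 8},
            }
            print("replicas:", pick_diverse_triple(configs))
            print("voted result:", majority_vote([42, 42, 41]))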

    Partition pruning: Parallelization-aware pruning for dense neural networks

    Article, 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2020), 11-13 March 2020, pp. 307-311. Shahhosseini, S.; Albaqsami, A.; Jasemi, M.; Bagherzadeh, N.; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc., 2020
    Abstract
    As recent neural networks are improved to be more accurate, their model sizes grow exponentially. Thus, a huge number of parameters must be loaded from and stored to the memory hierarchy and computed in processors to perform the training or inference phase of neural network processing. The increasing number of parameters poses a major challenge for real-time deployment, since the trend in memory-bandwidth improvement cannot keep up with the growth in model complexity. Although some operations in neural network processing, such as convolutional-layer computation, are compute-intensive, computing dense layers faces a memory-bandwidth bottleneck. To address the issue, the paper... 
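    The general idea, as a hedged sketch (not necessarily the paper's exact algorithm): split a dense layer's weight matrix into partitions, one per parallel processing element, and prune the lowest-magnitude weights within each partition so every element keeps a balanced amount of work. Function and parameter names are assumptions for the example.

        import numpy as np

        def partition_prune(weights, num_partitions, keep_ratio):
            """Keep the top `keep_ratio` weights (by magnitude) inside each column partition."""
            pruned = weights.copy()
            for block in np.array_split(np.arange(weights.shape[1]), num_partitions):
                sub = pruned[:, block]                       # one partition's weights
                k = max(1, int(keep_ratio * sub.size))
                threshold = np.sort(np.abs(sub), axis=None)[-k]
                sub[np.abs(sub) < threshold] = 0.0           # prune below the per-partition threshold
                pruned[:, block] = sub
            return pruned

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            w = rng.standard_normal((4, 8))
            p = partition_prune(w, num_partitions=2, keep_ratio=0.25)
            print("nonzeros per partition:",
                  [int(np.count_nonzero(p[:, b])) for b in np.array_split(np.arange(8), 2)])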

    Hardware Acceleration of Convolutional Neural Networks by Computational Prediction

    M.Sc. Thesis, Sharif University of Technology. Sajjadi, Pegahsadat (Author); Bayatsarmadi, Siavash (Supervisor)
    Abstract
    Recently, convolutional neural networks (CNNs) have been widely used in many artificial intelligence applications such as image processing, speech processing, and robotics. The networks' superior accuracy comes at the cost of high computational complexity. Recent studies show that these operations can be performed in parallel. Therefore, as graphics processing units (GPUs) offer the best performance in terms of computational power and throughput, they are widely used to implement and accelerate neural networks. Nevertheless, the high price and power consumption of these processors have drawn more attention towards Field-Programmable Gate Arrays (FPGAs). In order to improve resource... 

    Mitigating the performance and quality of parallelized compressive sensing reconstruction using image stitching

    Article, 29th Great Lakes Symposium on VLSI (GLSVLSI 2019), 9-11 May 2019, pp. 219-224; ISBN 9781450362528. Namazi, M.; Mohammadi Makrani, H.; Tian, Z.; Rafatirad, S.; Akbari, M. H.; Sasan, A.; Homayoun, H.; ACM Special Interest Group on Design Automation (SIGDA); Sharif University of Technology
    Association for Computing Machinery, 2019
    Abstract
    Orthogonal Matching Pursuit (OMP) is an iterative greedy algorithm used to find a sparse approximation of high-dimensional signals. The algorithm is most popularly used in Compressive Sensing, which allows sparse signals to be reconstructed at rates lower than the Shannon-Nyquist frequency; compressive sensing has traditionally been used in a number of applications such as MRI and computer vision and is increasingly finding its way into Big Data and data-center analytics. OMP traditionally suffers from being computationally intensive and time-consuming; this is particularly a problem in the area of Big Data, where the demand for computational resources continues to grow. In this paper, the data-level...
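    For reference, a minimal textbook OMP recovery loop is sketched below; it is the plain serial algorithm, not the parallelized, stitching-based variant the paper proposes, and the test data is invented for the example.

        import numpy as np

        def omp(A, y, sparsity):
            """Recover a `sparsity`-sparse x from measurements y = A @ x (textbook OMP)."""
            residual = y.copy()
            support = []
            coeffs = np.zeros(0)
            for _ in range(sparsity):
                # Greedy step: pick the column most correlated with the residual.
                idx = int(np.argmax(np.abs(A.T @ residual)))
                if idx not in support:
                    support.append(idx)
                # Least-squares fit on the chosen support, then update the residual.
                coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
                residual = y - A[:, support] @ coeffs
            x_hat = np.zeros(A.shape[1])
            x_hat[support] = coeffs
            return x_hat

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            A = rng.standard_normal((64, 256))
            x_true = np.zeros(256)
            x_true[[5, 80, 200]] = [1.5, -2.0, 0.7]
            y = A @ x_true
            x_hat = omp(A, y, sparsity=3)
            print("recovered support:", np.nonzero(x_hat)[0])   # expected: [5, 80, 200]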