Sharif Digital Repository / Sharif University of Technology / Search result

Evaluating Data Prefetching Methods and Proposing an Energy-aware First Level Cache for Cloud Workloads

Ph.D. Dissertation Sharif University of Technology Naderan Tahan, Mahmood (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

The data generation rate far outpaces the technology scaling rate, to the extent that a 40x gap between the two is projected for 2020. On the one hand, unlike traditional HPC clusters, processors in data centers are not fully utilized; on the other hand, unlike traditional embedded processors, they are not idle most of the time. Energy consumption of such processors is therefore an important issue; otherwise, dealing with a huge volume of data will be problematic in the near future. In this dissertation, we will show that while the first level data cache encounters a high miss rate, traditional approaches such as data prefetching, which were efficient for...

Improving GPGPU Performance Through Efficient use of Memory Controllers

M.Sc. Thesis Sharif University of Technology Bakhishi, Mohammad Hazhir (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

The emergence of the CUDA architecture established the GPU as a suitable platform for parallel processing. The massive use of GPGPUs in various applications drives manufacturers to produce these processors in different configurations based on application demands. However, the design approach always maintains a ratio between the processing capability and the bandwidth of GPGPUs. As a result, in nearly all GPGPU series the bandwidth increases as processing power grows. At first glance this ratio seems reasonable because more processing needs more data. However, this approach does not pay attention to the behavior of a wide range of workloads that do not need such bandwidth. Mentioned...

Improving CPU-GPU System Performance Through Dynamic Management of LLC and NoC

M.Sc. Thesis Sharif University of Technology Rostamnejad Khatir, Maede (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

CPU-GPU Heterogeneous System Architectures (HSAs) play an important role in today's computing systems. Because of rapid advances in technology and the need for high-performance computing, HSAs are widely used platforms. Integrating a multi-core Central Processing Unit (CPU) with a many-core Graphics Processing Unit (GPU) on the same die combines the features of both processors and provides better performance. The capacity of HSAs to provide high computing throughput has led to the widespread use of these systems. Despite the high performance of HSAs, they also pose challenges. These challenges are caused by the use of two processors with different behaviors and requirements on the same die....

A Study of Management Policies of Shared Resources in Modern SSDs with Multi-Programming

M.Sc. Thesis Sharif University of Technology Mohammad Hassani, Arghavan (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

Solid State Drives (SSDs) have an electronic architecture based on non-volatile memories. The unique merit of an SSD is its internal parallelism. Thus, adopting SSDs as the storage media in various devices, such as smart phones, personal computers, and large workstations, substantially improves the performance of the device. However, the conventional host interface protocols in SSDs, i.e., SATA and SAS, have critical limitations, such as low bus bandwidth (maximum 12 Gbps) and a single shared request queue. With such limitations, the host interface turns into a performance bottleneck in conventional SSDs. Therefore, the high performance provided by such SSDs cannot...

Training Compressed DNNs for Resisting Against Adversarial Attacks

M.Sc. Thesis Sharif University of Technology Mohseni Sangtabi, Saman (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

Deep Neural Network (DNN) compression is a highly effective technique for reducing the computational burden and energy consumption associated with neural network inference, which is particularly important for low-power, embedded, and real-time systems. Weight pruning and quantization are among the most effective methods for neural network compression. Nonetheless, DNN compression poses various challenges, such as preserving network accuracy, particularly when dealing with adversarial attacks. Network compression can also lead to irregularities in the network structure and imbalanced distribution of workloads, which in turn can result in reduced utilization from the potential compression...

Improving the Efficiency of GPUs by Reducing Register File Accesses

M.Sc. Thesis Sharif University of Technology Mohammadpur Fard, Ali (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

Graphics Processing Units (GPUs) use a large register file to support a large number of parallel threads, which is responsible for a large fraction of the device’s total power consumption, and die area. Due to the conventional RISC-like instruction set architecture, a reasonably large fraction of all accesses to the register file are performed to accommodate the memory accesses performed by the threads, which limits the available register file bandwidth for other concurrent accesses, and also keeps at least one register per thread busy for storing loaded values. In this thesis, we propose moving away from the conventional RISC-like architecture and allowing memory operands for some...

Improving the Utilization Rate of the GPU Memory Resources by Exploiting Application's Heterogeneity

Ph.D. Dissertation Sharif University of Technology Darabi Moghaddam, Sina (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

In recent years, GPUs have become a popular choice for high-performance general-purpose systems. However, since general-purpose applications do not always utilize processing and memory resources efficiently, energy consumption in these systems has not been managed effectively. As the variety of resources within GPUs continues to increase, the problem of low utilization has become more critical. Past solutions have focused on integrating resources and using multi-tasking, but these methods have limitations in terms of performance and security. Therefore, this research proposes new methods for improving the resource utilization and energy efficiency of GPUs. This dissertation first examines...

Processor Allocation for Future Multi-Core Chip-Multiprocessor

M.Sc. Thesis Sharif University of Technology Agha Ali Akbari, Fatemeh (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

Since 2005, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. For decades, this approach has provided the desired performance for parallel and multithreaded workloads. On the other hand, the rise of the utilization wall limits the number of transistors that can be powered on in a chip and results in a large region remaining dark. So, to sustain the previous trend of performance scaling in future multiprocessors, an appropriate architecture is essential. Some structures proposed for this era use a specialization approach to cope with the limited power budget. Therefore, in this thesis, we propose a general-purpose platform that provides...

Fuzzy-Based Routing in Irregular Mesh NoC

M.Sc. Thesis Sharif University of Technology Rezaei Mayahi Nejad, Mehdi (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

In past decades, we have seen integration density in chips rise, making it possible to design a whole system on a single chip. The interconnection architectures previously designed for multiprocessor systems cannot be directly applied in on-chip systems (especially when the number of processor elements increases) since they require a different type of cost-performance trade-off. This is why the interconnection networks of systems-on-chip (SoCs) are such a problem. Network-on-chip (NoC) has been proposed as a scalable and reusable communication platform for SoCs, which makes use of the network model to develop efficient on-chip communication infrastructures. The NoC has a layered and...

Performance Comparison of Processor Allocation Algorithms

M.Sc. Thesis Sharif University of Technology Taghdimi Abbas Pour, Majid (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

Efficient processor allocation and job scheduling algorithms are critical if the full computational power of large-scale multicomputers is to be harnessed effectively. Processor allocation is responsible for selecting the set of processors on which parallel jobs are executed, whereas job scheduling is responsible for determining the order in which the jobs are executed. Many processor allocation strategies have been devised for mesh-connected multicomputers and these can be divided into two main categories: contiguous and non-contiguous. In contiguous allocation, jobs are allocated distinct contiguous processor submeshes for the duration of their execution. Such a strategy could lead to high...
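
The contiguous strategy described above can be illustrated with a minimal Python sketch. It assumes a 2D mesh stored as a grid of booleans and a simple first-fit scan for a free submesh; the function names and the first-fit choice are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple

Mesh = List[List[bool]]  # True marks a busy processor (hypothetical representation)


def find_free_submesh(busy: Mesh, width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return the top-left (row, col) of the first free height x width submesh, else None."""
    rows = len(busy)
    cols = len(busy[0]) if rows else 0
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            # A candidate submesh is usable only if every processor in it is idle.
            if all(not busy[r + dr][c + dc]
                   for dr in range(height) for dc in range(width)):
                return (r, c)
    return None


def allocate_contiguous(busy: Mesh, width: int, height: int) -> Optional[Tuple[int, int]]:
    """First-fit contiguous allocation: mark the submesh busy and return its corner."""
    corner = find_free_submesh(busy, width, height)
    if corner is None:
        return None
    r, c = corner
    for dr in range(height):
        for dc in range(width):
            busy[r + dr][c + dc] = True
    return corner


def free_processors(busy: Mesh) -> int:
    """Count processors that are currently idle."""
    return sum(not cell for row in busy for cell in row)


mesh = [[False] * 4 for _ in range(4)]      # 4 x 4 mesh, all idle
first = allocate_contiguous(mesh, 2, 2)     # small job takes a corner
second = allocate_contiguous(mesh, 3, 3)    # larger job needs a 3 x 3 submesh
print(first, second, free_processors(mesh))
```

In this toy run the 3 x 3 request is rejected although 12 processors are still idle, because no free 3 x 3 submesh remains once the 2 x 2 job occupies a corner. A non-contiguous allocator would instead be free to hand out any 9 idle processors.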

Data Tiering in Redundant Array of Independent Disks

M.Sc. Thesis Sharif University of Technology Tarihi, Mojtaba (Author) ; Asadi, Hossein (Supervisor) ; Sarbazi Azad, Hamid (Co-Advisor)

Abstract

With advances in silicon technology, the price of flash-based storage devices has fallen significantly and they are now a main choice for mass data storage. Solid state disks are, however, still much more expensive per unit of capacity than hard disks, and tiering can be utilized to use them cost-effectively. Tiering attempts to make use of the diversity offered by storage devices and backend configurations to support the diverse needs of I/O workloads. Tiering is generally done on top of static storage hierarchies and as such, moving a data block into a certain tier will dictate its configuration as well. In this research, an architecture is proposed that can independently encode...

Analyzing the Effect of Interconnection Topology on the Performance of Enterprise SSDs

M.Sc. Thesis Sharif University of Technology Soltani, Behnaz (Author) ; Sarbazi-Azad, Hamid (Supervisor) ; Hesabi, Shahin (Co-Advisor)

Abstract

In recent years, flash-based Solid State Drives (SSDs) have seen ever-increasing use in data centers, cloud applications, and enterprise servers because of their lower power consumption, higher throughput, and greater resistance to physical damage compared to Hard Disk Drives (HDDs). Recently, the advantages of using interconnection networks between the SSD controller and NAND flash chips for transferring data and commands have been shown. An interconnection network intrinsically provides many advantages, such as maintaining signal integrity at high frequencies, pipelining support, sharing of communication resources, eliminating communication bottlenecks, and reducing power consumption. The...

Temperature-aware Power Dissipation Analysis in Hyperscale Data Centers

Ph.D. Dissertation Sharif University of Technology Rezaei Mayahi Nejad, Mehdi (Author) ; Sarbazi-Azad, Hamid (Supervisor)

Abstract

A hyperscale cloud data center (HCDC) (a.k.a. hyperscale data center) is the backbone of a wide variety of Internet services such as web hosting, e-commerce, social networking, software as a service (SaaS), platform as a service (PaaS), application as a service (AaaS), and cloud computing. HCDC platforms consist of massive parallel processing, high utilization rate, and high volume storage that cost hundreds of millions of dollars. The increased demand for various Internet services and the subsequent dramatic growth in HCDC platforms have caused an exponential increase in energy use. With energy costs on the rise and global attention focused on carbon footprints, organizations are...

Improving Data Storage Solutions Using Workload Characteristics

M.Sc. Thesis Sharif University of Technology Tarihi, Mojtaba (Author) ; Asadi, Hossein (Supervisor) ; Sarbazi-Azad, Hamid (Supervisor)

Abstract

Responding to the increasing volume and complexity of storage workloads requires continuous design and improvement of storage subsystems. Storage workload behavior such as spatial and temporal locality, request type, and frequency has considerable impact on performance. Hence, performance evaluation and prediction must be performed with respect to workload properties. Moreover, design and implementation of solutions that adapt to workload behavior may further increase the performance and endurance of storage subsystems. One of the key aspects of this thesis is to speed up the performance evaluation of storage hardware. Three main approaches exist for performance evaluation: simulation,...

Enhancing Branch Target Buffer Efficiency with a Bias-Aware (Re)placement Policy

M.Sc. Thesis Sharif University of Technology Ebrahimi, Mahdi (Author) ; Sarbazi Azad, Hamid (Supervisor) ; Hessabi, Shaahin (Supervisor)

Abstract

The Branch Target Buffer (BTB) is a widely used component in modern processors. While there are different designs for the BTB, they generally have a set-associative structure keeping branches and their targets to help the frontend fetch instructions on the correct path. To achieve high performance, it’s essential to obtain a high hit rate from the BTB. Prior work has shown that the BTB suffers from frequent misses that require large sizes or sophisticated BTB prefilling mechanisms to overcome. However, the first solution imposes a significant storage overhead, and the latter results in limited benefits. Prior works have shown that branches exhibit different behaviors from being strongly...
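
For readers unfamiliar with the set-associative structure mentioned above, the following Python sketch models a small BTB that maps a branch address to its target. It is only an illustration under stated assumptions: the class name, the index function (4-byte instructions), the use of the full address as tag, and the plain LRU replacement are all hypothetical, and LRU here is a generic baseline rather than the bias-aware policy proposed in the thesis.

```python
from collections import OrderedDict
from typing import List, Optional


class SetAssociativeBTB:
    """Toy set-associative BTB with LRU replacement (baseline assumption)."""

    def __init__(self, num_sets: int = 4, ways: int = 2) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # One ordered map per set: oldest (least recently used) entry first.
        self.sets: List["OrderedDict[int, int]"] = [OrderedDict() for _ in range(num_sets)]

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low bits carry no information.
        return (pc >> 2) % self.num_sets

    def lookup(self, pc: int) -> Optional[int]:
        """Return the predicted target on a hit (and refresh recency), else None."""
        entries = self.sets[self._index(pc)]
        if pc in entries:
            entries.move_to_end(pc)
            return entries[pc]
        return None

    def insert(self, pc: int, target: int) -> None:
        """Install or update a branch, evicting the LRU entry if the set is full."""
        entries = self.sets[self._index(pc)]
        if pc in entries:
            entries.move_to_end(pc)
        elif len(entries) >= self.ways:
            entries.popitem(last=False)  # drop the least recently used branch
        entries[pc] = target


btb = SetAssociativeBTB(num_sets=2, ways=2)
btb.insert(0x00, 0x100)
btb.insert(0x08, 0x200)   # maps to the same set as 0x00
btb.lookup(0x00)          # touching 0x00 makes 0x08 the LRU entry
btb.insert(0x10, 0x300)   # same set again, so one entry must be evicted
print(btb.lookup(0x08), btb.lookup(0x00), btb.lookup(0x10))
```

Three branches compete for a two-way set here, so one of them misses on its next lookup. A replacement policy decides which branch that is, which is the decision a (re)placement policy such as the one in the title would make differently.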

Structure and Dynamics of Directed Small World Networks

M.Sc. Thesis Sharif University of Technology Sheshbolouki, Aida (Author) ; Sarbazi Azad, Hamid (Supervisor) ; Zarei, Mina (Co-Advisor)

Abstract

It has been shown that there is a relationship between the topological and spectral properties of a graph and its dynamics. However, most studies have not considered the direction of the links, even though most real networks are directed. In this thesis, we study the intrinsic relationship between the graphs’ structures, the spectral characteristics of their associated matrices, and the synchronization implications for different directed random graphs by changing their link directionality. To this end, we make use of the Kuramoto model for modeling the synchronization dynamics of the network. We show that, in contrast to the results of studies which suggest the λ2/λN as a measure of...
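
As background on the Kuramoto model named above, the sketch below integrates the standard form dθ_i/dt = ω_i + (K/N) Σ_j A_ij sin(θ_j − θ_i) with a forward Euler step and reports the usual order parameter r = |(1/N) Σ_j exp(iθ_j)|. The function names, the convention that A_ij = 1 means node i is influenced by node j, the Euler scheme, and the tiny directed ring used as input are all assumptions for illustration and do not reproduce the thesis experiments.

```python
import cmath
import math
from typing import List


def order_parameter(theta: List[float]) -> float:
    """Kuramoto order parameter r in [0, 1]; r = 1 means full phase synchrony."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))


def kuramoto_step(theta: List[float], omega: List[float], adj: List[List[int]],
                  coupling: float, dt: float) -> List[float]:
    """One forward Euler step; adj[i][j] = 1 means node i is pulled toward node j (assumed)."""
    n = len(theta)
    updated = []
    for i in range(n):
        pull = sum(adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
        updated.append(theta[i] + dt * (omega[i] + coupling * pull / n))
    return updated


# Hypothetical input: a directed ring of three identical oscillators.
n = 3
adj = [[0] * n for _ in range(n)]
for i in range(n):
    adj[i][(i - 1) % n] = 1          # each node listens only to its predecessor

theta = [0.0, 1.0, 2.0]              # initial phases in radians
omega = [0.0] * n                    # identical natural frequencies
r_start = order_parameter(theta)
for _ in range(2000):
    theta = kuramoto_step(theta, omega, adj, coupling=3.0, dt=0.01)
r_end = order_parameter(theta)
print(r_start, r_end)
```

Reversing or removing individual entries of `adj` changes which oscillators influence which, which is one simple way to experiment with link directionality in this toy setting.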

Unifying L1 Data Cache and Shared Memory in GPUs

M.Sc. Thesis Sharif University of Technology Yousefzadeh-Asl-Miandoab, Ehsan (Author) ; Sarbazi Azad, Hamid (Supervisor)

Abstract

Graphics Processing Units (GPUs) employ a scratch-pad memory (a.k.a. shared memory) in each streaming multiprocessor to accelerate data sharing among the threads in a thread block and provide a software-managed cache for the programmers. However, we observe that about 60% of GPU workloads of several well-known benchmark suites do not use shared memory. Moreover, among those workloads that use shared memory, about 42% of shared memory is not utilized, on average. On the other hand, we observe that many general purpose GPU applications suffer from the low hit rate and limited bandwidth of the L1 data cache. We aim to use shared memory space and its corresponding bandwidth for improving L1 data cache,...

Improving the Efficiency of On-chip 3D Stacked DRAM in Server Processors

M.Sc. Thesis Sharif University of Technology Samandi, Farid (Author) ; Sarbazi-Azad, Hamid (Supervisor) ; Lotfi Kamran, Pejman (Co-Advisor)

Abstract

Big-data server workloads have vast datasets, and hence, frequently access off-chip memory for data. Consequently, server workloads lose significant performance potential due to off-chip latency and bandwidth walls. Recent research advocates using 3D stacked DRAM to break the walls. As 3D stacked DRAM cannot accommodate the whole datasets of server workloads, most proposals use 3D DRAM as a large cache. Unfortunately, a large DRAM cache imposes latency overhead due to (1) the need for tag lookup and (2) inefficient utilization of on-chip and off-chip bandwidth, and as a result, lowers the benefits of 3D stacked DRAM. Moreover, storing the tags of a multi-gigabyte DRAM cache requires changes...

Designing Instruction Prefetcher with Low Area Overhead for Server Workloads

M.Sc. Thesis Sharif University of Technology Faghih, Faezeh (Author) ; Sarbazi Azad, Hamid (Supervisor) ; Lotfi Kamran, Pejman (Co-Supervisor)

Abstract

L1 instruction cache misses create a crucial performance bottleneck for server applications. Server applications extensively use operating system services, and as such, have large instruction footprints that dwarf the instruction cache size. Meanwhile, fast access requirements preclude enlarging the instruction cache to hold the whole instruction footprint of current server workloads. Prior works proposed using hardware prefetching schemes to eliminate or reduce the effect of instruction cache misses. They use the fact that server application instruction sequences are repetitive. So by recording and prefetching based on such sequences, L1 instruction misses could be reduced. While they...

A High Endurance I/O Cache Architecture for All-Flash Storage Systems

M.Sc. Thesis Sharif University of Technology Shafiei Marji, Abolfazl (Author) ; Asadi, Hossein (Supervisor) ; Sarbazi Azad, Hamid (Co-Supervisor)

Abstract

The increasing growth of digital data and the use of data-intensive applications and cloud computing have made data storage systems the performance bottlenecks of computing systems. To improve the efficiency of data storage systems and enhance the quality of service, storage system manufacturers use high-performance storage devices such as solid-state drives (SSDs) in their designs, namely, all-flash storage systems. SSDs provide significantly higher performance than hard disk drives and are used in the architecture of high-speed storage systems. Despite the very high speed of SSDs, their high price and low endurance have limited their usage in data storage systems. The limited endurance of SSDs...