Search for: prefetching
0.011 seconds
Total 36 records

    Evaluation of data prefetchers

    , Article Advances in Computers ; Volume 125 , 2022 , Pages 69-89 ; 00652458 (ISSN); 9780323851190 (ISBN) Shakerinava, M ; Golshan, F ; Ansari, A ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Academic Press Inc  2022
    Abstract
    We introduced several data prefetchers and qualitatively discussed their strengths and weaknesses. Without quantitative evaluation, the true strengths and weaknesses of a data prefetcher are still vague. To shed light on the strengths and weaknesses of the introduced data prefetchers and to enable the readers to better understand these prefetchers, in this chapter, we quantitatively compare and contrast them. © 2022 Elsevier Inc  

    State-of-the-art data prefetchers

    , Article Advances in Computers ; Volume 125 , 2022 , Pages 55-67 ; 00652458 (ISSN); 9780323851190 (ISBN) Shakerinava, M ; Golshan, F ; Ansari, A ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Academic Press Inc  2022
    Abstract
    We introduced several styles of data prefetching in the past three chapters. The introduced data prefetchers were known for a long time, sometimes for decades. In this chapter, we introduce several state-of-the-art data prefetchers, which have been introduced in the past few years. In particular, we introduce DOMINO, BINGO, MLOP, and RUNAHEAD METADATA. © 2022 Elsevier Inc  

    Harnessing pairwise-correlating data prefetching with runahead metadata

    , Article IEEE Computer Architecture Letters ; Volume 19, Issue 2 , 2020 , Pages 130-133 ; 15566056 (ISSN) Golshan, F ; Bakhshalipour, M ; Shakerinava, M ; Ansari, A ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2020
    Abstract
    Recent research revisits pairwise-correlating data prefetching due to its extremely low overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where data streams end. As a result, pairwise-correlating data prefetchers either expose low accuracy or they lose timeliness when they are performing multi-degree prefetching. In this letter, we propose a novel technique to detect where data streams end and hence, control the multi-degree prefetching in the context of pairwise-correlated prefetchers. The key idea is to have a separate metadata table that operates one step ahead of the main metadata table. This way, the runahead metadata table harnesses the degree of... 
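The pairwise-correlation scheme this abstract describes can be illustrated with a toy model. Here the multi-degree chain simply stops when the pair table has no successor, a crude stand-in for the stream-end detection that the proposed runahead metadata table performs precisely; the class, its parameters, and the table organization are all illustrative, not the paper's design:

```python
class PairwisePrefetcher:
    """Toy pairwise-correlating prefetcher (a sketch, not the letter's design).

    `table` maps each miss address to the address that followed it last time.
    Multi-degree prefetching walks this table up to `degree` steps ahead.
    """
    def __init__(self, degree=3):
        self.table = {}     # addr -> next addr observed after it
        self.last = None
        self.degree = degree

    def access(self, addr):
        # Train: record the pair (previous miss -> current miss).
        if self.last is not None:
            self.table[self.last] = addr
        self.last = addr
        # Predict: walk the pair table up to `degree` steps.
        preds, cur = [], addr
        for _ in range(self.degree):
            cur = self.table.get(cur)
            if cur is None:   # chain ends: a rough proxy for a stream boundary
                break
            preds.append(cur)
        return preds
```

After observing the miss sequence 10, 20, 30, 40, a repeat access to 10 walks the chain and issues 20, 30, 40; a real pairwise prefetcher additionally needs the stream-end detection discussed above to avoid walking past the stream.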

    Temporal prefetching

    , Article Advances in Computers ; 2021 ; 00652458 (ISSN) Lotfi-Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Academic Press Inc  2021
    Abstract
    Many applications, including big-data server applications, frequently encounter data misses. Consequently, they lose significant performance potential. Fortunately, data accesses of many of these applications follow temporal correlations, which means data accesses repeat over time. Temporal correlations occur because applications usually consist of loops, and hence, the sequence of instructions that constitute the body of a loop repeats many times, leading to data access repetition. Temporal data prefetchers take advantage of temporal correlation to predict and prefetch future memory accesses. In this chapter, we introduce the concept of temporal prefetching and present two instances of... 
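The temporal-correlation idea in this abstract can be sketched in a few lines: record the global miss sequence and, on a repeated miss, replay the addresses that followed its previous occurrence. This is a minimal illustration of the concept, not either of the specific prefetchers the chapter presents (a real design uses indexed metadata tables rather than a linear scan):

```python
def temporal_prefetch(history, addr, degree=3):
    """Replay the misses that followed `addr` the last time it was seen.

    `history` is the recorded global miss sequence (oldest first).
    Returns up to `degree` predicted addresses; empty if `addr` is new.
    """
    # Find the most recent previous occurrence of `addr`.
    for i in range(len(history) - 1, -1, -1):
        if history[i] == addr:
            return history[i + 1 : i + 1 + degree]
    return []
```

For the recorded stream 100, 101, 102, 103, a later miss on 100 predicts 101, 102, 103 — exactly the loop-driven repetition the abstract describes.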

    Spatial prefetching

    , Article Advances in Computers ; 2021 ; 00652458 (ISSN) Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Academic Press Inc  2021
    Abstract
    Many applications extensively use data objects with a regular and fixed layout, which leads to the recurrence of access patterns over memory regions. Spatial data prefetching techniques exploit this phenomenon to prefetch future memory references and hide their long latency. Spatial prefetchers are particularly of interest because they usually only need a small storage budget. In this chapter, we introduce the concept of spatial prefetching and present two instances of spatial data prefetchers, SMS and VLDP. © 2021 Elsevier Inc  
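The recurring-footprint idea behind spatial prefetchers such as SMS can be sketched as follows. The model below learns which blocks of a memory region are touched after a trigger access and replays that footprint on the next trigger; keying the pattern table by trigger offset alone is a simplification (real SMS-style designs key on PC and offset and manage footprint lifetimes), so treat every name and parameter as illustrative:

```python
class SpatialPrefetcher:
    """SMS-flavored sketch: learn per-region access footprints and replay them."""
    def __init__(self, region_blocks=8):
        self.region_blocks = region_blocks
        self.patterns = {}   # trigger offset -> set of offsets touched last time
        self.active = {}     # region base -> (trigger offset, offsets seen)

    def access(self, block_addr):
        base = block_addr - (block_addr % self.region_blocks)
        off = block_addr - base
        if base not in self.active:
            # First access to the region: this is the trigger.
            self.active[base] = (off, {off})
            pat = self.patterns.get(off, set())
            return sorted(base + o for o in pat if o != off)
        trig, seen = self.active[base]
        seen.add(off)        # keep recording the region's footprint
        return []

    def evict(self, base):
        # Region leaves the tracker: commit its footprint for future triggers.
        trig, seen = self.active.pop(base)
        self.patterns[trig] = seen
```

After a region is touched at offsets 0, 2, and 5 and then evicted, the next region triggered at offset 0 immediately prefetches its offset-2 and offset-5 blocks — the small-storage replay of a learned footprint that makes spatial prefetchers attractive.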

    Temporal prefetching

    , Article Advances in Computers ; Volume 125 , 2022 , Pages 31-41 ; 00652458 (ISSN); 9780323851190 (ISBN) Lotfi-Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Academic Press Inc  2022
    Abstract
    Many applications, including big-data server applications, frequently encounter data misses. Consequently, they lose significant performance potential. Fortunately, data accesses of many of these applications follow temporal correlations, which means data accesses repeat over time. Temporal correlations occur because applications usually consist of loops, and hence, the sequence of instructions that constitute the body of a loop repeats many times, leading to data access repetition. Temporal data prefetchers take advantage of temporal correlation to predict and prefetch future memory accesses. In this chapter, we introduce the concept of temporal prefetching and present two instances of... 

    Spatial prefetching

    , Article Advances in Computers ; Volume 125 , 2022 , Pages 19-29 ; 00652458 (ISSN); 9780323851190 (ISBN) Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Academic Press Inc  2022
    Abstract
    Many applications extensively use data objects with a regular and fixed layout, which leads to the recurrence of access patterns over memory regions. Spatial data prefetching techniques exploit this phenomenon to prefetch future memory references and hide their long latency. Spatial prefetchers are particularly of interest because they usually only need a small storage budget. In this chapter, we introduce the concept of spatial prefetching and present two instances of spatial data prefetchers, SMS and VLDP. © 2022 Elsevier Inc  

    Performance Improvement of Compression Algorithms for Gene Sequencing Reads by Cache Miss Improvement

    , M.Sc. Thesis Sharif University of Technology Shadab, Mohammad (Author) ; Goudarzi, Maziar (Supervisor)
    Abstract
    Nowadays, one of the challenges in the field of bioinformatics is the enormous volume of processed data: the data resulting from the complete genome sequence of a single species can reach hundreds of gigabytes. Whenever data volume grows, data storage, transfer, and processing become of interest. Moreover, given the presence of portable sequencer devices in the market and the limitations of processing outside of lab environments, this problem becomes even more critical. Fortunately, owing to the nature of genome data and their redundancy, specific algorithms to compress them have been introduced to the market. In this thesis, we chose... 

    An Efficient Data Prefetching Scheme in GPUs

    , M.Sc. Thesis Sharif University of Technology Mostofi, Saba (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    GPUs exploit memory hierarchy and thread-level parallelism (TLP) to hide off-chip memory access delay. However, GPUs cannot keep TLP high during the execution of various applications, and hence, they fall short of hiding the access delay to off-chip memory. One effective approach to reducing memory access delay is prefetching. Prior research shows the positive impact of prefetching on the performance of GPUs but fails to capture all the potential of prefetching in GPUs. In this thesis, we propose Snake, a new data prefetching scheme for GPUs. Snake identifies stride distances among different memory access instructions and prefetches a chain of addresses that will be... 
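The stride-chain idea this abstract mentions can be sketched with a classic per-instruction stride detector: once the same stride is observed twice for a memory instruction, a chain of future addresses along that stride is prefetched. This is a generic illustration of stride-chain prefetching, not the thesis's Snake design; all names and thresholds are illustrative:

```python
class StridePrefetcher:
    """Per-instruction (per-PC) stride detector with chained prefetch."""
    def __init__(self, degree=3):
        self.table = {}   # pc -> (last_addr, last_stride)
        self.degree = degree

    def access(self, pc, addr):
        preds = []
        entry = self.table.get(pc)
        if entry is not None:
            last_addr, last_stride = entry
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                # Stride confirmed twice: prefetch a chain along the stride.
                preds = [addr + stride * i for i in range(1, self.degree + 1)]
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return preds
```

For one instruction touching 100, 108, 116, the third access confirms the stride of 8 and prefetches 124, 132, 140; a GPU design must additionally coordinate such chains across the many warps issuing the same instruction.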

    Adaptive prefetching using global history buffer in multicore processors

    , Article Journal of Supercomputing ; Volume 68, Issue 3 , June 2014 , Pages 1302-1320 ; 09208542 (ISSN) Naderan Tahan, M ; Sarbazi Azad, H ; Sharif University of Technology
    Abstract
    Data prefetching is a well-known technique to hide the memory latency in the last-level cache (LCC). Among many prefetching methods in recent years, the Global History Buffer (GHB) proves to be efficient in terms of cost and speedup. In this paper, we show that a fixed value for detecting patterns and prefetch degree makes GHB to (1) be conservative while there are more opportunities to create new addresses and (2) generate wrong addresses in the presence of constant strides. To resolve these problems, we separate the pattern length from the prefetching degree. The result is an aggressive prefetcher that can generate more addresses with a given pattern length. Furthermore with a variable... 
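The separation of pattern length from prefetch degree that this abstract argues for can be illustrated with a toy delta-correlation lookup in the spirit of GHB-based prefetching: match the most recent `pattern_len` deltas against history, then replay up to `degree` following deltas. The linear scan and parameter names are illustrative simplifications (a real GHB uses an index table with linked FIFO entries):

```python
def ghb_delta_prefetch(misses, degree=3, pattern_len=2):
    """Toy GHB-style delta correlation with independent pattern length and degree.

    `misses` is the recorded miss-address stream (oldest first).
    """
    deltas = [b - a for a, b in zip(misses, misses[1:])]
    if len(deltas) < pattern_len:
        return []
    key = deltas[-pattern_len:]
    # Search backwards for a previous occurrence of the delta pattern.
    for i in range(len(deltas) - pattern_len - 1, -1, -1):
        if deltas[i:i + pattern_len] == key:
            preds, addr = [], misses[-1]
            for d in deltas[i + pattern_len : i + pattern_len + degree]:
                addr += d
                preds.append(addr)
            return preds
    return []
```

Because `pattern_len` and `degree` are independent knobs, a short pattern can still drive an aggressive multi-address prefetch, which is the decoupling the paper proposes.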

    Why does data prefetching not work for modern workloads?

    , Article Computer Journal ; Volume 59, Issue 2 , 2016 , Pages 244-259 ; 00104620 (ISSN) Naderan Tahan, M ; Sarbazi Azad, H ; Sharif University of Technology
    Oxford University Press  2016
    Abstract
    Emerging cloud workloads in today's modern data centers have large memory footprints that render the processor's caches ineffective. Since the L1 data cache is on the critical path, high data cache miss rates degrade performance. To fix this issue in traditional workloads, data prefetchers predict the needed data to hide the memory latency and ultimately improve performance. In this paper, we focus on the L1 data cache to answer the question of why state-of-the-art prefetching methods are inefficient for modern workloads in terms of performance and energy consumption. This is because the L1 cache is the most important player affecting processor performance. Results show that, on the one... 

    Divide and conquer frontend bottleneck

    , Article 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020, 30 May 2020 through 3 June 2020 ; Volume 2020-May , 2020 , Pages 65-78 Ansari, A ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2020
    Abstract
    The frontend stalls caused by instruction and BTB misses are a significant source of performance degradation in server processors. Prefetchers are commonly employed to mitigate the frontend bottleneck. However, next-line prefetchers, which are available in server processors, are incapable of eliminating a considerable number of L1 instruction misses. Temporal instruction prefetchers, on the other hand, effectively remove most of the instruction and BTB misses but impose significant area overhead. Recently, an old idea of using BTB-directed instruction prefetching has been revived to address the limitations of temporal instruction prefetchers. While this approach leads to prefetchers with low area... 

    Object-aware cache: Higher hit-ratio in object-oriented ASIPs

    , Article Canadian Conference on Electrical and Computer Engineering; Technology Driving Innovation, 2004, Niagara Falls, 2 May 2004 through 5 May 2004 ; Volume 2 , 2004 , Pages 0653-0656 ; 08407789 (ISSN) Goudarzi, M ; Hessabi, S ; Mycroft, A ; Sharif University of Technology
    2004
    Abstract
    At any point in time in an object-oriented (OO) program, a class method is running whose set of unconditionally-accessed data fields can be statically determined. We propose to fetch this set prior to or during the method execution to increase the data cache hit ratio. This requires that either the software directs the processor cache controller, or the processor is aware of the currently running class method. We follow the latter approach by extending our previous work, where we introduced the object-oriented application-specific instruction processor (OO-ASIP) as a processor whose instruction set consists of the methods of a class library. Such an OO-ASIP is aware of the currently running method... 

    Instruction Cache Miss Rate Reduction with Timely Next-Line Prefetching

    , M.Sc. Thesis Sharif University of Technology Ansari, Ali (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    The frontend stalls caused by instruction cache and branch target buffer (BTB) misses are a well-known source of performance degradation in server processors. To address this limitation, a myriad of hardware prefetchers have been proposed. While they can effectively eliminate many misses and increase performance, they are impractical due to some shortcomings. In this study, we look for effective and practical solutions to address these limitations. Since a considerable fraction of instruction cache misses is sequential, sequential prefetchers like the next-line prefetcher are simple and effective solutions to remove sequential misses and are used in modern server processors.... 

    Improving the Accuracy of Data Prefetching via Depth Estimation

    , M.Sc. Thesis Sharif University of Technology Golshan, Fatemeh (Author) ; Sarbazi Azad, Hamid (Supervisor)
    Abstract
    The data prefetcher is a central component in most processors, and different methods have been proposed with varying degrees of complexity and effectiveness. Recent research revisits pairwise-correlating data prefetching due to its extremely low overhead. Pairwise-correlating data prefetching, however, cannot accurately detect where data streams end. As a result, pairwise-correlating data prefetchers either expose low accuracy or lose timeliness when performing multi-degree prefetching. In this work, we propose a novel technique to detect where data streams end and hence control the multi-degree prefetching in the context of pairwise-correlated prefetchers. The key idea is to have a... 

    Domino temporal data prefetcher

    , Article Proceedings - International Symposium on High-Performance Computer Architecture ; Volume 2018-February , 2018 , Pages 131-142 ; 15300897 (ISSN); 9781538636596 (ISBN) Bakhshalipour, M ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    IEEE Computer Society  2018
    Abstract
    Big-data server applications frequently encounter data misses, and hence, lose significant performance potential. One way to reduce the number of data misses or their effect is data prefetching. As data accesses have high temporal correlations, temporal prefetching techniques are promising for them. While state-of-the-art temporal prefetching techniques are effective at reducing the number of data misses, we observe that there is a significant gap between what they offer and the opportunity. This work aims to improve the effectiveness of temporal prefetching techniques. We identify the lookup mechanism of existing temporal prefetchers responsible for the large gap between what they offer and... 
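The lookup-mechanism gap this abstract points to motivates Domino's key idea: index temporal metadata by the last two misses rather than one, so that streams sharing a single address are disambiguated. A minimal sketch of that lookup (the linear scan over a recorded miss sequence is an illustrative simplification of the paper's hardware tables):

```python
def domino_lookup(history, a, b, degree=3):
    """Sketch of a two-miss temporal lookup: find the last occurrence of the
    pair (a, b) in the recorded miss sequence and replay what followed it.
    """
    for i in range(len(history) - 2, -1, -1):
        if history[i] == a and history[i + 1] == b:
            return history[i + 2 : i + 2 + degree]
    return []
```

In the stream 7, 1, 2, 3, 9, 1, 5, 6 the single address 1 is ambiguous (it is followed once by 2 and once by 5), but the pair (1, 2) uniquely selects the 3, 9 continuation — the disambiguation a two-miss index provides.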

    Evaluation of hardware data prefetchers on server processors

    , Article ACM Computing Surveys ; Volume 52, Issue 3 , 2019 ; 03600300 (ISSN) Bakhshalipour, M ; Tabaeiaghdaei, S ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Association for Computing Machinery  2019
    Abstract
    Data prefetching, i.e., the act of predicting an application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely used approach to hide the long latency of memory accesses. The fruitfulness of data prefetching is evident to both industry and academy: Nowadays, almost every high-performance processor incorporates a few data prefetchers for capturing various access patterns of applications; besides, there is a myriad of proposals for data prefetching in the research literature, where each proposal enhances the efficiency of prefetching in a specific way. In this survey, we evaluate the effectiveness of data prefetching in the context of... 

    MANA: Microarchitecting a temporal instruction prefetcher

    , Article IEEE Transactions on Computers ; 2022 , Pages 1-1 ; 00189340 (ISSN) Ansari, A ; Golshan, F ; Barati, R ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    IEEE Computer Society  2022
    Abstract
    L1 instruction (L1-I) cache misses are a source of performance bottleneck. While many instruction prefetchers have been proposed, most of them leave considerable potential uncovered. In 2011, Proactive Instruction Fetch (PIF) showed that a hardware prefetcher could effectively eliminate all instruction-cache misses. However, its enormous storage cost makes it impractical. Consequently, reducing the storage cost has been the main research focus in instruction prefetching in the past decade. Several instruction prefetchers, including RDIP and Shotgun, were proposed to offer PIF-level performance with significantly lower storage overhead. However, our findings show that there is a considerable... 

    Performance Enhancement of Enterprise Storage Systems Using a Markov-Based Prefetching Method

    , M.Sc. Thesis Sharif University of Technology Sereshki, Sina (Author) ; Asadi, Hossein (Supervisor)
    Abstract
    With the increasing rate of digital information in the world, the design, configuration, and networking of enterprise storage systems have become an essential part of designing data centers. The performance of data storage systems in serving incoming requests is one of the major parameters of such systems, and a major metric to measure performance is response time. This parameter is particularly crucial in enterprise applications such as financial, credit, multimedia, and real-time applications. A common approach to enhancing the performance of enterprise storage systems is improving the hit ratio of the system's global memory using prefetching. With prefetching, a data block is... 
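The general Markov approach the thesis title refers to can be sketched as a first-order model over the block request stream: count the observed successors of each block and prefetch the most frequent ones. This is an illustration of the generic technique, not the thesis's method; the class and its `fanout` parameter are hypothetical:

```python
from collections import Counter, defaultdict

class MarkovPrefetcher:
    """First-order Markov prefetcher sketch for a block request stream."""
    def __init__(self, fanout=2):
        self.succ = defaultdict(Counter)  # block -> counts of its successors
        self.last = None
        self.fanout = fanout

    def access(self, block):
        # Train: count the observed transition last -> block.
        if self.last is not None:
            self.succ[self.last][block] += 1
        self.last = block
        # Predict: prefetch the most probable successors of this block.
        return [b for b, _ in self.succ[block].most_common(self.fanout)]
```

On the request stream 1, 2, 1, 3, 1, 2, 1, block 1 is followed by 2 twice and by 3 once, so a later access to 1 prefetches 2 first — the probability-ranked replay that distinguishes Markov prefetching from simple pair correlation.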

    Designing Instruction Prefetcher with Low Area Overhead for Server Workloads

    , M.Sc. Thesis Sharif University of Technology Faghih, Faezeh (Author) ; Sarbazi Azad, Hamid (Supervisor) ; Lotfi Kamran, Pejman (Co-Supervisor)
    Abstract
    L1 instruction cache misses create a crucial performance bottleneck for server applications. Server applications extensively use operating system services and, as such, have large instruction footprints that dwarf the instruction cache size. Meanwhile, fast access requirements preclude enlarging the instruction cache to hold the whole instruction footprint of current server workloads. Prior works proposed hardware prefetching schemes to eliminate or reduce the effect of instruction cache misses. They use the fact that server application instruction sequences are repetitive, so by recording and prefetching based on such sequences, L1 instruction misses can be reduced. While they...