Search for: global-memory-access

    ISP: Using idle SMs in hardware-based prefetching

    Article ; Proceedings - 17th CSI International Symposium on Computer Architecture and Digital Systems, CADS 2013 ; October 2013 ; Pages 3-8 ; ISBN: 9781479905621 ; Falahati, H ; Abdi, M ; Baniasadi, A ; Hessabi, S ; Computer Society of Iran ; IPM ; Sharif University of Technology
    IEEE Computer Society, 2013
    Abstract
    The Graphics Processing Unit (GPU) is the most promising candidate platform for a faster rate of improvement in peak processing speed, low latency, and high performance. The highly programmable and multithreaded nature of GPUs makes them a remarkable candidate for general-purpose computing. However, supporting non-graphics computing on graphics processors requires addressing several architectural challenges. In this paper, we focus on improving performance by better hiding the long waiting time for transferring data from the slow global memory. To that end, we study an effective low-overhead prefetching mechanism that utilizes idle processing elements. Our results show that we can potentially improve...

    Power-efficient prefetching on GPGPUs

    Article ; Journal of Supercomputing ; Volume 71, Issue 8 ; August 2015 ; pp. 2808-2829 ; ISSN: 09208542 ; Falahati, H ; Hessabi, S ; Abdi, M ; Baniasadi, A ; Sharif University of Technology
    Abstract
    The graphics processing unit (GPU) is the most promising candidate platform for achieving faster improvements in peak processing speed, low latency, and high performance. The highly programmable and multithreaded nature of GPUs makes them a remarkable candidate for general-purpose computing. However, supporting non-graphics computing on graphics processors requires addressing several architectural challenges. In this paper, we focus on improving performance by better hiding the long waiting time for transferring data from the slow global memory. Furthermore, we show that the proposed method can reduce power and energy. Reducing the access time to off-chip data plays a noticeable role in reducing...