Search for: jasemi--m

    NoC design methodologies for heterogeneous architecture

    Article, Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020, 11-13 March 2020; 2020, Pages 299-306. Alhubail, L.; Jasemi, M.; Bagherzadeh, N.; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc., 2020
    Abstract
    Fused CPU-GPU architectures that utilize the powerful features of both processors are common nowadays. Using a homogeneous interconnect for such heterogeneous processors can result in performance degradation and increased power. This paper explores the optimization of heterogeneous NoC design to connect a heterogeneous CPU-GPU architecture in terms of NoC performance and power. This involves solving four different NoC design sub-problems simultaneously: processing element (PE) mapping, buffer size assignment, virtual channel assignment, and link bandwidth determination. Heuristic-based optimization methods were proposed to obtain a near-optimal heterogeneous NoC design, and formal models were used...
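    The abstract stops before describing the heuristics themselves, but the joint search space it lists (PE mapping, buffer depths, virtual channels, link bandwidths) can be illustrated with a toy random-perturbation hill climber. Everything below, including the 4x4 mesh, the synthetic traffic matrix, and the cost weights, is an illustrative assumption rather than the paper's actual model:

```python
import random

MESH = 4                                   # hypothetical 4x4 mesh NoC (assumption)
PES = list(range(MESH * MESH))             # 16 processing elements, e.g. a CPU/GPU mix
TRAFFIC = [[random.random() if p != q else 0.0 for q in PES] for p in PES]  # synthetic traffic

def cost(design):
    """Toy objective: traffic-weighted hop latency (reduced by wider links, more VCs and
    deeper buffers) plus a power proxy that charges for those same resources."""
    mapping, buf, vcs, bw = design
    hops = lambda a, b: abs(a % MESH - b % MESH) + abs(a // MESH - b // MESH)
    latency = sum(
        TRAFFIC[p][q] * hops(mapping[p], mapping[q]) * 64.0
        / (bw[mapping[p]] * (1 + 0.25 * vcs[mapping[p]]) * (1 + 0.1 * buf[mapping[p]]))
        for p in PES for q in PES)
    power = sum(buf) + 2 * sum(vcs) + 0.2 * sum(bw)
    return latency + power

def neighbor(design):
    """Perturb one sub-problem at random: swap two PEs in the mapping, or tweak one
    router's buffer depth, its VC count, or one link's bandwidth."""
    mapping, buf, vcs, bw = (list(x) for x in design)
    choice = random.randrange(4)
    if choice == 0:
        i, j = random.sample(range(len(mapping)), 2)
        mapping[i], mapping[j] = mapping[j], mapping[i]
    elif choice == 1:
        buf[random.randrange(len(buf))] = random.randint(2, 8)
    elif choice == 2:
        vcs[random.randrange(len(vcs))] = random.randint(1, 4)
    else:
        bw[random.randrange(len(bw))] = random.choice([16, 32, 64])
    return mapping, buf, vcs, bw

# Start from a uniform design and greedily accept improving perturbations over the joint space.
design = (PES[:], [4] * len(PES), [2] * len(PES), [32] * len(PES))
for _ in range(5000):
    candidate = neighbor(design)
    if cost(candidate) <= cost(design):
        design = candidate
print("toy cost of the final design:", round(cost(design), 2))
```

    A real flow would replace the synthetic cost with cycle-accurate or formally modeled latency and power estimates, as the abstract indicates.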

    Partition pruning: Parallelization-aware pruning for dense neural networks

    Article, 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020, 11-13 March 2020; 2020, Pages 307-311. Shahhosseini, S.; Albaqsami, A.; Jasemi, M.; Bagherzadeh, N.; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc., 2020
    Abstract
    As recent neural networks are improved to be more accurate, their model sizes grow exponentially. Thus, a huge number of parameters must be loaded from and stored in the memory hierarchy and computed in processors to perform the training or inference phase of neural network processing. The increasing number of parameters poses a major challenge for real-time deployment, since the trend of memory bandwidth improvement cannot keep up with the growth in model complexity. Although some operations in neural network processing, such as convolutional layer computation, are compute-intensive, computing dense layers faces a memory bandwidth bottleneck. To address the issue, the paper...
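    The abstract is cut off before the method itself, so the sketch below only illustrates the general idea suggested by the title: partitioning a dense layer's weight matrix across parallel workers and pruning inside each partition so the memory footprint stays balanced. The column-wise split, the magnitude criterion, and the function names are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def partition_prune(W, num_partitions, keep_ratio):
    """Split a dense weight matrix column-wise into `num_partitions` blocks and keep only
    the largest-magnitude `keep_ratio` fraction of weights inside each block, so every
    worker ends up with roughly the same number of surviving parameters."""
    pruned_blocks = []
    for block in np.array_split(W, num_partitions, axis=1):
        k = max(1, int(keep_ratio * block.size))
        thresh = np.sort(np.abs(block).ravel())[-k]          # per-partition magnitude threshold
        pruned_blocks.append(np.where(np.abs(block) >= thresh, block, 0.0))
    return pruned_blocks

def parallel_dense(x, pruned_blocks):
    """Each worker would multiply the input by its pruned block and produce a slice of the
    output; concatenating the slices emulates that parallel execution on one machine."""
    return np.concatenate([x @ b for b in pruned_blocks], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 1024)).astype(np.float32)  # hypothetical dense layer weights
    x = rng.standard_normal((1, 512)).astype(np.float32)
    blocks = partition_prune(W, num_partitions=4, keep_ratio=0.1)
    y = parallel_dense(x, blocks)
    kept = sum(int((b != 0).sum()) for b in blocks) / W.size
    print(f"output shape {y.shape}, surviving weights {kept:.1%}")
```

    Pruning per partition rather than globally is what keeps the parallel workers load-balanced; a global threshold could leave one partition dense and another nearly empty.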