Search for:
modarressi--m
Reconfigurable cluster-based networks-on-chip for application-specific MPSoCs
, Article 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2012, Delft, 9 July 2012 through 11 July 2012 ; 2012 , Pages 153-156 ; 10636862 (ISSN) ; 9780769547688 (ISBN) ; Sarbazi Azad, H
2012
Abstract
In this paper, we propose a reconfigurable NoC in which a customized topology for a given application can be implemented. In this NoC, the nodes are grouped into clusters interconnected by a reconfigurable communication infrastructure. The nodes inside a cluster are connected by a fixed topology. From the traffic management perspective, this structure benefits from the interesting characteristics of the mesh topology (efficient handling of local traffic where each node communicates with its neighbors), while avoiding its drawbacks (the lack of short paths between remotely located nodes). We then present a design flow that maps the frequently communicating tasks of a given application into...
A reconfigurable cache architecture for object-oriented embedded systems
, Article 2006 Canadian Conference on Electrical and Computer Engineering, CCECE'06, Ottawa, ON, 7 May 2006 through 10 May 2006 ; 2006 , Pages 959-962 ; 08407789 (ISSN); 1424400384 (ISBN); 9781424400386 (ISBN) ; Hessabi, S ; Goudarzi, M ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc
2006
Abstract
A reconfigurable cache architecture for object-oriented application-specific instruction set processors (ASIPs) is presented in this paper. The embedded ASIPs we follow in this research are specifically designed to suit object-oriented applications and are synthesized from an object-oriented high-level specification. The ASIPs are composed of a processor core along with a number of hardware functional units. In order to support concurrent execution of the functional units, we propose a cache architecture which is virtually divided into a number of partitions. The partition sizes can be dynamically changed depending on the run-time behavior of the application. Partitioning the cache not only...
A data prefetching mechanism for object-oriented embedded systems using run-time profiling
, Article Third IEEE International Workshop on Electronic Design, Test and Applications, DELTA 2006, Kuala Lumpur, 17 January 2006 through 19 January 2006 ; Volume 2006 , 2006 , Pages 249-254 ; 0769525008 (ISBN); 9780769525006 (ISBN) ; Hessabi, S ; Gudarzi, M ; Sharif University of Technology
2006
Abstract
A table-based implementation of an application-specific data prefetching approach is presented in this paper. This approach is proposed to improve the performance of the application-specific instruction-set processors (ASIPs) that we develop, each customized to an object-oriented application. In this approach, the cache controller prefetches all data fields of an object required by a class method when the class method is invoked. In the proposed table-based implementation, the cache controller monitors the class method calls and records the indices of the object data members that each method accesses. This information is used to prefetch the data items needed by a class method on subsequent invocations of that...
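The table-based scheme described in this abstract can be illustrated with a minimal software sketch. This is only a toy model under stated assumptions, not the authors' hardware design: the class name `MethodPrefetchTable`, the hooks `on_call`, `on_access`, and `on_return`, and the use of method names as table keys are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


class MethodPrefetchTable:
    """Toy model of a per-method prefetch table (all names are hypothetical)."""

    def __init__(self):
        # Maps a method identifier to the set of data-member indices
        # that the method was observed to access on earlier invocations.
        self._fields = defaultdict(set)
        self._active = None  # method currently being profiled, if any

    def on_call(self, method):
        """Invoked when a class method is called.

        Returns the data-member indices to prefetch, i.e. those recorded
        during previous invocations of the same method.
        """
        self._active = method
        return sorted(self._fields[method])

    def on_access(self, field_index):
        """Invoked on each data-member access; records it for the active method."""
        if self._active is not None:
            self._fields[self._active].add(field_index)

    def on_return(self):
        """Invoked when the active method returns; stops recording."""
        self._active = None
```

On the first call to a method the table is empty and nothing is prefetched; on later calls the recorded indices are returned so the corresponding fields could be fetched ahead of use.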
Application-specific hardware-driven prefetching to improve data cache performance
, Article 10th Asia-Pacific Conference on Advances in Computer Systems Architecture, ACSAC 2005, Singapore, 24 October 2005 through 26 October 2005 ; Volume 3740 LNCS , 2005 , Pages 761-774 ; 03029743 (ISSN); 3540296433 (ISBN); 9783540296430 (ISBN) ; Goudarzi, M ; Hessabi, S ; Sharif University of Technology
2005
Abstract
Data cache hit ratio has a major impact on the execution performance of programs because it directly affects the average data access time. Prefetching mechanisms improve this ratio by fetching data items that will soon be required by the running program. Software-driven prefetching enables application-specific policies and potentially provides better results in return for some instruction overhead, whereas hardware-driven prefetching incurs little overhead but, in general-purpose processors, cannot adapt to the specific needs of the running application. In the application-specific processors that we develop, each customized to an object-oriented application, we implement application-specific hardware...
Using task migration to improve non-contiguous processor allocation in NoC-based CMPs
, Article Journal of Systems Architecture ; Vol. 59, issue. 7 , 2013 , pp. 468-481 ; ISSN: 13837621 ; Asadinia, M ; Sarbazi-Azad, H ; Sharif University of Technology
2013
Abstract
In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems and aims to allocate the processing nodes of a multiprocessor to different tasks of an input application at run time. The proposed mechanism targets optimizing the on-chip communication power/latency and relies on two procedures: processor allocation and task migration. Allocation is done by a fast heuristic algorithm to allocate the free processors to the tasks of an incoming application when a new application begins execution. The task-migration algorithm is activated when some application completes execution and frees up...
Using task migration to improve non-contiguous processor allocation in NoC-based CMPs
, Article Journal of Systems Architecture ; Volume 59, Issue 7 , August , 2013 , Pages 468-481 ; 13837621 (ISSN) ; Asadinia, M ; Sarbazi Azad, H ; Sharif University of Technology
2013
Abstract
In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems and aims to allocate the processing nodes of a multiprocessor to different tasks of an input application at run time. The proposed mechanism targets optimizing the on-chip communication power/latency and relies on two procedures: processor allocation and task migration. Allocation is done by a fast heuristic algorithm to allocate the free processors to the tasks of an incoming application when a new application begins execution. The task-migration algorithm is activated when some application completes execution and frees up...
A game theoretical thermal-aware run-time task synchronization method for multiprocessor systems-on-chip
, Article Proceedings - 15th Euromicro Conference on Digital System Design, DSD 2012 ; Article number 6386970 , 5-8 September , 2012 , pp. 759-765 ; ISBN: 9780769547985 ; Khabbazian, M. H ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology
2012
Abstract
This paper presents a distributed run-time task synchronization method for multicore processors that aims to reduce the average power consumption of the chip and satisfy a given thermal constraint while imposing no performance overhead. Built on game-theoretic concepts, the method iteratively adjusts the frequency of each individual core based on its current workload until it converges to an optimal point. In this work we target two thermal constraints: keeping (1) the core peak temperature and (2) the thermal gradient across the cores below a predefined threshold. The results show that the proposed framework can find the appropriate frequency for each core based on the...
An energy-efficient virtual channel power-gating mechanism for on-chip networks
, Article Proceedings - Design, Automation and Test in Europe, DATE, 9 March 2015 through 13 March 2015 ; Volume 2015-April , March , 2015 , Pages 1527-1532 ; 15301591 (ISSN) ; 9783981537048 (ISBN) ; Sadrosadati, M ; Fakhrzadehgan, A ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc
2015
Abstract
Power-gating is a promising method for reducing the leakage power of digital systems. In this paper, we propose a novel power-gating scheme for virtual channels in on-chip networks that uses an adaptive method to dynamically adjust the number of active VCs based on the on-chip traffic characteristics. Since virtual channels are used to provide higher throughput under high traffic loads, our method selectively sets the number of virtual channels at each port based on the workload demand and thereby does not negatively affect performance. Evaluation results show that this scheme achieves about 40% average reduction in static power consumption with negligible performance overhead.
You are what you eat: Sequence analysis reveals how plant microRNAs may regulate the human genome
, Article Computers in Biology and Medicine ; Volume 106 , 2019 , Pages 106-113 ; 00104825 (ISSN) ; Hasani Bidgoli, M ; Motahari, S. A ; Sedaghat, N ; Modarressi, M. H ; Sharif University of Technology
Elsevier Ltd
2019
Abstract
Background: Nutrigenomics has revolutionized our understanding of nutrition. As plants make up a noticeable part of our diet, in the present study we chose microRNAs of edible plants and investigated whether they can perfectly match human genes, indicating potential regulatory functionalities. Methods: miRNAs were obtained using the PNRD database. Edible plants were separated, and microRNAs common to at least four of them entered our analysis. Using vmatchPattern, these 64 miRNAs went through four steps of refinement to improve target prediction: alignment with the whole genome (2581 results), filtered for those in gene regions (1371 results), filtered for exon regions (66 results) and finally...
A reconfigurable network-on-chip architecture for heterogeneous CMPs in the dark-silicon era
, Article Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors ; 18-20 June , 2014 , pp. 76-77 ; ISSN: 10636862 ; ISBN: 9781479936090 ; Sarbazi Azad, H ; Sharif University of Technology
2014
Abstract
Core specialization is a promising solution to the dark silicon challenge. This approach trades off the cheaper silicon area with energy-efficiency by integrating a selection of many diverse application-specific cores into a single billion-transistor multicore chip. Each application then activates the subset of cores that best matches its processing requirements. These cores act as a customized application-specific CMP for the application. Such an arrangement of cores requires some special on-chip inter-core communication treatment to efficiently connect active cores. In this paper, we propose a reconfigurable network-on-chip that leverages the routers of the dark portion of the chip to...
Reconfigurable cluster-based networks-on-chip for application-specific MPSoCs
, Article Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors ; 9-11 July , 2012 , pp. 153-156 ; ISSN: 10636862 ; ISBN: 9780769547688 ; Sarbazi-Azad, H ; Sharif University of Technology
2012
Abstract
In this paper, we propose a reconfigurable NoC in which a customized topology for a given application can be implemented. In this NoC, the nodes are grouped into clusters interconnected by a reconfigurable communication infrastructure. The nodes inside a cluster are connected by a fixed topology. From the traffic management perspective, this structure benefits from the interesting characteristics of the mesh topology (efficient handling of local traffic where each node communicates with its neighbors), while avoiding its drawbacks (the lack of short paths between remotely located nodes). We then present a design flow that maps the frequently communicating tasks of a given application into...
A high-performance and low-power on-chip network with reconfigurable topology
, Article Dynamic Reconfigurable Network-on-Chip Design: Innovations for Computational Processing and Communication ; 2010 , Pages 309-329 ; 9781615208074 (ISBN) ; Sarbazi Azad, H ; Sharif University of Technology
2010
Abstract
In this chapter, we present a reconfigurable architecture for networks-on-chip (NoCs) on which arbitrary application-specific topologies can be implemented. The proposed NoC can dynamically tailor its topology to the traffic pattern of different applications, aiming to address one of the main drawbacks of existing application-specific NoC optimization methods, i.e., optimizing NoCs based on the traffic pattern of a single application. Supporting multiple applications is a critical feature of an NoC, as several different applications are integrated into modern, complex multi-core systems-on-chip and chip multiprocessors, and an NoC that is designed to run exactly one application does not...
Leveraging dark silicon to optimize networks-on-chip topology
, Article Journal of Supercomputing ; Volume 71, Issue 9 , 2015 , Pages 3549-3566 ; 09208542 (ISSN) ; Sarbazi-Azad, H ; Sharif University of Technology
Kluwer Academic Publishers
2015
Abstract
This paper presents a reconfigurable network-on-chip (NoC) for many-core chip multiprocessors (CMPs) in the dark silicon era, where a considerable part of high-end chips cannot be powered up due to the power and bandwidth walls. Core specialization, which trades off the cheaper silicon area with energy-efficiency, is a promising solution to the dark silicon challenge. This approach integrates a selection of many diverse application-specific cores into a single many-core chip. Each application then activates those cores that best match its processing requirements. Since active cores may not always form a contiguous active region in the chip, such a partially active many-core CMP requires some...
Topology specialization for networks-on-chip in the dark silicon era
, Article Advances in Computers ; Volume 110 , 2018 , Pages 217-258 ; 00652458 (ISSN); 9780128153581 (ISBN) ; Sarbazi Azad, H ; Sharif University of Technology
Academic Press Inc
2018
Abstract
Following Moore's law, the number of transistors on chip has grown exponentially for decades. This growing transistor count, coupled with recent architecture and compiler advances, has resulted in an unprecedented exponential performance increase of computers. With the end of Dennard scaling, however, the power required to operate all transistors at the full performance level simultaneously grows across the technology generations. Consequently, chips will keep an increasing fraction of transistors power gated or dark to remain within the power envelope. The power-gated part of the chip, known as dark silicon, is expected to comprise a significant portion of the die real estate in new...
Power-aware mapping for reconfigurable NoC architectures
, Article 2007 IEEE International Conference on Computer Design, ICCD 2007, Lake Tahoe, CA, 7 October 2007 through 10 October 2007 ; 2007 , Pages 417-422 ; 1424412587 (ISBN); 9781424412587 (ISBN) ; Sarbazi Azad, H ; Sharif University of Technology
2007
Abstract
A core mapping method for reconfigurable network-on-chip (NoC) architectures is presented in this paper. In most of the existing methods, mapping is carried out based on the traffic characteristics of a single application. However, modern complex systems-on-chip implement and integrate several different applications, all of which should be considered by mapping methods. In the proposed method, the reconfiguration (which is achieved by embedding programmable switches between routers of a mesh-based NoC) allows us to dynamically change the network topology in order to adapt it to the running application and optimize the power and performance metrics. The presented network architecture...
Parallel 3-dimensional DCT computation on k-Ary n-cubes
, Article 8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005, Beijing, 30 November 2005 through 3 December 2005 ; Volume 2005 , 2005 , Pages 91-97 ; 0769524869 (ISBN); 9780769524863 (ISBN) ; Sarbazi Azad, H ; Sharif University of Technology
2005
Abstract
The three-dimensional discrete cosine transform (3D DCT) has been widely used in many applications such as video compression. The k-ary n-cube, in turn, is one of the most popular interconnection networks used in many recent multicomputers. As direct calculation of the 3D DCT is very time consuming, many researchers have been working on developing algorithms and special-purpose architectures for its fast computation. This paper proposes a parallel algorithm for efficient calculation of the 3D DCT on k-ary n-cube multicomputers. The time complexity of the proposed algorithm is O(N) for an N × N × N input data cube, while direct calculation of the 3D DCT has a complexity of O(N^6).
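The O(N^6) figure quoted for direct calculation follows from the definition: each of the N^3 output coefficients is a sum over all N^3 input samples. The sequential sketch below is not the paper's parallel algorithm; it only shows the direct sum next to the standard row-column (separable) evaluation, which needs O(N^4) operations. It assumes the unnormalized DCT-II, and the function names `dct3_direct` and `dct3_separable` are hypothetical.

```python
import math
from itertools import product


def _cos_table(n):
    # c[u][x] = cos(pi * (2x + 1) * u / (2n)), the unnormalized DCT-II kernel.
    return [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
            for u in range(n)]


def dct3_direct(f):
    """Direct 3D DCT-II of an n x n x n nested list: n^3 outputs, n^3 terms each."""
    n = len(f)
    c = _cos_table(n)
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for u, v, w in product(range(n), repeat=3):
        out[u][v][w] = sum(f[x][y][z] * c[u][x] * c[v][y] * c[w][z]
                           for x, y, z in product(range(n), repeat=3))
    return out


def dct3_separable(f):
    """Same transform computed as three passes of 1D DCTs, one per axis."""
    n = len(f)
    c = _cos_table(n)
    r = range(n)
    # Pass 1: transform along x, giving a[u][y][z].
    a = [[[sum(c[u][x] * f[x][y][z] for x in r) for z in r] for y in r] for u in r]
    # Pass 2: transform along y, giving b[u][v][z].
    b = [[[sum(c[v][y] * a[u][y][z] for y in r) for z in r] for v in r] for u in r]
    # Pass 3: transform along z, giving the final coefficients d[u][v][w].
    return [[[sum(c[w][z] * b[u][v][z] for z in r) for w in r] for v in r] for u in r]
```

Both functions return the same coefficients up to floating-point rounding; only the operation count differs.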
Traffic-load-aware virtual channel power-gating in network-on-chips
, Article Advances in Computers ; 2021 ; 00652458 (ISSN) ; Mirhosseini, A ; Akbarzadeh, N ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology
Academic Press Inc
2021
Abstract
Networks-on-chip (NoCs) employ several virtual channels per input port to mitigate the head-of-line blocking issue in transmitting network packets. Unfortunately, these virtual channels are power-hungry resources that significantly contribute to the total power consumption of NoCs. In particular, we make the key observation that even under high traffic load, a number of virtual channels are idle, imposing significant static power overhead. Prior works use power gating to switch off idle VCs and reduce the static power consumption. However, we observe that prior works are mostly suitable for low traffic loads and are ineffective under high traffic loads. In this chapter, we aim to propose a...
Traffic-load-aware virtual channel power-gating in network-on-chips
, Article Advances in Computers ; Volume 124 , 2022 , Pages 1-19 ; 00652458 (ISSN); 9780323856881 (ISBN) ; Mirhosseini, A ; Akbarzadeh, N ; Modarressi, M ; Sarbazi Azad, H ; Sharif University of Technology
Academic Press Inc
2022
Abstract
Networks-on-chip (NoCs) employ several virtual channels per input port to mitigate the head-of-line blocking issue in transmitting network packets. Unfortunately, these virtual channels are power-hungry resources that significantly contribute to the total power consumption of NoCs. In particular, we make the key observation that even under high traffic load, a number of virtual channels are idle, imposing significant static power overhead. Prior works use power gating to switch off idle VCs and reduce the static power consumption. However, we observe that prior works are mostly suitable for low traffic loads and are ineffective under high traffic loads. In this chapter, we aim to propose a...
Author Correction: A novel variant in TLE6 is associated with embryonic developmental arrest (EDA) in familial female infertility (Scientific Reports, (2022), 12, 1, (17664), 10.1038/s41598-022-22687-y)
, Article Scientific Reports ; Volume 12, Issue 1 , 2022 ; 20452322 (ISSN) ; Mohebi, M ; Berjis, K ; Ghahremani, A ; Modarressi, M. H ; Ghafouri Fard, S ; Sharif University of Technology
Nature Research
2022
Abstract
The original version of this Article contained an error in the spelling of the author Mohammad Hossein Modarressi, which was incorrectly given as Mohammad‑Hossein Modarresi. The original Article has been corrected. © 2022, The Author(s)
A High-Performance and Low-Power Reconfigurable Network-on-Chip Architecture
, Ph.D. Dissertation Sharif University of Technology ; Sarbazi Azad, Hamid (Supervisor)
Abstract
Network-on-Chip (NoC) is a promising on-chip communication paradigm which targets the scalability and predictability problems of the traditional on-chip mechanisms. However, it has been shown that, in future technologies (especially 22 nm technology), the power consumption of current NoCs is about 10 times higher than the power budget that can be devoted to them. Application-specific optimization is one of the most effective approaches to bridge the existing gap between the current and the ideal NoC power consumption. However, almost all existing application-specific customization methods try to customize NoCs for...