Search for: multiprocessing-systems
Total 44 records

    Topological properties of stretched graphs

    , Article IEEE International Conference on Computer Systems and Applications, 2006, Sharjah, 8 March 2006 through 8 March 2006 ; Volume 2006 , 2006 , Pages 647-650 ; 1424402123 (ISBN); 9781424402120 (ISBN) Shareghi, P ; Sarbazi Azad, H ; Sharif University of Technology
    IEEE Computer Society  2006
    Abstract
    We study a class of interconnection networks for multiprocessors, called the Stretched-G network, which is derived from a base graph G by replacing each edge of the base network with an array of processors. Two interesting features of the proposed topology are its area-efficient VLSI layout and superior scalability over the underlying base network while preserving most of its desirable properties. We conduct a general study on the topological properties of stretched networks. We first obtain their basic topological parameters and then present an optimal routing algorithm. We also present a unified approach to obtain the topological properties and the VLSI-layout of an arbitrary stretched... 
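
The edge-replacement construction described in this abstract can be illustrated with a minimal sketch. It assumes that the "array of processors" placed on each edge is a linear chain of k new nodes; the function name `stretch_graph`, the parameter `k`, and the node labels are hypothetical and are not taken from the paper.

```python
def stretch_graph(edges, k):
    """Replace every base edge (u, v) with a chain u - s0 - ... - s(k-1) - v.

    `edges` is an iterable of (u, v) pairs of the base graph G and `k` is the
    assumed number of processors inserted on each edge. Returns an adjacency
    dict mapping each node to the set of its neighbours.
    """
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for u, v in edges:
        # Hypothetical labels for the inserted nodes: ("s", u, v, index).
        chain = [u] + [("s", u, v, i) for i in range(k)] + [v]
        for a, b in zip(chain, chain[1:]):
            link(a, b)
    return adj


# Base graph: a 4-node ring. Stretch every edge with k = 2 extra processors.
base_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
stretched = stretch_graph(base_edges, k=2)

num_nodes = len(stretched)                                   # |V| + k * |E|
num_edges = sum(len(n) for n in stretched.values()) // 2     # (k + 1) * |E|
print(num_nodes, num_edges)
```

Under this assumption the stretched network has |V| + k|E| nodes and (k + 1)|E| links, and every original node keeps its base-graph degree, which is one way to see why properties of G can carry over.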

    Hierarchical binary set partitioning in cache memories

    , Article Journal of Supercomputing ; Volume 31, Issue 2 , 2005 , Pages 185-202 ; 09208542 (ISSN) Zarandi, H. R ; Sarbazi Azad, H ; Sharif University of Technology
    2005
    Abstract
    In this paper, a new cache placement scheme is proposed to achieve higher hit ratios than the two conventional schemes, namely set-associative and direct mapping. Similar to set-associative mapping, in this scheme the cache space is divided into sets, but of different sizes. Hence, the length of the tag field associated with each set is also variable and depends on the partition it is in. The proposed mapping function has been simulated with some standard trace files and statistics are gathered and analyzed for different cache configurations. The results reveal that the proposed scheme exhibits a higher hit ratio compared to the two well-known mapping schemes, namely set-associative and direct mapping,... 

    Time-scalable mapping for circuit-switched GALS chip multiprocessor platforms

    , Article IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ; Vol. 33, issue. 5 , May , 2014 , p. 752-762 Foroozannejad, M. H ; Hashemi, M ; Mahini, A ; Baas, B. M ; Ghiasi, S ; Sharif University of Technology
    Abstract
    We study the problem of mapping concurrent tasks of an application to cores of a chip multiprocessor that utilize circuit-switched interconnect and global asynchronous local synchronous (GALS) clocking domains. We develop a configurable algorithm that naturally handles a number of practical requirements, such as architectural features of the target platform, core failures, and hardware accelerators, and in addition, is scalable to a large number of tasks and cores. Experiments with several real life applications show that our algorithm outperforms manual mapping, integer linear programming-based mapping after ten days of solver run time, and a recent packet-switched network on chip-based... 

    Using task migration to improve non-contiguous processor allocation in NoC-based CMPs

    , Article Journal of Systems Architecture ; Volume 59, Issue 7 , August , 2013 , Pages 468-481 ; 13837621 (ISSN) Modarressi, M ; Asadinia, M ; Sarbazi Azad, H ; Sharif University of Technology
    2013
    Abstract
    In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems and aims to allocate the processing nodes of a multiprocessor to different tasks of an input application at run time. The proposed mechanism targets optimizing the on-chip communication power/latency and relies on two procedures: processor allocation and task migration. Allocation is done by a fast heuristic algorithm to allocate the free processors to the tasks of an incoming application when a new application begins execution. The task-migration algorithm is activated when some application completes execution and frees up... 

    Simultaneous variation-aware architecture exploration and task scheduling for MPSoC energy minimization

    , Article Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI ; 2011 , Pages 271-276 ; 9781450306676 (ISBN) Momtazpour, M ; Ghorbani, M ; Goudarzi, M ; Sanaei, E
    Abstract
    In nanometer-scale process technologies, the effects of process variations are observed in Multiprocessor System-on-Chips (MPSoC) in terms of variations in frequencies and leakage powers among the processors on the same chip as well as across different chips of the same design. Traditionally, worst-case values are assumed for these parameters and then a deterministic optimization technique is applied to the MPSoC application under design. We show that such worst-case-based approaches are not optimal with the increasing variation observed at system-level, and instead, statistical approaches should be employed. We consider the problem of simultaneously choosing MPSoC architecture and task... 

    Task migration in three-dimensional meshes

    , Article Journal of Supercomputing ; Volume 56, Issue 3 , March , 2011 , Pages 328-352 ; 09208542 (ISSN) Bargi, A ; Sarbazi Azad, H ; Sharif University of Technology
    2011
    Abstract
    As a result of the emerging use of mesh-based multicomputers (and recently mesh-based multiprocessor systems-on-chip), issues related to processor management have attracted much attention. In a mesh-based multiprocessor, after repeated submesh allocations and de-allocations, the system network may be fragmented, i.e. there might be unallocated nodes in the network. As a result, in a system with contiguous processor allocation, no new tasks can start running due to the lack of enough free adjacent processors to form a suitable submesh. Although there might be enough free processors available, they remain idle until the allocator can find a set of adjacent free nodes forming a submesh to be... 
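
The fragmentation problem described in this abstract can be shown with a small sketch: a contiguous allocator fails even though enough free processors exist. The exhaustive search below is only an illustration under assumed names (`find_free_submesh`, `busy`, `dims`, `shape`); it is not the allocation or migration algorithm of the paper.

```python
from itertools import product


def find_free_submesh(busy, dims, shape):
    """Return the base corner of a free a x b x c submesh, or None.

    `busy` is a set of allocated (x, y, z) nodes, `dims` the mesh size and
    `shape` the requested submesh size. Plain exhaustive search.
    """
    X, Y, Z = dims
    a, b, c = shape
    for x, y, z in product(range(X - a + 1), range(Y - b + 1), range(Z - c + 1)):
        cells = product(range(x, x + a), range(y, y + b), range(z, z + c))
        if all(cell not in busy for cell in cells):
            return (x, y, z)
    return None


# A 2 x 2 x 2 mesh whose busy nodes form a checkerboard pattern.
dims = (2, 2, 2)
busy = {(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)}
free_count = dims[0] * dims[1] * dims[2] - len(busy)

# A 2 x 2 x 1 request needs 4 adjacent nodes; 4 nodes are free, but scattered.
request = (2, 2, 1)
result = find_free_submesh(busy, dims, request)
print(free_count, result)
```

Here four processors are idle, yet no 2 x 2 x 1 submesh is free, so the request must wait. Moving running tasks so that free nodes become adjacent is the kind of remedy that task migration aims at.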

    Helia: Heterogeneous interconnect for low resolution cache access in snoop-based chip multiprocessors

    , Article 28th IEEE International Conference on Computer Design, ICCD 2010, Amsterdam, 3 October 2010 through 6 October 2010 ; 2010 , Pages 84-91 ; 10636404 (ISSN) ; 9781424489350 (ISBN) Shafiee, A ; Shahidi, N ; Baniasadi, A ; Sharif University of Technology
    2010
    Abstract
    In this work we introduce Heterogeneous Interconnect for Low Resolution Cache Access (Helia). Helia improves energy efficiency in snoop-based chip multiprocessors as it eliminates unnecessary activities in both interconnect and cache. This is achieved by using innovative snoop filtering mechanisms coupled with wire management techniques. Our optimizations rely on the observation that a high percentage of cache mismatches could be detected by utilizing a small subset but highly informative portion of the tag bits. Helia relies on the snoop controller to detect possible remote tag mismatches prior to tag array lookup. Power is reduced as a) our wire management techniques permit slow... 
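
The observation that a small portion of the tag bits can expose most mismatches can be sketched as follows. The abstract does not say which bits Helia uses, so the choice of the low-order bits, the 4-bit width, and the name `partial_tag_candidates` are assumptions made purely for illustration.

```python
def partial_tag_candidates(tag, cached_tags, low_bits=4):
    """Return the cached tags whose low-order bits match those of `tag`.

    A differing partial tag proves a full mismatch, so only the returned
    candidates would need a full tag-array comparison. Using the low-order
    bits is an assumption, not the paper's stated selection.
    """
    mask = (1 << low_bits) - 1
    return [t for t in cached_tags if (t & mask) == (tag & mask)]


cached = [0x3A7, 0x1B2, 0x5C7]

# No cached tag ends in 0x4: the full lookup can be skipped entirely.
filtered_out = partial_tag_candidates(0x9F4, cached)

# Two cached tags end in 0x7: a full comparison is still required for them.
needs_lookup = partial_tag_candidates(0x2C7, cached)
print(filtered_out, [hex(t) for t in needs_lookup])
```

A partial match is only a hint, while a partial mismatch is conclusive, which is why such a filter can remove work without affecting correctness.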

    Write invalidation analysis in chip multiprocessors

    , Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9 September 2009 through 11 September 2009, Delft ; Volume 5953 LNCS , 2010 , Pages 196-205 ; 03029743 (ISSN) ; 3642118011 (ISBN) Ardalani, N ; Baniasadi, A ; Sharif University of Technology
    2010
    Abstract
    Chip multiprocessors (CMPs) issue write invalidations (WIs) to assure program correctness. In conventional snoop-based protocols, writers broadcast invalidations to all nodes as soon as possible. In this work we show that this approach, while protecting correctness, is inefficient for two reasons. First, many of the invalidated blocks are not accessed after invalidation, making the invalidation unnecessary. Second, among the invalidated blocks many are not accessed anytime soon, making immediate invalidation unnecessary. While invalidating the first group could be avoided altogether, the second group's invalidation could be delayed without any performance or correctness cost. Accordingly,... 

    Broadcast algorithms on OTIS-Cubes

    , Article 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008, Sydney, NSW, 10 December 2008 through 12 December 2008 ; December , 2008 , Pages 637-642 ; 9780769534718 (ISBN) Ebrahimi Kahaki, H ; Sarbazi Azad, H ; Sharif University of Technology
    2008
    Abstract
    OTIS-based architectures appear to have the potential to be an interesting option for future generations of multiprocessing systems. In this paper, we propose a new adaptive unicast routing algorithm and four software-based (unicast-based) broadcast algorithms for the wormhole switched OTIS-hypercube. We then present an empirical performance evaluation of these algorithms in OTIS-hypercube for different topologies, message length and traffic loads. © 2008 IEEE  

    Scheduling to minimize gaps and power consumption

    , Article SPAA'07: 19th Annual Symposium on Parallelism in Algorithms and Architectures, San Diego, CA, 9 June 2007 through 11 June 2007 ; 2007 , Pages 46-54 ; 159593667X (ISBN); 9781595936677 (ISBN) Demaine, E.D ; Ghodsi, M ; Hajiaghayi, M. T ; Sayedi Roshkhar, A. S ; Zadimoghaddam, M ; Sharif University of Technology
    2007
    Abstract
    This paper considers scheduling tasks while minimizing the power consumption of one or more processors, each of which can go to sleep at a fixed cost α. There are two natural versions of this problem, both considered extensively in recent work: minimize the total power consumption (including computation time), or minimize the number of "gaps" in execution. For both versions in a multiprocessor system, we develop a polynomial-time algorithm based on sophisticated dynamic programming. In a generalization of the power-saving problem, where each task can execute in any of a specified set of time intervals, we develop a (1 + (2/3)α)-approximation, and show that dependence on α is necessary.... 

    Characterization of spatial fault patterns in interconnection networks

    , Article Parallel Computing ; Volume 32, Issue 11-12 , 2006 , Pages 886-901 ; 01678191 (ISSN) Hoseiny Farahabady, M ; Safaei, F ; Khonsari, A ; Fathy, M ; Sharif University of Technology
    2006
    Abstract
    Parallel computers, such as multiprocessor systems-on-chip (Mp-SoCs), multicomputers and cluster computers, consist of hundreds or thousands of processing units and components (such as routers, channels and connectors) connected via some interconnection network, and collectively they may undergo high failure rates. Therefore, these systems must be equipped with fault-tolerant mechanisms to ensure that the system keeps running in a degraded mode. Normally, the faulty components are coalesced into fault regions, which are classified into two major categories: convex and concave regions. In this paper, we propose the first solution to calculate the probability of... 

    Performance evaluation of fault-tolerant scheduling algorithms in real-time multiprocessor systems

    , Article IASTED International Conference on Parallel and Distributed Computing and Networks, as part of the 23rd IASTED International Multi-Conference on Applied Informatics, Innsbruck, 15 February 2005 through 17 February 2005 ; 2005 , Pages 479-484 ; 10272666 (ISSN) Beitollahi, H ; Miremadi, S. G ; Fahringer T ; Hamza M. H ; Sharif University of Technology
    2005
    Abstract
    This paper presents the performance analysis of several best-known partitioning scheduling algorithms in real-time and fault-tolerant multiprocessor systems. To do this, multiple versions of tasks are executed on different processors. Both static and dynamic scheduling algorithms are analyzed. In the case of static scheduling algorithms, rate-monotonic (RM) scheduling policy is considered. In the dynamic scheduling algorithms, the scheduling policies are rate-monotonic and earliest-deadline-first (EDF). Partitioning scheduling algorithms which are studied here are heuristic algorithms that are formed by combining any of the bin-packing algorithms with any of the schedulability conditions for... 
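
The idea of pairing a bin-packing heuristic with a schedulability condition can be sketched as below. The abstract does not say which combinations were evaluated, so first-fit packing and the Liu and Layland utilization bound for rate-monotonic scheduling are choices made here for illustration. The function names are hypothetical, and the fault-tolerance aspect (multiple task versions on different processors) is not modelled.

```python
def liu_layland_bound(n):
    """Sufficient utilization bound for n tasks under rate-monotonic scheduling."""
    return n * (2 ** (1.0 / n) - 1)


def first_fit_rm(utilizations):
    """Partition tasks onto processors with first-fit bin packing.

    A task (given by its utilization C/T, assumed to be at most 1) joins the
    first processor on which the Liu-Layland test still passes; otherwise a
    new processor is opened.
    """
    processors = []
    for u in utilizations:
        for tasks in processors:
            if sum(tasks) + u <= liu_layland_bound(len(tasks) + 1):
                tasks.append(u)
                break
        else:
            processors.append([u])
    return processors


assignment = first_fit_rm([0.4, 0.3, 0.5, 0.2, 0.3])
print(len(assignment), assignment)
```

Swapping in a different packing rule (best-fit, next-fit) or a different test (for example, total utilization at most 1 per processor under EDF) changes only the inner condition, which is what makes such combinations easy to compare.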

    Virtual point-to-point connections for NoCs

    , Article IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ; Vol. 29, issue. 6 , 2010 , p. 855-868 ; ISSN: 02780070 Modarressi, M ; Tavakkol, A ; Sarbazi-Azad, H ; Sharif University of Technology
    Abstract
    In this paper, we aim to improve the performance and power metrics of packet-switched networks-on-chip (NoCs) and to benefit from both the scalability and resource-utilization advantages of NoCs and the superior communication performance of dedicated point-to-point links. The proposed method sets up virtual point-to-point (VIP) connections over one virtual channel (which bypasses the entire router pipeline) at each physical channel of the NoC. We present two schemes for constructing such VIP circuits. In the first scheme, the circuits are constructed for an application based on its task-graph at design time. The second scheme addresses constructing the connections at run-time using a light-weight... 

    Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors

    , Article Transactions on Embedded Computing Systems ; Volume 13, Issue 3 , December , 2013 ; 15399087 (ISSN) Hashemi, M ; Foroozannejad, M. H ; Ghiasi, S ; Sharif University of Technology
    2013
    Abstract
    We study the trade-off between throughput and memory footprint of embedded software that is synthesized from acyclic static dataflow (task graph) specifications targeting distributed memory multiprocessors. We identify iteration overlapping as a knob in the synthesis process by which one can trade application throughput for its memory requirement. Given an initial processor assignment and non-overlapped task schedule, we formally present underlying properties of the problem, such as constraints on a valid iteration overlapping, maximum possible throughput, and minimum memory footprint. Moreover, we develop an effective algorithm for generation of a rich set of design points that provide a... 

    Static statistical MPSoC power optimization by variation-aware task and communication scheduling

    , Article Microprocessors and Microsystems ; Volume 37, Issue 8 PART B , 2013 , Pages 953-963 ; 01419331 (ISSN) Momtazpour, M ; Goudarzi, M ; Sanaei, E ; Sharif University of Technology
    2013
    Abstract
    Corner-case analysis is a well-known technique to cope with occasional deviations occurring during the manufacturing process of semiconductors. However, the increasing amount of process variation in nanometer technologies has made it inevitable to move toward statistical analysis methods, instead of deterministic worst-case-based techniques, at all design levels. We show that by statically considering statistical effects of random and systematic process variation on performance and power consumption of a Multiprocessor System-on-Chip (MPSoC), significant power improvement can be achieved by static software-level optimizations such as task and communication scheduling. Moreover, we analyze... 

    High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement

    , Article Proceedings of the International Symposium on Low Power Electronics and Design, 1 August 2011 through 3 August 2011 ; August , 2011 , Pages 79-84 ; 15334678 (ISSN) ; 9781612846590 (ISBN) Jadidi, A ; Arjomand, M ; Sarbazi Azad, H ; Sharif University of Technology
    2011
    Abstract
    In this paper, we propose a run-time strategy for managing writes to the last-level cache in chip multiprocessors where STT-RAM is used as the baseline technology. To this end, we assume that each cache set is decomposed into a limited number of SRAM lines and a large number of STT-RAM lines. SRAM lines are the target of frequently written data, while rarely written or read-only data are pushed into STT-RAM. As a novel contribution, a low-overhead, fully hardware technique is utilized to detect write-intensive data blocks of the working set and place them into SRAM lines, while the remaining data blocks are candidates to be remapped onto STT-RAM blocks during system operation. Therefore, the achieved cache... 

    Yield-driven design-time task scheduling techniques for multi-processor system on chips under process variation: A comparative study

    , Article IET Computers and Digital Techniques ; Volume 9, Issue 4 , 2015 , Pages 221-229 ; 17518601 (ISSN) Momtazpour, M ; Assare, O ; Rahmati, N ; Boroumand, A ; Barati, S ; Goudarzi, M ; Sharif University of Technology
    Institution of Engineering and Technology  2015
    Abstract
    Process variation has already emerged as a major concern in design of multi-processor system on chips (MPSoC). In recent years, there have been several attempts to bring variability awareness into the task scheduling process of embedded MPSoCs to improve performance yield. This study attempts to provide a comparative study of the current variation-aware design-time task and communication scheduling techniques that target embedded MPSoCs. To this end, the authors first use a sign-off variability modelling framework to accurately estimate the frequency distribution of MPSoC components. The task scheduling methods are then compared in terms of both the quality of the final solution and the...