
Power and frequency analysis for data and control independence in embedded processors

, Article 2011 International Green Computing Conference and Workshops, IGCC 2011, 25 July 2011 through 28 July 2011 ; July , 2011 , Page(s): 1 - 6 ; 9781457712203 (ISBN) Samie, F ; Baniasadi, A ; Sharif University of Technology

2011

Abstract

In this work we study control independence in embedded processors. We classify control-independent instructions into data-dependent and data-independent instructions and measure each group's frequency and behavior. Moreover, we study how control-independent instructions impact power dissipation and resource utilization. We also investigate the behavior of control-independent instructions for different processors and branch predictors. Our study shows that data-independent instructions account for 34% of the control-independent instructions in the applications studied here. We also show that control-independent instructions account for up to 12% of the processor energy and 15.6%, 11.2% and 8.6% of the instructions...

A classification of hadoop job schedulers based on performance optimization approaches

, Article Cluster Computing ; Volume 24, Issue 4 , 2021 , Pages 3381-3403 ; 13867857 (ISSN) Ghazali, R ; Adabi, S ; Down, D. G ; Movaghar, A ; Sharif University of Technology

Springer 2021

Abstract

Job scheduling in MapReduce plays a vital role in Hadoop performance. In recent years, many researchers have presented job scheduling algorithms to improve Hadoop performance. Designing a job scheduler that minimizes job execution time while maximizing resource utilization is not a straightforward task. The primary purpose of this paper is to investigate the factors affecting job scheduler efficiency and to present a novel classification of job schedulers based on these factors. We provide a comprehensive overview of the existing job schedulers in each group, evaluating their approaches and their effects on Hadoop performance and comparing their advantages and disadvantages. Finally, we provide...

A Method to Improve Adaptivity of Odd-Even Routing Algorithm in Mesh NoCs

, Article 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016, 17 February 2016 through 19 February 2016 ; 2016 , Pages 755-758 ; 9781467387750 (ISBN) Sadrosadati, M ; Bashizade, R ; Roozkhosh, S ; Shafiee, A ; Sarbazi Azad, H ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2016

Abstract

Adaptive routing algorithms help balance resource utilization across different parts of the network and hence prevent any single resource from becoming the performance bottleneck while other resources are still under-utilized. In this paper, we present a novel approach, called Preemptive Waiting, which is applied to the Odd-Even (OE) routing algorithm to yield PWOE. Under synthetic traffic loads, PWOE postpones the saturation traffic rate of the NoC by 13.4% compared to OE. © 2016 IEEE
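The baseline the abstract compares against, the Odd-Even (OE) turn model, can be illustrated with a minimal routing function for a 2D mesh. This is a sketch of the standard minimal OE routing rules only, not of Preemptive Waiting or PWOE; the function name `odd_even_route`, the (x, y) convention with x growing eastward and y growing northward, and the direction labels are assumptions made for this example.

```python
# Sketch of minimal Odd-Even (OE) routing in a 2D mesh (baseline only, not PWOE).
# Hypothetical conventions: nodes are (x, y) tuples, x grows eastward, y grows
# northward; returned labels are "E", "W", "N", "S" or "LOCAL".

Coord = tuple[int, int]


def odd_even_route(src: Coord, cur: Coord, dst: Coord) -> set[str]:
    """Return the set of admissible output directions at node `cur`."""
    c0, c1 = cur
    s0 = src[0]
    d0, d1 = dst
    e0, e1 = d0 - c0, d1 - c1  # remaining hops east (+) and north (+)
    vertical = "N" if e1 > 0 else "S"
    dirs: set[str] = set()

    if e0 == 0:
        # Same column as the destination: move vertically or eject locally.
        dirs.add("LOCAL" if e1 == 0 else vertical)
    elif e0 > 0:
        # Eastbound packet.
        if e1 == 0:
            dirs.add("E")
        else:
            # EN/ES turns are forbidden in even columns, so a vertical move is
            # allowed only in an odd column or in the source column.
            if c0 % 2 == 1 or c0 == s0:
                dirs.add(vertical)
            # Do not step into an even destination column while still needing a
            # vertical move there, because that would require a forbidden EN/ES turn.
            if d0 % 2 == 1 or e0 != 1:
                dirs.add("E")
    else:
        # Westbound packet: west is always allowed; NW/SW turns are forbidden in
        # odd columns, so vertical moves are taken only in even columns.
        dirs.add("W")
        if c0 % 2 == 0 and e1 != 0:
            dirs.add(vertical)
    return dirs
```

For example, a packet at (1, 0) headed for (2, 2) is offered only `N`, since moving east into even column 2 would force a forbidden EN turn there.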

A multi-dimensional fairness combinatorial double-sided auction model in cloud environment

, Article 2016 8th International Symposium on Telecommunications, IST 2016, 27 September 2016 through 29 September 2016 ; 2017 , Pages 672-677 ; 9781509034345 (ISBN) Hassanzadeh, R ; Movaghar, A ; Hassanzadeh, H. R ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2017

Abstract

In cloud investment markets, consumers look for the lowest cost and desirable fairness, while providers look for strategies that achieve the highest possible profit and return. Most existing models for auction-based resource allocation in cloud environments consider only the overall profit increase and ignore the profit of each individual participant or the gap between rich and poor participants. This paper proposes a multi-dimensional fairness combinatorial double auction (MDFCDA) model that strikes a balance between revenue and fairness among participants. We solve a winner determination problem (WDP) through integer programming which incorporates the...

Improving the performance of packet-switched networks-on-chip by SDM-based adaptive shortcut paths

, Article Integration, the VLSI Journal ; Volume 50 , 2015 , Pages 193-204 ; 01679260 (ISSN) Modarressi, M ; Teimouri, N ; Sarbazi Azad, H ; Sharif University of Technology

Elsevier 2015

Abstract

Reducing NoC power is critical for scaling up the number of nodes in future many-core systems. Most NoC designs adopt packet switching to benefit from its high throughput and excellent scalability. These benefits, however, come at the price of the power consumption and latency overheads of routers. Circuit switching, on the other hand, significantly reduces communication power and latency by directing data over pre-established circuits, but the relatively long circuit setup time and low resource utilization of this switching mechanism are often prohibitive. In this paper, we address one of the major problems of circuit switching, i.e., the circuit setup time...

Deep Private-feature extraction

, Article IEEE Transactions on Knowledge and Data Engineering ; Volume 32, Issue 1 , 2020 , Pages 54-66 Osia, S. A ; Taheri, A ; Shamsabadi, A. S ; Katevas, K ; Haddadi, H ; Rabiee, H. R ; Sharif University of Technology

IEEE Computer Society 2020

Abstract

We present and evaluate Deep Private-Feature Extractor (DPFE), a deep model that is trained and evaluated based on information-theoretic constraints. Using the selective exchange of information between a user's device and a service provider, DPFE enables the user to prevent certain sensitive information from being shared with a service provider, while allowing them to extract approved information using their model. We introduce and utilize log-rank privacy, a novel measure to assess the effectiveness of DPFE in removing sensitive information, and compare different models based on their accuracy-privacy trade-off. We then implement and evaluate the performance of DPFE on smartphones to...

Modeling and evaluation of overlay generation problem for peer-assisted video adaptation and streaming

, Article Proceedings of the International Workshop on Network and Operating System Support for Digital Audio and Video, 28 May 2008 through 30 May 2008, Braunschweig ; 2008 , Pages 87-92 ; 9781605581576 (ISBN) Iqbal, R ; Hariri, B ; Shirmohammadi, S ; Sharif University of Technology

2008

Abstract

In this paper, we consider the problem of overlay generation for video adaptation and streaming applications so as to efficiently utilize the bandwidth and computing power of the participating peers. To this end, the proposed architecture performs regular streaming functions as well as video adaptation functions, moving the computational load of video content adaptation away from dedicated media-streaming/adaptation servers to the participating peers. To verify the performance of our design, we followed an analytical approach based on 0-1 Integer Linear Programming to model the system and to calculate the optimum overlay. The performance of our scheme is evaluated by simulations....

NURA: A framework for supporting non-uniform resource accesses in GPUs

, Article Performance Evaluation Review ; Volume 50, Issue 1 , 2022 , Pages 39-40 ; 01635999 (ISSN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh, E ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology

Association for Computing Machinery 2022

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some prior work (e.g., spatial multitasking) has limited opportunity to improve resource utilization, while other work (e.g., simultaneous multi-kernel) provides fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes the Cooperative Thread Arrays (CTAs) that belong to only one application (similar to...

NURA: A framework for supporting non-uniform resource accesses in GPUs

, Article 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS/PERFORMANCE 2022, 6 June 2022 through 10 June 2022 ; 2022 , Pages 39-40 ; 9781450391412 (ISBN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh, E ; Sadrosadati, M ; Sarbazi Azad, H ; ACM SIGMETRICS ; Sharif University of Technology

Association for Computing Machinery, Inc 2022

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some prior work (e.g., spatial multitasking) has limited opportunity to improve resource utilization, while other work (e.g., simultaneous multi-kernel) provides fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes the Cooperative Thread Arrays (CTAs) that belong to only one application (similar to...

BOT-MICS: Bounding time using analytics in mixed-criticality systems

, Article IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ; Volume 41, Issue 10 , 2022 , Pages 3239-3251 ; 02780070 (ISSN) Ranjbar, B ; Hosseinghorban, A ; Sahoo, S. S ; Ejlali, A ; Kumar, A ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2022

Abstract

An increasing trend toward reducing cost, space, and weight leads to modern embedded systems that execute multiple tasks with different criticality levels on a common hardware platform while guaranteeing safe operation. In such mixed-criticality (MC) systems, multiple worst-case execution times (WCETs) are defined for each task, corresponding to the system operation modes, to improve the MC system's timing behavior at runtime. Determining the appropriate WCETs for lower-criticality (LC) modes is nontrivial. On the one hand, considering a very low WCET for tasks can improve processor utilization by scheduling more tasks in that mode; on the other hand, using a larger WCET ensures that the...

A low-latency low-power QR-decomposition ASIC implementation in 0.13 μm CMOS

, Article IEEE Transactions on Circuits and Systems I: Regular Papers ; Volume 60, Issue 2 , 2013 , Pages 327-340 ; 15498328 (ISSN) Shabany, M ; Patel, D ; Gulak, P. G ; Sharif University of Technology

2013

Abstract

This paper presents a hybrid QR decomposition (QRD) design that reduces the number of computations and increases their execution parallelism by using a unique combination of multi-dimensional Givens rotations, Householder transformations and conventional 2-D Givens rotations. A semi-pipelined, semi-iterative architecture is presented for the QRD core, which uses innovative design ideas to develop 2-D, Householder 3-D and 4-D/2-D configurable CORDIC processors such that they can perform the maximum possible number of vectoring and rotation operations within the given number of cycles, while minimizing gate count and maximizing resource utilization. Test results for the 0.3 mm² QRD chip,...
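For orientation, the conventional 2-D Givens-rotation QR decomposition that the hybrid design builds on can be sketched in floating point. This is a minimal textbook sketch, not the paper's hybrid scheme: it uses no multi-dimensional Givens rotations, Householder transformations, or CORDIC arithmetic, and the function name `givens_qr` and the list-of-lists matrix representation are assumptions made for illustration.

```python
import math

Matrix = list[list[float]]


def givens_qr(a: Matrix) -> tuple[Matrix, Matrix]:
    """Return (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular (m x n)."""
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(n):
        # Zero column j below the diagonal, working bottom-up on adjacent rows.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # entry already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Left-multiply R by the rotation G acting on rows i-1 and i;
            # this maps (x, y) to (h, 0).
            for k in range(n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            # Right-multiply Q by G^T so that the product Q R still equals A.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r
```

Each rotation zeroes one subdiagonal entry at a time; per the abstract, the hybrid design combines such rotations with multi-dimensional Givens and Householder steps to reduce the number of computations and increase their parallelism.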

Market-based grid resource allocation using new negotiation model

, Article Journal of Network and Computer Applications ; Volume 36, Issue 1 , 2013 , Pages 543-565 ; 10848045 (ISSN) Adabi, S ; Movaghar, A ; Rahmani, A. M ; Beigy, H ; Sharif University of Technology

2013

Abstract

This paper presents a new negotiation model for designing Market- and Behavior-driven Negotiation Agents (MBDNAs) that address the computational grid resource allocation problem. To determine the amount of concession in each trading cycle, the MBDNAs are guided by six factors: (1) the number of the negotiator's trading partners, (2) the number of the negotiator's competitors, (3) the negotiator's time preference, (4) the flexibility in the negotiator's trading partner's proposal, (5) the deviation of the negotiator's proposal from the average of its trading partners' proposals, and (6) the previous concession behavior of the negotiator's trading partner. In our experiments, we compare grid resource consumer (GRC) of type MBDNAs (respectively...

A self-tuning controller for queuing delay regulation in TCP/AQM networks

, Article Telecommunication Systems ; 2018 ; 10184864 (ISSN) Kahe, G ; Jahangir, A. H ; Sharif University of Technology

Springer New York LLC 2018

Abstract

An AQM router aims primarily to control network congestion by marking/dropping packets, which traffic sources use as congestion feedback to balance their flow rates. However, stabilizing queuing delay and maximizing link utilization have been considered the main control objectives, especially in media-dominated networks. Most AQM algorithms are designed for a nominal operating point; however, the time-varying nature of network parameters frequently violates their robustness bounds. In this paper, a self-tuning compensated PID controller is proposed to address the time-varying nature of network conditions caused by parameter variations and unresponsive connections....
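To illustrate the control loop, a plain discrete-time PID controller that turns the queuing-delay error into a packet marking/dropping probability can be sketched as follows. This is a generic fixed-gain sketch, assuming the queuing delay is measured every `dt` seconds; it is not the paper's self-tuning compensated controller, and the class name `PidAqm`, the target delay, and the gains are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class PidAqm:
    """Fixed-gain discrete PID mapping queuing-delay error to a mark/drop probability.

    All parameters are hypothetical placeholders; a self-tuning scheme would adapt
    kp, ki and kd online instead of keeping them fixed.
    """

    target_delay: float  # reference queuing delay (seconds)
    kp: float
    ki: float
    kd: float
    dt: float  # sampling interval (seconds)
    _integral: float = 0.0
    _prev_error: float = 0.0  # treated as 0 before the first sample

    def update(self, measured_delay: float) -> float:
        error = measured_delay - self.target_delay
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        candidate_integral = self._integral + error * self.dt
        raw = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        prob = min(1.0, max(0.0, raw))  # a probability must stay in [0, 1]
        # Simple anti-windup: keep the new integral only if the output is unsaturated.
        if prob == raw:
            self._integral = candidate_integral
        return prob
```

With `kp=5.0`, `ki=kd=0`, and a 20 ms target, for example, a measured delay of 120 ms yields a marking probability of about 0.5, while delays below the target yield 0.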

DuCNoC: a high-throughput FPGA-based NoC simulator using dual-clock lightweight router micro-architecture

, Article IEEE Transactions on Computers ; Volume 67, Issue 2 , February , 2018 , Pages 208-221 ; 00189340 (ISSN) Mardani Kamali, H ; Zamiri Azar, K ; Hessabi, S ; Sharif University of Technology

IEEE Computer Society 2018

Abstract

On-chip interconnections play an important role in multi/many-processor systems-on-chip (MPSoCs). To achieve efficient optimization, each specific application must utilize a specific architecture and, consequently, a specific interconnection network. For design space exploration and for finding the best NoC solution for each specific application, a fast and flexible NoC simulator is necessary, especially for large design spaces. In this paper, we present an FPGA-based NoC co-simulator that can be configured via software. In our proposed NoC simulator, called DuCNoC, we implement a dual-clock router micro-architecture, which demonstrates a 75x-350x speed-up over BOOKSIM....

Deep private-feature extraction

, Article IEEE Transactions on Knowledge and Data Engineering ; 2018 ; 10414347 (ISSN) Osia, S. A ; Taheri, A ; Shamsabadi, A. S ; Katevas, M ; Haddadi, H ; Rabiee, H. R. R ; Sharif University of Technology

IEEE Computer Society 2018

Abstract

We present and evaluate Deep Private-Feature Extractor (DPFE), a deep model that is trained and evaluated based on information-theoretic constraints. Using the selective exchange of information between a user's device and a service provider, DPFE enables the user to prevent certain sensitive information from being shared with a service provider, while allowing them to extract approved information using their model. We introduce and utilize log-rank privacy, a novel measure to assess the effectiveness of DPFE in removing sensitive information, and compare different models based on their accuracy-privacy trade-off. We then implement and evaluate the performance of DPFE on smartphones to...

Performance and power efficient on-chip communication using adaptive virtual point-to-point connections

, Article 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip, NoCS 2009, San Diego, CA, 10 May 2009 through 13 May 2009 ; 2009 , Pages 203-212 ; 9781424441433 (ISBN) Modarressi, M ; Sarbazi Azad, H ; Tavakkol, A ; IEEE Circuits and Systems Society; Council for EDA; ACM Special Interest Group on Computer Architecture (SIGARCH); ACM Special Interest Group on Embedded Systems (SIGBED); ACM Special Interest Group on Design Automation (SIGDA); Silistix, Inc ; Sharif University of Technology

2009

Abstract

In this paper, we propose a packet-switched network-on-chip (NoC) architecture that can provide a number of low-power, low-latency virtual point-to-point connections for communication flows. The work aims to improve the power and performance metrics of packet-switched NoC architectures, benefiting from both the power and resource utilization advantages of NoCs and the superior communication performance of dedicated point-to-point links. The virtual point-to-point connections are set up by bypassing all router pipeline stages at the intermediate nodes. This work addresses constructing the virtual point-to-point connections at run-time using a lightweight setup network. It involves monitoring...

An analytical performance evaluation for WSNs using loop-free bellman ford protocol

, Article 2009 International Conference on Advanced Information Networking and Applications, AINA 2009, Bradford, 26 May 2009 through 29 May 2009 ; 2009 , Pages 568-571 ; 1550445X (ISSN); 9780769536385 (ISBN) Baharloo, M ; Hajisheykhi, R ; Arjomand, M ; Jahangir, A. H ; IEEE Computer Society ; Sharif University of Technology

2009

Abstract

Although several analytical models have been proposed for wireless sensor networks (WSNs) with different capabilities, very few of them consider the effect of general service-time distributions as well as design constraints on network performance. This paper presents a new analytical model to compute message latency in a WSN with the loop-free Bellman-Ford routing strategy. The model considers a limited buffer size for each node using an M/G/1/K queuing system. Also, contention probability and resource utilization are suitably modeled. The results obtained from simulation experiments confirm that the model exhibits a high degree of accuracy for various network configurations. © 2009 IEEE
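The abstract models each node's finite buffer as an M/G/1/K queue. As a much simpler illustration of finite-buffer queuing, the sketch below computes the blocking probability and mean sojourn time for the M/M/1/K special case (exponential service times), using the truncated geometric stationary distribution and Little's law. It is not the paper's M/G/1/K model or its latency formula; the function name `mm1k_metrics` and its parameters are assumptions for illustration.

```python
def mm1k_metrics(arrival_rate: float, service_rate: float, capacity: int) -> dict[str, float]:
    """Stationary metrics of an M/M/1/K queue (K = capacity, including the job in service)."""
    if arrival_rate <= 0 or service_rate <= 0 or capacity < 1:
        raise ValueError("rates must be positive and capacity at least 1")
    rho = arrival_rate / service_rate
    # Unnormalized stationary probabilities p_n proportional to rho**n, n = 0..K.
    weights = [rho**n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Poisson arrivals see time averages, so an arrival is blocked with probability p_K.
    blocking = probs[-1]
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    effective_arrival = arrival_rate * (1.0 - blocking)  # rate of admitted messages
    mean_sojourn = mean_in_system / effective_arrival  # Little's law: W = L / lambda_eff
    return {
        "blocking_probability": blocking,
        "mean_in_system": mean_in_system,
        "mean_sojourn_time": mean_sojourn,
    }
```

For instance, with arrival and service rates both equal to 1 and K = 4, all five states are equally likely, so the blocking probability is 0.2 and the mean sojourn time is 2 / 0.8 = 2.5 time units.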

Welfare-aware strategic demand control in an intelligent market-based framework: Move towards sustainable smart grid

, Article Applied Energy ; Volume 251 , 2019 ; 03062619 (ISSN) Taheri Tehrani, M ; Afshin Hemmatyar, A. M ; Sharif University of Technology

Elsevier Ltd 2019

Abstract

To address the sustainability challenges that have appeared in today's power grids, it is essential for the emerging demand-control paradigm to be better adapted to customers' lifestyles. In this paper, owing to the ever-growing interconnectivity of the grids, a distributed Commodity Market (CM) framework is proposed in which intelligent agents embedded on the customer side seek to maximize their preferred welfare through real-time demand of power from an energy market. Since no comprehensive model of the grid is available, Reinforcement Learning (RL) is utilized, and it is shown that the globally optimal performance is achieved at the Nash Equilibrium (NE) of the proposed framework. This solution not only maximizes...

A self-tuning controller for queuing delay regulation in TCP/AQM networks

, Article Telecommunication Systems ; Volume 71, Issue 2 , 2019 , Pages 215-229 ; 10184864 (ISSN) Kahe, G ; Jahangir, A. H ; Sharif University of Technology

Springer New York LLC 2019

Abstract

An AQM router aims primarily to control network congestion by marking/dropping packets, which traffic sources use as congestion feedback to balance their flow rates. However, stabilizing queuing delay and maximizing link utilization have been considered the main control objectives, especially in media-dominated networks. Most AQM algorithms are designed for a nominal operating point; however, the time-varying nature of network parameters frequently violates their robustness bounds. In this paper, a self-tuning compensated PID controller is proposed to address the time-varying nature of network conditions caused by parameter variations and unresponsive connections....

NURA: A framework for supporting non-uniform resource accesses in gpus

, Article Proceedings of the ACM on Measurement and Analysis of Computing Systems ; Volume 6, Issue 1 , 2022 ; 24761249 (ISSN) Darabi, S ; Mahani, N ; Baxishi, H ; Yousefzadeh Asl Miandoab, E ; Sadrosadati, M ; Sarbazi Azad, H ; Sharif University of Technology

Association for Computing Machinery 2022

Abstract

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some prior work (e.g., spatial multitasking) has limited opportunity to improve resource utilization, while other work (e.g., simultaneous multi-kernel) provides fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensures fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes Cooperative Thread Arrays (CTAs) that belong to only one application (similar to the...