Search for: program-compilers
Software-level instruction-cache leakage reduction using value-dependence of SRAM leakage in nanometer technologies, Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 6590 , 2011 , Pages 275-299 ; 03029743 (ISSN) ; 9783642194474 (ISBN) ; Ishihara, T ; Noori, H ; Stenstrom, P ; Sharif University of Technology
Within-die process variation is increasing in nanometer-scale process technologies. We observe that, under within-die process variation, the same SRAM cell leaks differently when storing 0 than when storing 1; this difference can be up to three orders of magnitude at a 60 mV variation of the threshold voltage (Vth). Leakage can therefore be reduced if the cache SRAM cells most often store the values that dissipate less leakage. We take advantage of this fact to reduce instruction-cache leakage by presenting three binary-optimization and software-level techniques: we (i) reorder instructions within basic blocks so that their bits better match the less-leaky state of their corresponding cache cells, (ii)...
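The reordering idea in (i) can be illustrated with a small sketch. It assumes a fixed 32-bit instruction encoding, a known less-leaky bit pattern for each word-sized cache slot, and an explicit set of dependence pairs that reordering must respect. The names `best_order`, `matching_bits`, and `respects_deps` are hypothetical, the exhaustive search is only practical for very small basic blocks, and this is not the authors' algorithm.

```python
from itertools import permutations

WORD_BITS = 32  # assumption: fixed-width 32-bit instruction words

def matching_bits(word, preferred):
    """Count bit positions where `word` equals the cell's less-leaky value."""
    mask = (1 << WORD_BITS) - 1
    return WORD_BITS - bin((word ^ preferred) & mask).count("1")

def respects_deps(order, deps):
    """deps holds (a, b) pairs: instruction a must stay before instruction b."""
    pos = {instr: i for i, instr in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in deps)

def best_order(words, preferred_slots, deps):
    """Exhaustively pick the legal order that maximizes matching bits."""
    best, best_score = None, -1
    for order in permutations(range(len(words))):
        if not respects_deps(order, deps):
            continue
        score = sum(matching_bits(words[i], preferred_slots[slot])
                    for slot, i in enumerate(order))
        if score > best_score:
            best, best_score = list(order), score
    return best, best_score

# Two words whose bits are the complements of their slots' preferred values.
words = [0x00000000, 0xFFFFFFFF]
preferred_slots = [0xFFFFFFFF, 0x00000000]
print(best_order(words, preferred_slots, deps=set()))     # ([1, 0], 64)
print(best_order(words, preferred_slots, deps={(0, 1)}))  # ([0, 1], 0)
```

When the two instructions are independent, swapping them raises the match count from 0 to 64 bits; a dependence between them forbids the swap.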
Article Theoretical Computer Science ; Volume 889 , 2021 , Pages 145-170 ; 03043975 (ISSN) ; Fiore, D ; Venturi, D ; Amini, M ; Sharif University of Technology
Elsevier B.V. 2021
At SCN 2018, Fiore and Pagnin proposed a generic compiler (called “Matrioska”) that transforms sufficiently expressive single-key homomorphic signatures (SKHSs) into multi-key homomorphic signatures (MKHSs) under falsifiable assumptions in the standard model. Matrioska is designed for homomorphic signatures that support programs represented as circuits. The MKHS schemes obtained through Matrioska support the evaluation and verification of arbitrary circuits over data signed by multiple users, but they require the underlying SKHS scheme to work with circuits whose size is exponential in the number of users, and thus can support only a constant number of users. In this work, we propose...
Article IEEE Design and Test ; 2021 ; 21682356 (ISSN) ; Ejlali, A ; Sharif University of Technology
IEEE Computer Society 2021
Newly developed 3D die-stacking technologies allow us to revisit the idea of Processing-in-Memory (PIM), as earlier implementation hurdles are overcome. We can now offload the data-intensive parts of a program to the PIM in the form of kernels and take advantage of the high internal bandwidth of the memory modules. Memory access latency and bandwidth are two major bottlenecks in today’s high-performance computers, and new use cases are moving toward this mode of computing faster than ever before. With new graph-processing and neural-network applications being developed every day, having a performance model of such systems helps in predicting the behavior...
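A common first-order way to reason about such offloading decisions is a roofline-style bound, in which attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte). The sketch below uses that generic model, not the performance model this article proposes; the function names and the peak and bandwidth figures in the usage lines are hypothetical placeholders.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * FLOPs per byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def should_offload(intensity, host, pim):
    """host and pim are (peak GFLOP/s, bandwidth GB/s) tuples (hypothetical figures).

    Offload when the PIM's roofline bound exceeds the host's for this kernel.
    """
    return attainable_gflops(*pim, intensity) > attainable_gflops(*host, intensity)

host = (1000.0, 100.0)  # hypothetical host: 1000 GFLOP/s peak, 100 GB/s off-chip
pim = (200.0, 1000.0)   # hypothetical PIM: 200 GFLOP/s peak, 1000 GB/s internal
print(should_offload(0.5, host, pim))   # memory-bound kernel: True
print(should_offload(20.0, host, pim))  # compute-bound kernel: False
```

Under these assumed numbers, a low-intensity kernel is limited by the host's off-chip bandwidth and benefits from the PIM's internal bandwidth, whereas a compute-bound kernel is better left on the host.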
Article Canadian Conference on Electrical and Computer Engineering ; 2011 , Pages 000763-000766 ; 08407789 (ISSN) ; 9781424497898 (ISBN) ; Goudarzi, M ; Sharif University of Technology
While performance and power consumption of processors present a classic trade-off in designing embedded hardware, software can be optimized in favor of both performance and energy. We evaluate the impact of optimizations at different stages of embedded software design. We show that algorithm choice and compiler optimizations aimed at improving performance can also reduce the energy consumption of an embedded processor. We also propose energy-aware compilation guidelines that can further reduce energy consumption without performance penalties. Our experimental results show that these techniques can achieve up to 85% energy reduction and 89% performance improvement.
An energy-aware methodology for mapping and scheduling of concurrent applications in MPSoC architectures, Article 2011 19th Iranian Conference on Electrical Engineering, ICEE 2011, 17 May 2011 through 19 May 2011 ; May , 2011 , Page 1 ; 21647054 (ISSN) ; 9789644634284 (ISBN) ; Hessabi, S ; Vahdat, B. V ; Sharif University of Technology
Mapping and scheduling are two central and critical steps in the design flow of Networks-on-Chip (NoCs); they determine how applications are implemented on the NoC. In this paper, a novel energy-aware algorithm called EAMS is developed for mapping and scheduling concurrent applications onto NoC platforms. The NoC architecture is assumed to consist of a set of heterogeneous IP cores. The algorithm finds a mapping of the application's tasks to the available IP cores that minimizes overall energy consumption while meeting task deadlines.
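A much simpler greedy heuristic conveys the shape of this mapping problem; it is not EAMS. The sketch treats tasks as independent and visits them in earliest-deadline-first order, assumes each IP core runs its assigned tasks back to back, and places each task on the lowest-energy core on which it still finishes by its deadline. The `Task` fields and the per-core time and energy tables are hypothetical inputs.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float
    time: dict    # core -> execution time (hypothetical profiling data)
    energy: dict  # core -> energy per execution

def greedy_map(tasks, cores):
    """Greedy energy-aware mapping; returns (mapping, total_energy) or (None, None)."""
    busy = {c: 0.0 for c in cores}  # time at which each core becomes free
    mapping, total = {}, 0.0
    for t in sorted(tasks, key=lambda t: t.deadline):  # earliest deadline first
        feasible = [c for c in cores if busy[c] + t.time[c] <= t.deadline]
        if not feasible:
            return None, None  # no core can meet this task's deadline
        best = min(feasible, key=lambda c: t.energy[c])
        busy[best] += t.time[best]
        mapping[t.name] = best
        total += t.energy[best]
    return mapping, total

tasks = [
    Task("t1", deadline=5, time={"cpu": 2, "dsp": 4}, energy={"cpu": 10, "dsp": 3}),
    Task("t2", deadline=6, time={"cpu": 2, "dsp": 3}, energy={"cpu": 8, "dsp": 2}),
]
print(greedy_map(tasks, ["cpu", "dsp"]))  # ({'t1': 'dsp', 't2': 'cpu'}, 11.0)
```

Because it commits greedily, such a heuristic can miss lower-energy or even feasible mappings: here t1 takes the DSP first, so t2 can no longer meet its deadline there and falls back to the costlier CPU.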
Article IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, 30 August 2015 through 2 September 2015 ; Volume 2015-December , Sept , 2015 , Pages 1195-1200 ; 9781467367820 (ISBN) ; Zehni, M ; Pakravan, M. R ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2015
Device-to-device (D2D) communication integrated into cellular networks is emerging as a new trend in response to the notable rise in traffic demand. Resource allocation is one of the important challenges in deploying D2D networks. In this paper, we formulate an optimization problem for optimal resource allocation and then propose a novel algorithm, maximum-clique-based resource allocation (MCRA), which improves spectral reuse using the graph-theoretic concept of a maximum clique. Practical D2D communication requires each node to both receive and transmit signals during the communication process. We have considered this issue in constructing the interference graph and mathematical...
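A maximum clique can be found with the Bron-Kerbosch algorithm with pivoting. The sketch below assumes a compatibility graph whose vertices are D2D links and whose edges join links that can share a resource block without harmful interference (the complement of an interference graph), and it greedily assigns one maximum clique of the remaining links to each resource block. This is an illustrative scheme with hypothetical names (`interferes`, `allocate`), not the MCRA algorithm itself.

```python
def bron_kerbosch(r, p, x, adj, best):
    """Bron-Kerbosch with pivoting; keeps the largest clique seen in best[0]."""
    if not p and not x:
        if len(r) > len(best[0]):
            best[0] = set(r)
        return
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, best)
        p = p - {v}
        x = x | {v}

def maximum_clique(adj):
    best = [set()]
    bron_kerbosch(set(), set(adj), set(), adj, best)
    return best[0]

def compatibility_graph(links, interferes):
    """Edge between two links when they do NOT interfere (hypothetical predicate)."""
    adj = {link: set() for link in links}
    for a in adj:
        for b in adj:
            if a != b and not interferes(a, b):
                adj[a].add(b)
    return adj

def allocate(links, interferes, n_blocks):
    """Give each resource block to one maximum clique of the remaining links."""
    remaining = set(links)
    allocation = {}
    for block in range(n_blocks):
        if not remaining:
            break
        clique = maximum_clique(compatibility_graph(remaining, interferes))
        for link in clique:
            allocation[link] = block
        remaining -= clique
    return allocation

conflicts = {frozenset(("A", "B"))}  # only links A and B interfere
interferes = lambda a, b: frozenset((a, b)) in conflicts
print(allocate(["A", "B", "C", "D"], interferes, n_blocks=2))
```

In this toy case, three links (C, D, and one of A or B) share block 0 and the remaining link gets block 1; links left over when the blocks run out stay unassigned.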
Article Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES) ; 2012 , Pages 71-78 ; 9781450312127 (ISBN) ; Foroozannejad, M. H ; Ghiasi, S ; Etzel, C ; Sharif University of Technology
Variants of dataflow specification models are widely used to synthesize streaming applications for distributed-memory parallel processors. We argue that the current practice of specifying streaming applications with rigid dataflow models implicitly prohibits a number of platform-oriented optimizations and hence limits portability and scalability with respect to the number of processors. We motivate Functionally-cOnsistent stRucturally-MalLEable Streaming Specification, dubbed FORMLESS, which raises the abstraction level beyond fixed-structure dataflow to address its portability and scalability limitations. To demonstrate the potential of the idea, we develop a design space exploration...
Reimbursing the handshake overhead of asynchronous circuits using compiler pre-synthesis optimizations, Article 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, DSD 2008, Parma, 3 September 2008 through 5 September 2008 ; 2008 , Pages 290-297 ; 9780769532776 (ISBN) ; Mirza Aghatabar, M ; Najibi, M ; Pedram, H ; Sadeghi, A ; Sharif University of Technology
Asynchronous circuits have many advantages over synchronous design styles, such as high performance and low power consumption; however, their handshake circuitry incurs a large overhead. In this paper, we reduce this extra circuitry by taking advantage of compiler techniques. Compiler code motions can be used to improve the synthesis results in terms of both power consumption and area, since they remove the completion-detection and validity-check parts of asynchronous designs. To the best of our knowledge, this is the first effort to use compiler pre-synthesis optimizations in asynchronous circuits to...
Implementation-aware model analysis: The case of buffer-throughput tradeoff in streaming applications, Article Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 18 June 2015 through 19 June 2015 ; Volume 2015-June , 2015 , Pages 108-117 ; 9781450332576 (ISBN) ; Hashemi, M ; Khibin, V ; Ghiasi, S ; Sharif University of Technology
Association for Computing Machinery 2015
Models of computation abstract away a number of implementation details in favor of well-defined semantics. While this has unquestionable benefits, we argue that analyzing models solely on the basis of their operational semantics (implementation-oblivious analysis) is unfit to drive implementation design space exploration. Specifically, we study the tradeoff between buffer size and streaming throughput in applications modeled as synchronous data flow (SDF) graphs. We demonstrate the inherent inaccuracy of the implementation-oblivious approach, which considers only SDF operational semantics. We propose a rigorous transformation, which equips the state-of-the-art buffer-throughput tradeoff analysis technique...
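Buffer-throughput analysis of SDF graphs builds on the balance equations q[src] * prod = q[dst] * cons, one per edge, whose smallest positive integer solution is the repetition vector (how often each actor fires per graph iteration). A minimal sketch that solves them for a connected graph follows; the `(src, dst, prod, cons)` edge format is an assumption, and this is standard SDF background rather than the transformation proposed here.

```python
from fractions import Fraction
from math import gcd, lcm

def repetition_vector(actors, edges):
    """Smallest positive integer solution of the SDF balance equations.

    edges: list of (src, dst, prod, cons) tuples; the graph is assumed connected.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:  # every edge must balance
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph")
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*q.values())
    return {a: n // g for a, n in q.items()}

# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

For this chain the result is A: 3, B: 2, C: 1, so 6 tokens cross A→B and 2 cross B→C per iteration; the variadic `math.lcm` and `math.gcd` forms require Python 3.9 or later.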