Search for: big-data
Total 82 records

    DV-DVFS: merging data variety and DVFS technique to manage the energy consumption of big data processing

    , Article Journal of Big Data ; Volume 8, Issue 1 , 2021 ; 21961115 (ISSN) Ahmadvand, H ; Foroutan, F ; Fathy, M ; Sharif University of Technology
    Springer Science and Business Media Deutschland GmbH  2021
    Abstract
    Data variety is one of the most important features of Big Data. Data variety is the result of aggregating data from multiple sources and uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU consumption. This issue has been overlooked in previous works. To overcome the mentioned problem, in the present work, we used Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this goal, we consider two types of deadlines as our constraint. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the... 

    P-V-L Deep: A big data analytics solution for now-casting in monetary policy

    , Article Journal of Information Technology Management ; Volume 12, Issue 4 , 2021 , Pages 22-62 ; 20085893 (ISSN) Sarduie, M. H ; Kazemi, M. A ; Alborzi, M ; Azar, A ; Kermanshah, A ; Sharif University of Technology
    University of Tehran  2021
    Abstract
    The development of new technologies has confronted the entire domain of science and industry with issues of big data's scalability as well as its integration with the purpose of forecasting analytics in its life cycle. In predictive analytics, the forecast of near-future and recent past - or in other words, the now-casting - is the continuous study of real-time events and constantly updated where it considers eventuality. So, it is necessary to consider the highly data-driven technologies and to use new methods of analysis, like machine learning and visualization tools, with the ability of interaction and connection to different data resources with varieties of data regarding the type of big... 

    Bingo spatial data prefetcher

    , Article 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, 16 February 2019 through 20 February 2019 ; 2019 , Pages 399-411 ; 9781728114446 (ISBN) Bakhshalipour, M ; Shakerinava, M ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    Applications extensively use data objects with a regular and fixed layout, which leads to the recurrence of access patterns over memory regions. Spatial data prefetching techniques exploit this phenomenon to prefetch future memory references and hide the long latency of DRAM accesses. While state-of-the-art spatial data prefetchers are effective at reducing the number of data misses, we observe that there is still significant room for improvement. To select an access pattern for prefetching, existing spatial prefetchers associate observed access patterns to either a short event with a high probability of recurrence or a long event with a low probability of recurrence. Consequently, the... 

    Network-aware Key Partitioner for Efficient MapReduce Computation

    , M.Sc. Thesis Sharif University of Technology Nasehi Basharzad, Saeed (Author) ; Goudarzi, Maziar (Supervisor)
    Abstract
    MapReduce and its open source implementation, Hadoop, are the prevailing platforms for big data processing. MapReduce is a simple programming model for solving large computational problems in large-scale distributed systems. This model consists of two major phases: Map and Reduce. Between these two main phases sits a partitioner, which distributes the keys produced by Map tasks among Reduce tasks. When the amount of keys and their associated values, which are called intermediate data, is huge, this part has a significant impact on the execution time of Reduce tasks and, consequently, the completion time of jobs. In this paper, we present a network and resource aware key partitioner to decrease...

    QoR-aware power capping for approximate big data processing

    , Article Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018 ; Volume 2018-January , 19 April , 2018 , Pages 253-256 ; 9783981926316 (ISBN) Nabavinejad, S. M ; Zhan, X ; Azimi, R ; Goudarzi, M ; Reda, S ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2018
    Abstract
    To limit the peak power consumption of a cluster, a centralized power capping system typically assigns power caps to the individual servers, which are then enforced using local capping controllers. Consequently, the performance and throughput of the servers are affected, and the runtime of jobs is extended as a result. We observe that servers in big data processing clusters often execute big data applications that have different tolerance for approximate results. To mitigate the impact of power capping, we propose a new power-Capping aware resource manager for Approximate Big data processing (CAB) that takes into consideration the minimum Quality-of-Result (QoR) of the jobs. We use... 

    Estimating activity patterns using spatio-temporal data of cell phone networks

    , Article International Journal of Urban Sciences ; 2017 , Pages 1-18 ; 12265934 (ISSN) Zahedi, S ; Shafahi, Y ; Sharif University of Technology
    Abstract
    The tendency towards using activity-based models to predict trip demand has increased dramatically over recent years. However, these models have suffered from insufficient data for calibration, and the intrinsic problems of traditional methods impose the need to search for better alternatives. This paper discusses ways to process cell phone spatio-temporal data in a manner that makes it comprehensible for traffic interpretations and proposes methods on how to infer urban mobility and activity patterns from the aforementioned data. The movements of each subscriber are described by a sequence of stops and trips, and each stop is labelled by an activity. The types of activities are estimated... 

    Fast methods for recovering sparse parameters in linear low rank models

    , Article 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016, 7 December 2016 through 9 December 2016 ; 2017 , Pages 1403-1407 ; 9781509045457 (ISBN) Esmaeili, A ; Amini, A ; Marvasti, F ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2017
    Abstract
    In this paper, we investigate the recovery of a sparse weight vector (parameters vector) from a set of noisy linear combinations. However, only partial information about the matrix representing the linear combinations is available. Assuming a low-rank structure for the matrix, one natural solution would be to first apply a matrix completion to the data, and then to solve the resulting compressed sensing problem. In big data applications such as massive MIMO and medical data, the matrix completion step imposes a huge computational burden. Here, we propose to reduce the computational cost of the completion task by ignoring the columns corresponding to zero elements in the sparse vector. To... 

    Estimating activity patterns using spatio-temporal data of cell phone networks

    , Article International Journal of Urban Sciences ; Volume 22, Issue 2 , 2018 , Pages 162-179 ; 12265934 (ISSN) Zahedi, S ; Shafahi, Y ; Sharif University of Technology
    Routledge  2018
    Abstract
    The tendency towards using activity-based models to predict trip demand has increased dramatically over recent years. However, these models have suffered from insufficient data for calibration, and the intrinsic problems of traditional methods impose the need to search for better alternatives. This paper discusses ways to process cell phone spatio-temporal data in a manner that makes it comprehensible for traffic interpretations and proposes methods on how to infer urban mobility and activity patterns from the aforementioned data. The movements of each subscriber are described by a sequence of stops and trips, and each stop is labelled by an activity. The types of activities are estimated... 

    Big (Bio)chemical data mining using chemometric methods: a need for chemists

    , Article Angewandte Chemie (International ed. in English) ; Volume 61, Issue 44 , 2022 , Pages e201801134- ; 15213773 (ISSN) Parastar, H ; Tauler, R ; Sharif University of Technology
    NLM (Medline)  2022
    Abstract
    This Review summarizes how big (bio)chemical data (BBCD) can be analyzed with multivariate chemometric methods and highlights some of the important challenges faced by modern analytical researches. Here, the potential of chemometric methods to solve BBCD problems that are being encountered in chromatographic, spectroscopic and hyperspectral imaging measurements will be discussed, with an emphasis on their applications to omics sciences. In addition, insights and perspectives on how to address the analysis of BBCD are provided along with a discussion of the procedures necessary to obtain more reliable qualitative and quantitative results. In this Review, the importance of "big data" and of... 

    DD-KARB: data-driven compliance to quality by rule based benchmarking

    , Article Journal of Big Data ; Volume 9, Issue 1 , 2022 ; 21961115 (ISSN) Besharati, M. R ; Izadi, M ; Sharif University of Technology
    Springer Science and Business Media Deutschland GmbH  2022
    Abstract
    The problem of compliance checking and assessment is to ensure that the design or implementation of a system meets some desired properties and complies with some rules or regularities. This problem is a key issue in several human and engineering application domains such as organizational management and e-governance, software and IT industries, and software and systems quality engineering. To deal with this problem, some different approaches and methods have been proposed. In addition to the approaches such as formal methods, mathematical proofs, and logical evaluations, benchmarking can be used for compliance assessment. Naturally, a set of benchmarks can shape an applied solution to... 

    Demand forecasting based machine learning algorithms on customer information: an applied approach

    , Article International Journal of Information Technology (Singapore) ; Volume 14, Issue 4 , 2022 , Pages 1937-1947 ; 25112104 (ISSN) Zohdi, M ; Rafiee, M ; Kayvanfar, V ; Salamiraad, A ; Sharif University of Technology
    Springer Science and Business Media B.V  2022
    Abstract
    Demand forecasting has always been a concern for business owners as one of the main activities in supply chain management. Unlike in the past, when forecasting was done with the help of a limited amount of information, today, using advanced technologies and data analytics, forecasting is performed with machine learning algorithms and data-driven methods. Patterns and trends of demand, customer information, preferences, suggestions, and post-consumption feedback are some types of data that are used in various demand forecasting efforts. Traditional statistical methods and techniques are biased in demand prediction and are not accurate; so, machine learning algorithms as more popular techniques...

    Accelerating Big Data Stream Processing by FPGA-implementation of Parts of the Topology Graph

    , M.Sc. Thesis Sharif University of Technology Kavand, Nima (Author) ; Goudarzi, Maziar (Supervisor)
    Abstract
    In recent years, big data processing has played an important role in information technology. The exponential growth of big data volume increases the need for data centers and infrastructures with more processing power. Due to dark silicon and scalability limitations in deep-submicron technologies, the growth of server performance is slowing down. Therefore, hardware accelerators such as FPGAs and GPUs have become increasingly popular for improving data center processing power. There are two types of big data processing based on the application: stream processing and batch processing. With the widespread use of social networks, online control systems and internet of things services, the...

    Comparing and Improving the Minimum Spanning Tree Algorithms in MapReduce

    , M.Sc. Thesis Sharif University of Technology Malek Abbasi, Mohammad Reza (Author) ; Ghodsi, Mohammad (Supervisor)
    Abstract
    In recent decades, we have faced the enormous growth of data and graph volumes. This requires modern ways of computation and storage systems and algorithms. MapReduce is a known way of processing Big Data in a Parallel and primarily Distributed setting. Theoretical models (e.g., Massively Parallel Computation) for Algorithms using this paradigm commonly evaluate the number of rounds and needed communication. We study the Minimum Spanning Tree (MST) as a fundamental graph problem. This problem in MapReduce is harder for sparse graphs. We introduce an algorithm that performs well compared with previous studies, especially for sparse graphs. We present an empirical study by implementing some...

    Performance Modeling and Evaluation of MapReduce Applications

    , Ph.D. Dissertation Sharif University of Technology Karimian Aliabadi, Soroush (Author) ; Movaghar Rahimabadi, Ali (Supervisor) ; Entezari Maleki, Reza (Co-Supervisor)
    Abstract
    Businesses are dependent on mining of their Big Data more than ever and configuring clusters and frameworks to reach the best performance is still one of the challenges. An accurate performance prediction of the Big Data application helps reduce costs and SLA-violations with better tuning of the configuration parameters. Among the Big Data frameworks, Hadoop, Tez, and Apache Spark are the widely used and popular ones, with the MapReduce and graph-based workflows, usually running on top of the YARN cluster. While a great number of attempts have been made to predict the execution time of Big Data applications, to the best of our knowledge, none of them considered multiple simultaneous YARN... 

    Energy efficiency in cloud-based MapReduce applications through better performance estimation

    , Article Proceedings of the 2016 Design, Automation and Test in Europe Conference and Exhibition, DATE 2016, 14 March 2016 through 18 March 2016 ; 2016 , Pages 1339-1344 ; 9783981537062 (ISBN) Nabavinejad, S. M ; Goudarzi, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2016
    Abstract
    An important issue for efficient execution of MapReduce jobs on a cloud platform is selecting the best fitting virtual machine (VM) configuration(s) among the miscellany of choices that cloud providers offer. Wise selection of VM configurations can lead to better performance, cost and energy consumption. Therefore, it is crucial to explore the available configurations and choose the best one for each given MapReduce application. Executing the given application on all the configurations for comparison is a costly, time and energy consuming process. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on other... 

    HB2DS: a behavior-driven high-bandwidth network mining system

    , Article Journal of Systems and Software ; Volume 127 , 2017 , Pages 266-277 ; 01641212 (ISSN) Noferesti, M ; Jalili, R ; Sharif University of Technology
    Elsevier Inc  2017
    Abstract
    This paper proposes a behavior detection system, HB2DS, to address the behavior-detection challenges in high-bandwidth networks. In HB2DS, a summarization of network traffic is represented through some meta-events. The relationships amongst meta-events are used to mine end-user behaviors. HB2DS satisfies the main constraints that exist in analyzing high-bandwidth networks, namely online learning and outlier handling, as well as one-pass processing, delay, and memory limitations. Our evaluation indicates significant improvement in big data stream analysis in terms of accuracy and efficiency. © 2016 Elsevier Inc

    Profit maximization of big data jobs in cloud using stochastic optimization

    , Article IEEE Transactions on Cloud Computing ; Volume 9, Issue 4 , 2021 , Pages 1563-1574 ; 21687161 (ISSN) Nabavinejad, S. M ; Goudarzi, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2021
    Abstract
    Reserved instances offered by cloud providers make it possible to reserve resources and computing capacity for a specific period of time. One should pay for all the hours of that time interval; in exchange, the hourly rate is significantly lower than that of on-demand instances. Reserved instances can significantly reduce the monetary cost of resources needed to process big data applications in the cloud. However, purchases of these instances are non-refundable, and hence, one should be able to estimate the required resources prior to purchase to avoid over-payment. This becomes especially important when the results obtained by a big data job have monetary value, such as in business intelligence applications....

    Graph Clustering With Parallel Processing

    , M.Sc. Thesis Sharif University of Technology Bagha, Lila (Author) ; Daneshgar, Amir (Supervisor)
    Abstract
    One important way to discover patterns in data is clustering. Because human ability to solve problems is limited in terms of volume and range of computing, a viable solution for processing large amounts of data is parallel processing. In this project, some modifications to the Daneshgar-Javadi algorithm for data clustering using parallel processing are proposed. In this new algorithm, data are divided among different processors, which cluster them independently and in parallel. Then the results of each processor are gathered and a final clustering is performed on the gathered results. It is shown that the proposed modified algorithm along with its high speed processing can...