Search for: big-data
Total 82 records

    Big (Bio)chemical data mining using chemometric methods: a need for chemists

    , Article Angewandte Chemie (International ed. in English) ; Volume 61, Issue 44 , 2022 , Pages e201801134- ; 15213773 (ISSN) Parastar, H ; Tauler, R ; Sharif University of Technology
    NLM (Medline)  2022
    Abstract
    This Review summarizes how big (bio)chemical data (BBCD) can be analyzed with multivariate chemometric methods and highlights some of the important challenges faced by modern analytical research. Here, the potential of chemometric methods to solve BBCD problems encountered in chromatographic, spectroscopic and hyperspectral imaging measurements is discussed, with an emphasis on their applications to omics sciences. In addition, insights and perspectives on how to address the analysis of BBCD are provided along with a discussion of the procedures necessary to obtain more reliable qualitative and quantitative results. In this Review, the importance of "big data" and of... 

    DD-KARB: data-driven compliance to quality by rule based benchmarking

    , Article Journal of Big Data ; Volume 9, Issue 1 , 2022 ; 21961115 (ISSN) Besharati, M. R ; Izadi, M ; Sharif University of Technology
    Springer Science and Business Media Deutschland GmbH  2022
    Abstract
    The problem of compliance checking and assessment is to ensure that the design or implementation of a system meets certain desired properties and complies with certain rules or regularities. This problem is a key issue in several human and engineering application domains such as organizational management and e-governance, software and IT industries, and software and systems quality engineering. To deal with this problem, various approaches and methods have been proposed. In addition to approaches such as formal methods, mathematical proofs, and logical evaluations, benchmarking can be used for compliance assessment. Naturally, a set of benchmarks can shape an applied solution to... 

    Some natural hypomethylating agents in food, water and environment are against distribution and risks of COVID-19 pandemic: Results of a big-data research

    , Article Avicenna Journal of Phytomedicine ; Volume 12, Issue 3 , 2022 , Pages 309-324 ; 22287930 (ISSN) Besharati, M. R ; Izadi, M ; Talebpour, A ; Sharif University of Technology
    Mashhad University of Medical Sciences  2022
    Abstract
    Objective: This study analyzes the effects of lifestyle, nutrition, and diet on the status and risks of apparent (symptomatic) COVID-19 infection in Iranian families. Materials and Methods: A relatively extensive questionnaire survey was conducted on more than 20,000 Iranian families (residing in more than 1000 different urban and rural areas in the Islamic Republic of Iran) to collect big data on COVID-19 and develop a lifestyle dataset. The collected big data included records of lifestyle effects (e.g. nutrition, water consumption resources, physical exercise, smoking, age, gender, health and disease factors, etc.) on the status of COVID-19 infection in families (i.e. residents of... 

    Demand forecasting based machine learning algorithms on customer information: an applied approach

    , Article International Journal of Information Technology (Singapore) ; Volume 14, Issue 4 , 2022 , Pages 1937-1947 ; 25112104 (ISSN) Zohdi, M ; Rafiee, M ; Kayvanfar, V ; Salamiraad, A ; Sharif University of Technology
    Springer Science and Business Media B.V  2022
    Abstract
    Demand forecasting has always been a concern for business owners as one of the main activities in supply chain management. Whereas forecasting was once done with a limited amount of information, today it is performed with machine learning algorithms and data-driven methods, using advanced technologies and data analytics. Patterns and trends of demand, customer information, preferences, suggestions, and post-consumption feedback are some of the types of data used in various demand forecasting efforts. Traditional statistical methods and techniques are biased in demand prediction and are inaccurate, so machine learning algorithms as more popular techniques... 

    Traffic pattern detection using topic modeling for speed cameras based on big data abstraction

    , Article Transportation Letters ; Volume 14, Issue 4 , 2022 , Pages 339-346 ; 19427867 (ISSN) Gholampour, I ; Mirzahossein, H ; Chiu, Y. C ; Sharif University of Technology
    Taylor and Francis Ltd  2022
    Abstract
    The importance of traffic pattern prediction for traffic management systems has significantly increased in recent years. This paper presents a novel method to find unusual traffic patterns by using topic modeling. We have employed topic models to provide an abstraction of speed camera data from Tehran, the capital of Iran. In this methodology, topic modeling is applied to the days of the week and the months of the year to extract weekly and monthly traffic patterns. Analysis of the abstract descriptions and their correspondence to actual urban traffic patterns demonstrates the effectiveness of the proposed method. The model training convergence is also practically verified. Based on our experiments, our method... 

    COVID and nutrition: A machine learning perspective

    , Article Informatics in Medicine Unlocked ; Volume 28 , 2022 ; 23529148 (ISSN) Jafari, N ; Besharati, M. R ; Izadi, M ; Talebpour, A ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    A self-report questionnaire survey was conducted online to collect big data from over 16000 Iranian families (residing in 1000 urban and rural areas of Iran). The resulting data storage contained over 1 M records of data and over 1G records of automatically inferred information. Based on this data storage, a series of machine learning experiments was conducted to investigate the relationship between nutrition and the risk of contracting COVID-19. With highly accurate scores, the findings strongly suggest that foods and water sources containing certain natural bioactive and phytochemical agents may help to reduce the risk of apparent COVID-19 infection. © 2022 The Author(s)  

    Fixed-point iteration approach to spark scalable performance modeling and evaluation

    , Article IEEE Transactions on Cloud Computing ; 2021 ; 21687161 (ISSN) Karimian Aliabadi, S ; Aseman Manzar, M ; Entezari Maleki, R ; Ardagna, D ; Egger, B ; Movaghar, A ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2021
    Abstract
    Companies depend on mining data to grow their business more than ever. To achieve optimal performance of Big Data analytics workloads, a careful configuration of the cluster and the employed software framework is required. The lack of flexible and accurate performance models, however, renders this a challenging task. This paper fills this gap by presenting accurate performance prediction models based on Stochastic Activity Networks (SANs). In contrast to existing work, the presented models consider multiple work queues, a critical feature to achieve high accuracy in realistic usage scenarios. We first introduce a monolithic analytical model for a multi-queue YARN cluster running DAG-based Big... 

    Profit maximization of big data jobs in cloud using stochastic optimization

    , Article IEEE Transactions on Cloud Computing ; Volume 9, Issue 4 , 2021 , Pages 1563-1574 ; 21687161 (ISSN) Nabavinejad, S. M ; Goudarzi, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2021
    Abstract
    Reserved instances offered by cloud providers make it possible to reserve resources and computing capacity for a specific period of time. One must pay for every hour of that interval; in exchange, the hourly rate is significantly lower than that of on-demand instances. Reserved instances can significantly reduce the monetary cost of the resources needed to process big data applications in the cloud. However, purchases of these instances are non-refundable, and hence one should be able to estimate the required resources prior to purchase to avoid over-payment. This becomes especially important when the results obtained by big data jobs have monetary value, as in business intelligence applications.... 
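    The trade-off described above (pay for every reserved hour at a lower rate, or pay only for used hours at the on-demand rate) reduces, in its simplest form, to a break-even calculation. The Python sketch below is a hypothetical illustration with made-up rates, not the stochastic optimization model the paper proposes; `reserved_cost`, `on_demand_cost`, `break_even_hours`, and `should_reserve` are names introduced here for illustration.

```python
# Illustrative reserved vs. on-demand cost comparison.
# All rates and hours are hypothetical; this is a back-of-the-envelope check,
# not the stochastic optimization model proposed in the paper.


def reserved_cost(hourly_rate: float, period_hours: float) -> float:
    # Reserved capacity is billed for every hour of the period, used or not.
    return hourly_rate * period_hours


def on_demand_cost(hourly_rate: float, used_hours: float) -> float:
    # On-demand capacity is billed only for the hours actually used.
    return hourly_rate * used_hours


def break_even_hours(reserved_rate: float, on_demand_rate: float,
                     period_hours: float) -> float:
    # Usage level at which both options cost the same.
    return reserved_rate * period_hours / on_demand_rate


def should_reserve(expected_hours: float, reserved_rate: float,
                   on_demand_rate: float, period_hours: float) -> bool:
    # Reserving pays off only if expected usage exceeds the break-even point.
    return expected_hours > break_even_hours(reserved_rate, on_demand_rate, period_hours)


PERIOD = 720  # hypothetical 30-day reservation, in hours
print(break_even_hours(0.25, 0.50, PERIOD))     # 360.0
print(should_reserve(500, 0.25, 0.50, PERIOD))  # True
print(should_reserve(200, 0.25, 0.50, PERIOD))  # False
```

    Because reserved hours are non-refundable, over-estimating usage turns the discount into over-payment, so in this sketch the decision hinges entirely on how accurately `expected_hours` can be predicted.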

    P-V-L Deep: A big data analytics solution for now-casting in monetary policy

    , Article Journal of Information Technology Management ; Volume 12, Issue 4 , 2021 , Pages 22-62 ; 20085893 (ISSN) Sarduie, M. H ; Kazemi, M. A ; Alborzi, M ; Azar, A ; Kermanshah, A ; Sharif University of Technology
    University of Tehran  2021
    Abstract
    The development of new technologies has confronted the entire domain of science and industry with issues of big data's scalability as well as its integration for the purpose of forecasting analytics in its life cycle. In predictive analytics, forecasting the near future and the recent past, in other words now-casting, is the continuous study of real-time events, constantly updated as new events occur. It is therefore necessary to consider highly data-driven technologies and to use new methods of analysis, such as machine learning and visualization tools, with the ability to interact and connect with different data resources with varieties of data regarding the type of big... 

    DV-DVFS: merging data variety and DVFS technique to manage the energy consumption of big data processing

    , Article Journal of Big Data ; Volume 8, Issue 1 , 2021 ; 21961115 (ISSN) Ahmadvand, H ; Foroutan, F ; Fathy, M ; Sharif University of Technology
    Springer Science and Business Media Deutschland GmbH  2021
    Abstract
    Data variety is one of the most important features of Big Data. Data variety is the result of aggregating data from multiple sources and of the uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous works. To overcome this problem, in the present work, we used Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraints. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the... 
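    The abstract states that, before DVFS is applied, the processing time and the frequency needed to meet the deadline are estimated. A minimal Python sketch of that last step follows, assuming (a simplification introduced here, not the paper's model) that execution time scales as cycles divided by clock frequency and that a node exposes a small set of discrete frequency levels; `required_frequency`, `pick_frequency`, and the level values are hypothetical.

```python
from typing import Sequence

# Hypothetical sketch: choose the lowest DVFS frequency that still meets a deadline.
# Assumes execution time = cycles / frequency (a common simplification) and a
# discrete set of frequency levels; none of these values come from the paper.


def required_frequency(cycles: float, deadline_s: float) -> float:
    """Minimum frequency (Hz) needed to finish `cycles` within `deadline_s` seconds."""
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return cycles / deadline_s


def pick_frequency(cycles: float, deadline_s: float, levels_hz: Sequence[float]) -> float:
    """Return the lowest available frequency level that still meets the deadline."""
    needed = required_frequency(cycles, deadline_s)
    feasible = [f for f in sorted(levels_hz) if f >= needed]
    if not feasible:
        raise ValueError("deadline cannot be met even at the highest frequency")
    return feasible[0]


levels = [1.2e9, 1.6e9, 2.0e9, 2.4e9]  # hypothetical frequency levels
print(pick_frequency(cycles=3.0e9, deadline_s=2.0, levels_hz=levels))  # 1600000000.0
```

    Picking the lowest feasible level, rather than the highest, is what saves energy in this sketch: the job needs 1.5 GHz, so running at 1.6 GHz meets the deadline while avoiding the higher voltage that faster levels typically require.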

    Traffic pattern detection using topic modeling for speed cameras based on big data abstraction

    , Article Transportation Letters ; 2020 Gholampour, I ; Mirzahossein, H ; Chiu, Y. C ; Sharif University of Technology
    Taylor and Francis Ltd  2020
    Abstract
    The importance of traffic pattern prediction for traffic management systems has significantly increased in recent years. This paper presents a novel method to find unusual traffic patterns by using topic modeling. We have employed topic models to provide an abstraction of speed camera data from Tehran, the capital of Iran. In this methodology, topic modeling is applied to the days of the week and the months of the year to extract weekly and monthly traffic patterns. Analysis of the abstract descriptions and their correspondence to actual urban traffic patterns demonstrates the effectiveness of the proposed method. The model training convergence is also practically verified. Based on our experiments, our method... 

    Green space and happiness of developed countries

    , Article 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020, 19 February 2020 through 22 February 2020 ; 2020 , Pages 247-250 Hashemi Fesharaki, S. F ; Behrouz, A ; Yang, J ; Wohn, D. Y ; Cha, M ; IEEE; Korean Institute of Information Scientists and Engineers (KIISE) ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2020
    Abstract
    Previous research has reported a connection between urban green space and public health that ultimately contributes to happiness. Existing studies have mainly investigated green space over small areas. This paper revisits this significant correlation by examining the relationship between country-level happiness and the amount of urban green space as measured systematically from satellite images. Based on 2018 and 2013 data from 30 developed countries, we found that there is a correlation between urban green space and happiness, and this relationship becomes stronger among countries with higher GDP. We also found that the relationship between happiness and green space has grown stronger over... 

    Minimizing data access latencies for virtual machine assignment in cloud systems

    , Article IEEE Transactions on Services Computing ; Volume 13, Issue 5 , August , 2020 , Pages 857-870 Malekimajd, M ; Movaghar, A ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2020
    Abstract
    Cloud systems empower big data management by providing virtual machines (VMs) to process data nodes (DNs) in a faster, cheaper and more effective way. The efficiency of a VM allocation is an important concern that is influenced by communication latencies. In the literature, it has been proved that the VM assignment minimizing communication latency in the presence of the triangle inequality admits a 2-approximation. However, a 2-approximation solution is not efficient enough, as data center networks are not limited to the triangle inequality. In this paper, we define the quadrilateral inequality property for latencies such that the time complexity of the VM assignment problem minimizing... 

    Bandwidth on-demand for multimedia big data transfer across geo-distributed cloud data centers

    , Article IEEE Transactions on Cloud Computing ; Volume 8, Issue 4 , 2020 , Pages 1189-1198 Yassine, A ; Nazari Shirehjini, A. A ; Shirmohammadi, S ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2020
    Abstract
    Multimedia content is massively generated from various applications and devices, and processed in cloud data centers. Multimedia service providers prefer that their data be processed in data centers close to users in order to offer them high-performance and reliable multimedia services that meet the requirements specified in the Service Level Agreement (SLA). This requires transferring huge data sets of video streams, game content, images, etc. across geographically distributed cloud data centers using underutilized bandwidth in backbone transport networks. As the amount of multimedia content increases, the demand to transfer big data sets across data centers increases as well. As such,... 

    Analytical composite performance models for Big Data applications

    , Article Journal of Network and Computer Applications ; Volume 142 , 2019 , Pages 63-75 ; 10848045 (ISSN) Karimian Aliabadi, S ; Ardagna, D ; Entezari Maleki, R ; Gianniti, E ; Movaghar, A ; Sharif University of Technology
    Academic Press  2019
    Abstract
    Recent years have witnessed a steep rise in data generation and, consequently, the widespread adoption of software solutions able to support data-intensive applications. Many companies currently engage in data-intensive processes; however, fully embracing a data-driven paradigm is still cumbersome, and establishing a production-ready and fine-tuned deployment is time-consuming. This situation calls for innovative models and techniques to streamline the process of deployment configuration for Big Data applications. Moreover, many companies are using Cloud-deployed clusters, which represent a cost-effective alternative to on-premises installation. Accurate and fast prediction of the execution time... 

    SAIR: significance-aware approach to improve QoR of big data processing in case of budget constraint

    , Article Journal of Supercomputing ; Volume 75, Issue 9 , 2019 , Pages 5760-5781 ; 09208542 (ISSN) Ahmadvand, H ; Goudarzi, M ; Sharif University of Technology
    Springer New York LLC  2019
    Abstract
    Nowadays, a wide range of enterprises are faced with big data processing in different domains such as transaction operations, business calculations and analytical computations. Large-scale computing is an approach for big data processing. Due to the cost of large-scale computing and limitations of enterprise budgets, it is hardly possible to process all the input data and therefore the Quality of Result (QoR) may be affected. SAIR is an approach to improve QoR of big data processing for aggregative usages based on significance variety when there is a budget constraint. In this paper, the most significant data portions have been assigned to the most efficient resources in terms of time and... 
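    The core idea stated here, processing the most significant data portions first when the budget cannot cover all of the input, can be sketched as a greedy selection. This Python sketch is a deliberate simplification under stated assumptions: the `Portion` fields, the significance scores, the costs, and the single shared budget are all hypothetical, and it does not model SAIR's assignment of portions to the most efficient resources, since the abstract is truncated before those details.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Portion:
    name: str
    significance: float  # hypothetical significance score of this data portion
    cost: float          # hypothetical cost to process this portion


def select_under_budget(portions: List[Portion], budget: float) -> Tuple[List[str], float]:
    """Greedily process the most significant portions first while the budget allows."""
    chosen: List[str] = []
    spent = 0.0
    for p in sorted(portions, key=lambda q: q.significance, reverse=True):
        # Skip any portion that would overrun the budget; cheaper ones may still fit.
        if spent + p.cost <= budget:
            chosen.append(p.name)
            spent += p.cost
    return chosen, spent


data = [
    Portion("A", significance=0.9, cost=4),
    Portion("B", significance=0.7, cost=3),
    Portion("C", significance=0.5, cost=2),
    Portion("D", significance=0.1, cost=5),
]
print(select_under_budget(data, budget=8))  # (['A', 'B'], 7.0)
```

    Ordering by significance rather than by cost is the point of the sketch: under a tight budget, the low-significance portion `D` is the one dropped, which is how a significance-aware approach aims to limit the loss in Quality of Result for aggregative workloads.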

    Mitigating the performance and quality of parallelized compressive sensing reconstruction using image stitching

    , Article 29th Great Lakes Symposium on VLSI, GLSVLSI 2019, 9 May 2019 through 11 May 2019 ; 2019 , Pages 219-224 ; 9781450362528 (ISBN) Namazi, M ; Mohammadi Makrani, H ; Tian, Z ; Rafatirad, S ; Akbari, M. H ; Sasan, A ; Homayoun, H ; ACM Special Interest Group on Design Automation (SIGDA) ; Sharif University of Technology
    Association for Computing Machinery  2019
    Abstract
    Orthogonal Matching Pursuit (OMP) is an iterative greedy algorithm used to find a sparse approximation for high-dimensional signals. The algorithm is most widely used in Compressive Sensing, which allows sparse signals to be reconstructed from samples taken below the Shannon-Nyquist rate. Compressive Sensing has traditionally been used in a number of applications such as MRI and computer vision, and is increasingly finding its way into Big Data and data center analytics. OMP traditionally suffers from being computationally intensive and time-consuming; this is particularly a problem in the area of Big Data, where the demand for computational resources continues to grow. In this paper, the data-level... 
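    The greedy iteration described above can be made concrete with a textbook serial OMP in pure Python. It shows the three steps of each iteration (select the column most correlated with the residual, re-fit all selected coefficients by least squares, update the residual) and is not the parallelized, image-stitching variant the paper studies. The helper names and the small matrix `A` are hypothetical, and the sketch assumes unit-norm columns and a non-singular Gram matrix for the selected columns.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, m rows x n columns


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(M: Matrix, b: Vector) -> Vector:
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def omp(A: Matrix, y: Vector, sparsity: int) -> Vector:
    """Orthogonal Matching Pursuit: approximate y using `sparsity` columns of A."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    support: List[int] = []
    coef: Vector = []
    residual = list(y)
    for _ in range(sparsity):
        # 1. Select the unused column most correlated with the current residual.
        candidates = [k for k in range(n) if k not in support]
        j = max(candidates, key=lambda k: abs(dot(cols[k], residual)))
        support.append(j)
        # 2. Re-fit all selected coefficients by least squares (normal equations).
        gram = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], y) for a in support]
        coef = solve(gram, rhs)
        # 3. Update the residual; it is now orthogonal to every selected column.
        approx = [sum(c * cols[s][i] for c, s in zip(coef, support)) for i in range(m)]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    x = [0.0] * n
    for s, c in zip(support, coef):
        x[s] = c
    return x


s = 1 / math.sqrt(2)
A = [[1, 0, 0, s],
     [0, 1, 0, s],
     [0, 0, 1, 0]]  # unit-norm columns
print(omp(A, [2, 0, -3], sparsity=2))  # [2.0, 0.0, -3.0, 0.0]
```

    Step 2 is what separates OMP from plain matching pursuit: every iteration re-solves a least-squares problem over the whole support, and that growing linear solve is a large part of the computational cost the abstract refers to.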

    Bingo spatial data prefetcher

    , Article 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, 16 February 2019 through 20 February 2019 ; 2019 , Pages 399-411 ; 9781728114446 (ISBN) Bakhshalipour, M ; Shakerinava, M ; Lotfi Kamran, P ; Sarbazi Azad, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    Applications extensively use data objects with a regular and fixed layout, which leads to the recurrence of access patterns over memory regions. Spatial data prefetching techniques exploit this phenomenon to prefetch future memory references and hide the long latency of DRAM accesses. While state-of-the-art spatial data prefetchers are effective at reducing the number of data misses, we observe that there is still significant room for improvement. To select an access pattern for prefetching, existing spatial prefetchers associate observed access patterns with either a short event with a high probability of recurrence or a long event with a low probability of recurrence. Consequently, the... 
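    The distinction drawn above between short and long events can be illustrated with a toy pattern table. In this Python sketch, which is a hypothetical simplification and not the Bingo design (the abstract is truncated before the paper's own mechanism), a "long" event is assumed to be the pair (PC, region address) and a "short" event the pair (PC, offset within the region); a lookup tries the more precise long event first and falls back to the short one. `SpatialPatternTable`, `REGION_BLOCKS`, and the example addresses are invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

REGION_BLOCKS = 32  # hypothetical number of cache blocks per spatial region


class SpatialPatternTable:
    """Toy history table mapping trigger events to observed region footprints.

    Assumption for illustration: long event = (pc, region), short event =
    (pc, trigger_offset). Lookups prefer the long event and fall back to the short one.
    """

    def __init__(self) -> None:
        self.long_table: Dict[Tuple[int, int], Set[int]] = {}
        self.short_table: Dict[Tuple[int, int], Set[int]] = {}

    def record(self, pc: int, region: int, trigger_offset: int,
               footprint: Iterable[int]) -> None:
        # Store the footprint (block offsets touched in the region) under both events.
        fp = set(footprint)
        self.long_table[(pc, region)] = fp
        self.short_table[(pc, trigger_offset)] = fp

    def predict(self, pc: int, region: int, trigger_offset: int) -> List[int]:
        # Prefer the precise long event; fall back to the more general short event.
        fp = self.long_table.get((pc, region))
        if fp is None:
            fp = self.short_table.get((pc, trigger_offset))
        if fp is None:
            return []
        # Return block addresses to prefetch, excluding the trigger block itself.
        base = region * REGION_BLOCKS
        return [base + off for off in sorted(fp) if off != trigger_offset]


table = SpatialPatternTable()
table.record(pc=0x400, region=10, trigger_offset=3, footprint={3, 4, 7})
print(table.predict(0x400, 10, 3))  # long-event hit: [324, 327]
print(table.predict(0x400, 99, 3))  # short-event fallback: [3172, 3175]
print(table.predict(0x500, 10, 3))  # no history: []
```

    The fallback shows the trade-off the abstract names: the long event is specific but rarely recurs (region 99 was never seen), while the short event recurs often enough to still produce a prediction, at some risk of being less accurate.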

    IMOS: improved meta-aligner and minimap2 on spark

    , Article BMC Bioinformatics ; Volume 20, Issue 1 , 2019 ; 14712105 (ISSN) Hadadian Nejad Yousefi, M ; Goudarzi, M ; Motahari, A ; Sharif University of Technology
    BioMed Central Ltd  2019
    Abstract
    Background: Long reads provide valuable information regarding the sequence composition of genomes. Long reads are usually very noisy, which renders aligning them to the reference genome a daunting task. It may take days to process datasets large enough to sequence a human genome on a single node. Hence, it is of primary importance to have an aligner that can operate on distributed clusters of computers with high accuracy and speed. Results: In this paper, we present IMOS, an aligner for mapping noisy long reads to the reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM)...