Search for: semi-supervised-learning
Total 49 records

    A hybrid supervised semi-supervised graph-based model to predict one-day ahead movement of global stock markets and commodity prices

    , Article Expert Systems with Applications ; Volume 105 , 2018 , Pages 159-173 ; 09574174 (ISSN) Negahdari Kia, A ; Haratizadeh, S ; Bagheri Shouraki, S ; Sharif University of Technology
    Abstract
    Market prediction has been an important machine learning research topic in recent decades. A neglected issue in prediction is having a model that can simultaneously pay attention to the interaction of global markets along with the historical data of the target markets being predicted. As a solution, we present a hybrid supervised semi-supervised model called HyS3 for direction-of-movement prediction. The graph-based semi-supervised part of HyS3 models the markets' global interactions through a network designed with a novel continuous Kruskal-based graph construction algorithm called ConKruG. The supervised part of the model injects results extracted from each market's historical data into the network... 
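The abstract does not describe ConKruG in detail, so the sketch below only illustrates the classic Kruskal step that such a construction starts from: given pairwise distances between markets (assumed here to be 1 minus the correlation of their returns, a hypothetical choice), it keeps the minimum spanning tree as the market network. The `kruskal_mst` function and the correlation values are illustrative assumptions, not the authors' continuous algorithm.

```python
def kruskal_mst(n, edges):
    """Return minimum-spanning-tree edges for n nodes.

    edges: list of (weight, u, v) tuples. Classic Kruskal: sort edges by
    weight and keep each one that joins two different components, which
    are tracked with a union-find structure.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge connects two components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical return correlations between four markets; distance = 1 - correlation.
corr = {(0, 1): 0.9, (0, 2): 0.4, (0, 3): 0.2,
        (1, 2): 0.7, (1, 3): 0.3, (2, 3): 0.8}
edges = [(1 - c, u, v) for (u, v), c in corr.items()]
print([(u, v) for u, v, _ in kruskal_mst(4, edges)])  # [(0, 1), (2, 3), (1, 2)]
```

Strongly correlated markets end up adjacent in the tree; a graph-based semi-supervised learner could then propagate information along these edges.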

    An efficient semi-supervised multi-label classifier capable of handling missing labels

    , Article IEEE Transactions on Knowledge and Data Engineering ; 2018 ; 10414347 (ISSN) Hosseini Akbarnejad, A ; Soleymani Baghshah, M ; Sharif University of Technology
    IEEE Computer Society  2018
    Abstract
    Multi-label classification has received considerable interest in recent years. Multi-label classifiers usually need to address many issues, including handling large-scale datasets with many instances and a large set of labels, compensating for missing label assignments in the training set, considering correlations between labels, and exploiting unlabeled data to improve prediction performance. To tackle datasets with a large set of labels, embedding-based methods represent the label assignments in a low-dimensional space. Many state-of-the-art embedding-based methods use a linear dimensionality reduction to map the label assignments to a low-dimensional space. However, by doing so, these... 

    Persian Statistical Natural Language Understanding Based on Partially Annotated Corpus

    , M.Sc. Thesis Sharif University of Technology Jabbari, Fattaneh (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    The spoken language understanding unit is one of the most important parts of a spoken dialogue system. Its input is the output of the speech recognition unit, and its main function is to extract semantic information from the input utterances. There are two main types of approaches to this task: rule-based and data-driven. Data-driven approaches are currently preferred because they are more flexible and robust than rule-based ones. Their main drawback is that they need a large amount of fully annotated data, or in some cases Treebank data, and preparing such data is time-consuming and expensive. The goal of this thesis is... 

    Data Labelling Using Manifold-Based Semi-Supervised Learning in Multispectral Remote Sensing

    , M.Sc. Thesis Sharif University of Technology Khajenezhad, Ahmad (Author) ; Rabiee, Hamid Reza (Supervisor) ; Safari, Mohammad Ali (Co-Advisor)
    Abstract
    Classification of hyperspectral remote sensing images is a challenging problem because of the small number of labeled pixels, the high dimensionality of the data, and the large number of pixels. In this context, semi-supervised learning can improve classification accuracy by extracting information from the distribution of all the labeled and unlabeled data. Among semi-supervised methods, manifold-based algorithms have been frequently used in recent years. In most previous works, manifolds are constructed according to the spectral representation of the data, while the spatial dependency of pixel labels is an important property of images in remote sensing applications. In this thesis, after... 

    Unsupervised Domain Adaptation via Representation Learning

    , M.Sc. Thesis Sharif University of Technology Gheisary, Marzieh (Author) ; Soleymani, Mahdieh (Supervisor)
    Abstract
    The existing learning methods usually assume that training and test data follow the same distribution, while this is not always true. Thus, in many cases the performance of these learning methods on the test data will be severely degraded. We often have sufficient labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and no labeled training data. In this thesis, we study the problem of unsupervised domain adaptation, where no labeled data in the target domain is available. We propose a framework which finds a new representation for both the source and the target domain in which the distance between these... 
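The abstract is cut off before it names the distance being minimized. One common choice, assumed here and not confirmed by the thesis, is the Maximum Mean Discrepancy (MMD); with a linear kernel, its squared value reduces to the squared Euclidean distance between the mean representations of the two domains. The function name `linear_mmd2` and the toy vectors are hypothetical.

```python
def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2.

    source, target: lists of equal-length feature vectors (the learned
    representations of each domain).
    """
    dim = len(source[0])
    mu_s = [sum(x[d] for x in source) / len(source) for d in range(dim)]
    mu_t = [sum(x[d] for x in target) / len(target) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


src = [[0.0, 0.0], [2.0, 0.0]]  # source-domain representations, mean (1, 0)
tgt = [[1.0, 1.0], [1.0, 3.0]]  # target-domain representations, mean (1, 2)
print(linear_mmd2(src, tgt))    # 4.0
```

A representation learner would add a term like this to its training loss so that the mapped source and target distributions move closer together.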

    Behavior-Driven Security Policy Enforcement on High Bandwidth Networks

    , Ph.D. Dissertation Sharif University of Technology Noferesti, Morteza (Author) ; Jalili, Rasool (Supervisor)
    Abstract
    High-bandwidth network analysis is challenging, resource-consuming, and inaccurate due to the high volume, velocity, and variety of network traffic. Today's high-bandwidth networks require adaptive analysis approaches that can recognize the networks' variable behaviors. Such approaches should be robust to the lack of prior knowledge and provide the data needed to impose more complex policies. This thesis introduces the complex policy relation and proposes a two-layer framework, named HB2DS, to enforce complex policies. The proposed framework consists of a mechanism layer and a policy layer. The mechanism layer processes network packet headers and payloads to generate a flow stream.... 

    Semi-Supervised Kernel Learning for Pattern Classification

    , Ph.D. Dissertation Sharif University of Technology Rohban, Mohammad Hossein (Author) ; Rabiee, Hamid Reza (Supervisor)
    Abstract
    Supervised kernel learning has been the focus of research in recent years. Although these methods are developed within rigorous frameworks, they fail to improve classification accuracy in real-world applications. To find the origin of this problem, note that the kernel function represents prior knowledge about the labeling function. As with other learning problems, learning this prior knowledge requires yet another prior. In supervised kernel learning, only naive assumptions can be used as this prior, such as minimizing the ℓ1 and ℓ2 norms of the kernel parameters.
    As an alternative approach, in Semi-Supervised Learning (SSL), unlabeled... 

    Information Retrieval from Incomplete Observations

    , Ph.D. Dissertation Sharif University of Technology Esmaeili, Ashkan (Author) ; Marvasti, Farokh (Supervisor)
    Abstract
    In this dissertation, data analysis and information retrieval from incomplete observations are investigated in different applications. Incomplete observations may arise from missing observations or from part of the data being affected by a specific noise (quantization noise). Data-driven algorithms are among the most active research topics. Our goal is to recover the lost information by imposing certain assumptions on the structure of big data. The approach is to model the problem of interest mathematically as an optimization problem and then to propose algorithms for it that reduce computational complexity while enhancing recovery accuracy for big data... 

    Online Distance Metric Learning

    , M.Sc. Thesis Sharif University of Technology Vazifedan, Afrooz (Author) ; Beigy, Hamid (Supervisor)
    Abstract
    Distance metric learning algorithms have recently been widely used in machine learning. In these algorithms, a distance function between objects (data points) is learned from their labels or from similarity and dissimilarity constraints. Recent works have shown that classification and clustering methods using these learned functions achieve good precision. Since in current systems many data points are not available at the beginning and are added to the training set while the algorithm runs, online methods are needed to update the learned metric as new data arrive.
    In this thesis, we propose a new online distance metric learning method that has higher performance than existing... 
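The abstract does not state the update rule of the proposed method. As a hedged illustration of the general online setting only, the sketch below keeps a diagonal Mahalanobis metric and adjusts it one pair constraint at a time: similar pairs are pulled together, and dissimilar pairs closer than a margin are pushed apart. The hinge-style rule and the `lr` and `margin` parameters are assumptions for illustration.

```python
def distance(w, x, y):
    """Weighted squared Euclidean distance (a diagonal Mahalanobis metric)."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y))


def online_update(w, x, y, similar, lr=0.1, margin=1.0):
    """One online step on a single pair constraint (hypothetical rule).

    Similar pairs: gradient step that shrinks their distance.
    Dissimilar pairs closer than `margin`: step that enlarges it (hinge loss).
    Weights are clipped at 0 so the metric stays valid.
    """
    diffs = [(a - b) ** 2 for a, b in zip(x, y)]
    if similar:
        sign = -1.0
    elif distance(w, x, y) < margin:
        sign = 1.0
    else:
        return w  # constraint already satisfied: no update
    return [max(0.0, wk + sign * lr * d) for wk, d in zip(w, diffs)]


w = [1.0, 1.0]
w = online_update(w, [0.0, 0.0], [1.0, 0.0], similar=True)   # weight of feature 0 shrinks
w = online_update(w, [0.0, 0.0], [0.0, 0.5], similar=False)  # weight of feature 1 grows
print(w)
```

Each constraint is processed once as it arrives, so the metric can be updated without revisiting earlier data.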

    Identification and Forecasting of Nuclear Power Plants Transients by Semi-Supervised Method with Change of Representation Technique

    , M.Sc. Thesis Sharif University of Technology Mirzaei Dam-Abi, Ali (Author) ; Ghofrani, Mohamad Bagher (Supervisor) ; Moshkbar Bakhshayesh, Khalil (Supervisor)
    Abstract
    In this work, we aim to identify and forecast transients in nuclear power plants with the aid of a semi-supervised machine learning algorithm. Forecasting and identifying transients in nuclear power plants at the early stages of their formation is essential for safety and precautionary measures. Machine learning algorithms provide an intelligent control mechanism that, alongside the main operator of the power plant, raises the transient detection and identification rate. Our chosen algorithm, a semi-supervised learning approach, changes the way the data is represented. The algorithm consists of two methods: quantum dynamics clustering... 

    Continual Learning Using Unsupervised Data

    , M.Sc. Thesis Sharif University of Technology Ameli Kalkhoran, Amir Hossein (Author) ; Soleymani Baghshah, Mahdieh (Supervisor)
    Abstract
    The existing continual learning methods are mainly focused on fully supervised scenarios and are still not able to take advantage of unlabeled data available in the environment. Some recent works have investigated semi-supervised continual learning (SSCL) settings in which unlabeled data are available, but only from the same distribution as the labeled data. This assumption is still not general enough for real-world applications and restricts the use of unsupervised data. In this work, we introduce Open-Set Semi-Supervised Continual Learning (OSSCL), a more realistic semi-supervised continual learning setting in which out-of-distribution (OoD) unlabeled samples in the... 

    Image Annotation Using Semi-supervised Learning

    , Ph.D. Dissertation Sharif University of Technology Amiri, Hamid (Author) ; Jamzad, Mansour (Supervisor)
    Abstract
    Automatic image annotation, which assigns labels to input images and provides a textual description of their contents, has become an active field in the machine vision community. To design an annotation system, we need a dataset that contains images and their labels. However, annotating all images in a dataset requires a large amount of manual effort. To reduce the dependence of annotation systems on labeled images, one solution is to exploit the useful information embedded in unlabeled images and incorporate it into the learning process. In the machine learning community, semi-supervised learning (SSL) has been introduced with the aim of incorporating unlabeled samples into the... 

    Context-based Persian Grapheme-to-Phoneme Conversion using Sequence-to-Sequence Models

    , M.Sc. Thesis Sharif University of Technology Rahmati, Elnaz (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Many Text-to-Speech (TTS) systems, particularly in low-resource environments, struggle to produce natural and intelligible speech from grapheme sequences. One solution to this problem is to use Grapheme-to-Phoneme (G2P) conversion to increase the information in the input sequence and improve the TTS output. However, current G2P systems are not accurate or efficient enough for Persian texts due to the language’s complexity and the lack of short vowels in Persian grapheme sequences. In our study, we aimed to improve resources for the Persian language. To achieve this, we introduced two new G2P training datasets, one manually-labeled and the other machine-generated, containing over five million... 

    3D Medical Images Segmentation by Effective Use of Unlabeled Data

    , M.Sc. Thesis Sharif University of Technology Khalili, Hossein (Author) ; Soleymani Baghshah, Mahdieh (Supervisor)
    Abstract
    Image segmentation, one of the most important branches of medical image analysis, often faces the challenge of limited labeled data for deep learning methods. The high cost of data collection and the expertise needed for segmentation, particularly of three-dimensional images such as MRI and CT or image sequences such as CMR, have all contributed to this problem; even popular networks such as U-Net struggle to achieve high accuracy. As a result, research efforts have focused on semi-supervised learning, weakly supervised learning, and multi-instance learning for medical image segmentation. Unfortunately, each of these methods... 

    Deep Zero-shot Learning

    , M.Sc. Thesis Sharif University of Technology Shojaee, Mohsen (Author) ; Soleymani, Mahdieh (Supervisor)
    Abstract
    In some object recognition problems, labeled data may not be available for all categories. Zero-shot learning uses auxiliary information (also called signatures) describing each category to find a classifier that can recognize samples from categories with no labeled instances. On the other hand, with recent advances made by deep neural networks in computer vision, rich representations that discriminate between categories can be obtained from images, which makes it possible to extract unsupervised information from them. However, previous works have paid little attention to using such unsupervised information for zero-shot learning. In this... 
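A common zero-shot recipe, assumed here rather than taken from the thesis, maps an image's features into the signature (attribute) space and assigns the unseen category whose signature is most similar. In the sketch, the projection matrix `W` is a hypothetical stand-in for a mapping learned on seen classes, and the attribute names are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def project(W, x):
    """Map image features x into signature space with a (hypothetical) learned matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def zero_shot_predict(W, x, signatures):
    """Return the unseen category whose signature best matches the projected image."""
    z = project(W, x)
    return max(signatures, key=lambda c: cosine(z, signatures[c]))


# Hypothetical attribute signatures: [has_stripes, has_hooves, lives_in_water]
signatures = {"zebra": [1.0, 1.0, 0.0], "dolphin": [0.0, 0.0, 1.0]}
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # assumed already learned on seen classes
print(zero_shot_predict(W, [0.9, 0.1], signatures))  # zebra
```

No labeled zebra or dolphin images are needed at prediction time; only their signatures and the learned mapping are used.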

    Deep Semi-Supervised Text Classification

    , M.Sc. Thesis Sharif University of Technology Karimi, Ali (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Large datasets labeled by experts at great cost are essential to the success of deep learning in various domains. When labeling is expensive and labeled data are scarce, however, deep learning generally does not perform well. The goal of semi-supervised learning is to leverage abundant unlabeled data that can be collected easily. Recent semi-supervised algorithms based on data augmentation have advanced this field considerably. In this work, by studying different textual augmentation techniques, we propose a new approach that can extract effective information signals from unlabeled data. The method encourages the model to generate the same representation vectors for different augmented versions... 
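The consistency idea in the abstract can be illustrated with a penalty on the distance between the representations of two augmented versions of the same unlabeled sentence. The toy bag-of-words encoder, the augmentations, and the mean-squared penalty below are assumptions for illustration; the thesis's actual model and loss are not given in the abstract.

```python
def mean_squared_consistency(rep_a, rep_b):
    """Consistency penalty: mean squared difference between two representation vectors."""
    return sum((a - b) ** 2 for a, b in zip(rep_a, rep_b)) / len(rep_a)


def bag_of_words(tokens, vocab):
    """Toy encoder (hypothetical): normalized word counts over a fixed vocabulary."""
    counts = [tokens.count(w) for w in vocab]
    total = sum(counts) or 1
    return [c / total for c in counts]


vocab = ["good", "movie", "great", "film"]
original = ["good", "movie"]
aug_reorder = ["movie", "good"]  # word reordering (hypothetical augmentation)
aug_synonym = ["great", "film"]  # synonym replacement (hypothetical augmentation)

base = bag_of_words(original, vocab)
print(mean_squared_consistency(base, bag_of_words(aug_reorder, vocab)))  # 0.0
print(mean_squared_consistency(base, bag_of_words(aug_synonym, vocab)))  # 0.25
```

During training, a term like this would be added to the supervised loss and minimized over unlabeled text, so that a learnable encoder maps the synonym-swapped sentence close to the original; the fixed encoder here only shows how the penalty is computed.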

    Exploiting multiview properties in semi-supervised video classification

    , Article 2012 6th International Symposium on Telecommunications, IST 2012 ; 2012 , Pages 837-842 ; 9781467320733 (ISBN) Karimian, M ; Tavassolipour, M ; Kasaei, S ; Sharif University of Technology
    Abstract
    In large databases, obtaining labeled training data for classification is often prohibitively expensive. Semi-supervised algorithms are employed to tackle this lack of labeled training data. Video databases epitomize such a scenario, which is why semi-supervised learning has found its niche there. Graph-based methods are a promising platform for semi-supervised video classification. Owing to the multiview nature of video data, different features (such as SIFT, STIP and MFCC) have been proposed that can be used to build a graph. In this paper, we propose a new classification method that fuses the results of manifold regularization over different graphs. Our... 
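As a simplified sketch of fusing graph-based results across views, the code below runs harmonic label propagation (swapped in for the paper's manifold regularization) on one graph per feature type and averages the resulting scores. The graphs, the equal fusion weights, and the function names are hypothetical.

```python
def propagate(W, labels, iters=100):
    """Harmonic label propagation for binary labels on one graph.

    W: symmetric weight matrix (list of lists); every unlabeled node needs
    at least one neighbour. labels: dict node -> 0/1. Unlabeled nodes
    repeatedly take the weighted average of their neighbours' scores,
    while labeled nodes stay clamped to their known values.
    """
    n = len(W)
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        f = [f[i] if i in labels else
             sum(W[i][j] * f[j] for j in range(n)) / sum(W[i])
             for i in range(n)]
    return f


def fuse(graphs, labels, weights=None):
    """Weighted average of per-graph propagation scores (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(graphs)] * len(graphs)
    per_graph = [propagate(W, labels) for W in graphs]
    n = len(graphs[0])
    return [sum(w * s[i] for w, s in zip(weights, per_graph)) for i in range(n)]


# Two hypothetical graphs over the same four video clips (e.g. visual and audio features).
graph_a = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
graph_b = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
labels = {0: 1, 3: 0}  # clip 0 is positive, clip 3 is negative
print([round(s, 3) for s in fuse([graph_a, graph_b], labels)])  # [1.0, 0.733, 0.467, 0.0]
```

Thresholding the fused scores at 0.5 would label clip 1 positive and clip 2 negative.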

    Isograph: Neighbourhood graph construction based on geodesic distance for semi-supervised learning

    , Article Proceedings - IEEE International Conference on Data Mining, ICDM, 11 December 2011 through 14 December 2011 ; December , 2011 , Pages 191-200 ; 15504786 (ISSN) ; 9780769544083 (ISBN) Ghazvininejad, M ; Mahdieh, M ; Rabiee, H. R ; Roshan, P. K ; Rohban, M. H ; Sharif University of Technology
    2011
    Abstract
    Semi-supervised learning based on manifolds has been the focus of extensive research in recent years. Appropriate neighbourhood graph construction is a key component of a successful semi-supervised classification method. Previous graph construction methods fail when there are pairs of data points that have small Euclidean distance, but are far apart over the manifold. To overcome this problem, we start with an arbitrary neighbourhood graph and iteratively update the edge weights by using the estimates of the geodesic distances between points. Moreover, we provide theoretical bounds on the values of estimated geodesic distances. Experimental results on real-world data show significant... 
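A standard way to estimate geodesic distance, used here as an assumption in the spirit of the abstract, is the shortest-path length through a k-nearest-neighbour graph; Isograph's iterative edge re-weighting and its theoretical bounds are not reproduced. In the C-shaped toy data below, the two arm tips are 2.5 apart in Euclidean distance but 8.5 apart along the graph.

```python
import math


def knn_graph(points, k):
    """Symmetric k-NN graph: edge weight = Euclidean distance, missing edge = inf."""
    n = len(points)
    D = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = 0.0
        nearest = sorted((math.dist(points[i], points[j]), j)
                         for j in range(n) if j != i)[:k]
        for d, j in nearest:
            D[i][j] = D[j][i] = d
    return D


def geodesic_estimates(D):
    """Floyd-Warshall all-pairs shortest paths: graph estimate of geodesic distance."""
    n = len(D)
    G = [row[:] for row in D]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][m] + G[m][j] < G[i][j]:
                    G[i][j] = G[i][m] + G[m][j]
    return G


# Hypothetical C-shaped manifold: bottom arm, right side, top arm.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1.25),
          (3, 2.5), (2, 2.5), (1, 2.5), (0, 2.5)]
G = geodesic_estimates(knn_graph(points, 2))
print(G[0][8], math.dist(points[0], points[8]))  # 8.5 2.5
```

With k = 2 no edge crosses the gap between the arms, so the graph distance follows the manifold instead of the Euclidean shortcut.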

    Active learning from positive and unlabeled data

    , Article Proceedings - IEEE International Conference on Data Mining, ICDM, 11 December 2011 through 11 December 2011 ; December , 2011 , Pages 244-250 ; 15504786 (ISSN) ; 9780769544090 (ISBN) Ghasemi, A ; Rabiee, H. R ; Fadaee, M ; Manzuri, M. T ; Rohban, M. H ; Sharif University of Technology
    2011
    Abstract
    In recent years, active learning has evolved into a popular paradigm for using a user's feedback to improve the accuracy of learning algorithms. Active learning works by selecting the most informative sample among the unlabeled data and querying its label from the user. Many different methods, such as uncertainty sampling and minimum risk sampling, have been used to select the most informative sample. Although many active learning algorithms have been proposed so far, most of them work with binary or multi-class classification problems and therefore cannot be applied to problems in which only samples from one class as well as a set of unlabeled data are... 
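Uncertainty sampling, one of the selection strategies the abstract mentions, can be sketched for a binary classifier as choosing the unlabeled sample whose predicted positive-class probability is closest to 0.5. The probability scores are assumed to come from some already trained model, and this generic rule is not the positive-unlabeled method the paper proposes.

```python
def most_uncertain(probabilities):
    """Uncertainty sampling for a binary classifier.

    probabilities: dict sample_id -> predicted P(positive).
    Returns the id whose prediction is closest to 0.5, i.e. the sample
    the current model is least sure about and would query from the user.
    """
    return min(probabilities, key=lambda s: abs(probabilities[s] - 0.5))


scores = {"doc_a": 0.95, "doc_b": 0.55, "doc_c": 0.10}  # hypothetical model outputs
print(most_uncertain(scores))  # doc_b
```

After the user labels the chosen sample, the model would be retrained and the selection repeated.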

    Combining Supervised and Semi-Supervised Learning in the Design of a New Identifier for NPPs Transients

    , Article IEEE Transactions on Nuclear Science ; Volume 63, Issue 3 , 2016 , Pages 1882-1888 ; 00189499 (ISSN) Moshkbar Bakhshayesh, K ; Ghofrani, M. B ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2016
    Abstract
    This study introduces a new identifier for nuclear power plant (NPP) transients. The proposed identifier works in two steps. First, the transient is identified by a previously developed supervised classifier that combines an ARIMA model and the EBP algorithm. In the second step, the patterns of unknown transients are fed to an identifier based on semi-supervised learning (SSL). A transductive support vector machine (TSVM), as the semi-supervised algorithm, is trained on the labeled transient data to predict part of the unlabeled data. The labeled and newly predicted data are then used to train the TSVM for another portion of the unlabeled data. Training and prediction continue until...
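The iterative train, predict, and retrain loop described above can be sketched as self-training. To keep the example self-contained, a nearest-centroid classifier is swapped in for the TSVM, and the `batch` size, the confidence ordering, and the toy 2-D features are hypothetical. The abstract is truncated before its stopping criterion, so this sketch simply runs until the unlabeled pool is empty.

```python
import math


def centroids(labeled):
    """Mean feature vector per class from (x, y) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, x):
    """Nearest-centroid prediction (stand-in for the TSVM)."""
    return min(cents, key=lambda y: math.dist(cents[y], x))


def self_train(labeled, unlabeled, batch=2):
    """Label `batch` unlabeled points per round, then retrain on the enlarged set.

    Points closest to any current centroid (most confident) are labeled first.
    """
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        cents = centroids(labeled)
        pool.sort(key=lambda x: min(math.dist(c, x) for c in cents.values()))
        chosen, pool = pool[:batch], pool[batch:]
        labeled += [(x, predict(cents, x)) for x in chosen]
    return labeled


seed = [((0.0, 0.0), "normal"), ((10.0, 10.0), "transient")]  # hypothetical features
pool = [(1.0, 1.0), (9.0, 9.0), (2.0, 1.0), (8.0, 9.0), (5.0, 4.0)]
labels = dict(self_train(seed, pool))
print(labels[(5.0, 4.0)])  # normal
```

Labeling the most confident points first lets each retraining round refine the class centroids before the ambiguous point (5, 4) is decided.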