Sharif Digital Repository, Sharif University of Technology: search results

Exploiting multiview properties in semi-supervised video classification

Article: 2012 6th International Symposium on Telecommunications, IST 2012; 2012, Pages 837-842; 9781467320733 (ISBN); Karimian, M; Tavassolipour, M; Kasaei, S; Sharif University of Technology

Abstract

In large databases, obtaining labeled training data for classification is often prohibitively expensive. Semi-supervised algorithms are employed to tackle this lack of labeled training data. Video databases epitomize such a scenario, which is why semi-supervised learning has found a niche there. Graph-based methods are a promising platform for semi-supervised video classification. Based on the multiview characteristic of video data, different features have been proposed (such as SIFT, STIP and MFCC) that can be utilized to build a graph. In this paper, we have proposed a new classification method that fuses the results of manifold regularization over different graphs. Our...
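
The abstract is truncated before the fusion rule is stated. As an illustration only, one simple late-fusion scheme is a weighted average of the per-graph class scores (one score matrix per feature view, such as SIFT, STIP or MFCC) followed by an arg-max. `fuse_view_scores` and its optional `weights` are hypothetical names, not the paper's method.

```python
def fuse_view_scores(view_scores, weights=None):
    """Late fusion of per-view classification scores (hypothetical helper).

    view_scores: list over views (one graph per feature type) of n x c
                 score matrices, e.g. from manifold regularization.
    weights: optional per-view weights; uniform by default (an assumption).
    Returns the fused class index for each of the n samples.
    """
    v = len(view_scores)
    weights = weights or [1.0 / v] * v
    n, c = len(view_scores[0]), len(view_scores[0][0])
    # Weighted sum of the views' scores for every sample and class.
    fused = [[sum(w * S[i][k] for w, S in zip(weights, view_scores))
              for k in range(c)] for i in range(n)]
    # Predict the class with the highest fused score.
    return [max(range(c), key=lambda k: row[k]) for row in fused]
```

With uniform weights every view counts equally; non-uniform weights would let a more reliable feature type dominate.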

Supervised neighborhood graph construction for semi-supervised classification

Article: Pattern Recognition; Volume 45, Issue 4, April 2012, Pages 1363-1372; 00313203 (ISSN); Rohban, M. H; Rabiee, H. R; Sharif University of Technology

Abstract

Graph-based methods are among the most active and applicable approaches studied in semi-supervised learning. This paper addresses the problem of neighborhood graph construction for these methods. Neighborhood graph construction plays a key role in the quality of classification in graph-based methods. Several unsupervised graph construction methods have been proposed that address issues such as data noise, the geometrical properties of the underlying manifold, and graph hyper-parameter selection. In contrast, in order to adapt the graph construction to the given classification task, many recent graph construction methods take advantage of the data labels. However, these...
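
For reference, the sketch below builds the common unsupervised baseline that such methods start from: a symmetric k-nearest-neighbour graph with Gaussian (heat-kernel) edge weights. It is not the supervised construction proposed in the paper; `knn_graph`, `k` and `sigma` are illustrative names and hyper-parameters.

```python
import math

def knn_graph(points, k, sigma):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights.

    points: list of equal-length coordinate tuples
    k: neighbours per point; sigma: kernel bandwidth (both illustrative)
    Returns an n x n weight matrix as nested lists.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by Euclidean distance to point i.
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        for d, j in dists[:k]:
            w = math.exp(-d * d / (2 * sigma * sigma))
            # Symmetrise: keep the edge if either endpoint selects the other.
            W[i][j] = W[j][i] = w
    return W
```

Here `k` and `sigma` are exactly the kind of graph hyper-parameters the abstract says must be selected.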

Semi-supervised ensemble learning of data streams in the presence of concept drift

Article: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Volume 7209 LNAI, Issue PART 2, 2012, Pages 526-537; 03029743 (ISSN); 9783642289309 (ISBN); Ahmadi, Z; Beigy, H; Sharif University of Technology

Abstract

Increasing access to very large and non-stationary datasets in many real problems has made classical data mining algorithms impractical and has made it necessary to design new online classification algorithms. Online learning of data streams has some important features, such as sequential access to the data, limits on time and space complexity, and the occurrence of concept drift. The potentially infinite nature of data streams makes it hard to label all observed instances, so semi-supervised approaches appear better suited to the problem. In this paper, we therefore present a new semi-supervised ensemble learning algorithm for data streams. This algorithm uses the majority...
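
Because the abstract breaks off, the sketch below is a generic chunk-based stream ensemble, not the authors' algorithm: one member is trained per data chunk, only the most recent members are kept so that stale concepts age out, predictions use an unweighted majority vote, and unlabeled instances can be pseudo-labeled by the current ensemble. `ChunkEnsemble`, `train_fn`, `max_members` and `pseudo_label` are hypothetical names.

```python
from collections import Counter, deque

class ChunkEnsemble:
    """Generic sliding-window ensemble for a data stream (illustrative)."""

    def __init__(self, train_fn, max_members=3):
        # train_fn: callable taking a list of (x, y) and returning predict(x)
        self.train_fn = train_fn
        # Fixed-size window: the oldest member is dropped automatically.
        self.members = deque(maxlen=max_members)

    def update(self, labeled_chunk):
        # Train one new member on the latest chunk; under concept drift,
        # members trained on old chunks leave the window over time.
        self.members.append(self.train_fn(labeled_chunk))

    def predict(self, x):
        # Unweighted majority vote over the current members.
        votes = Counter(member(x) for member in self.members)
        return votes.most_common(1)[0][0]

    def pseudo_label(self, unlabeled):
        # Semi-supervised step (an assumption): label unlabeled instances
        # with the ensemble so they can join the next training chunk.
        return [(x, self.predict(x)) for x in unlabeled]
```

The window size trades stability against reaction speed: a small window forgets old concepts quickly but votes with fewer members.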

Unilateral semi-supervised learning of extended hidden vector state for Persian language understanding

Article: NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering, 27 November 2011 through 29 November 2011, Tokushima; 2011, Pages 165-168; 9781612847283 (ISBN); Jabbari, F; Sameti, H; Bokaei, M. H; Chinese Association for Artificial Intelligence; IEEE Signal Processing Society; Sharif University of Technology

2011

Abstract

The key element of a spoken dialogue system is its Spoken Language Understanding (SLU) component. The Hidden Vector State (HVS) model and its extension (EHVS) are two of the most popular statistical methods for implementing SLU; both need only lightly annotated data. Since annotation is time-consuming, we present a novel semi-supervised learning method for EHVS that reduces the human labeling effort using two different statistical classifiers, SVM and KNN. Experiments were conducted on a Persian corpus, the University Information Kiosk corpus. The experimental results show that semi-supervised EHVS, trained on both labeled and unlabeled data, performs better than EHVS trained only on the initially labeled data. The performance of EHVS improves...
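
The abstract does not spell out the training loop. The sketch below shows generic self-training with a k-NN classifier, one of the two classifiers named above, on plain feature vectors rather than EHVS semantic annotations. The confidence measure (share of agreeing neighbours), `threshold` and `max_rounds` are assumptions, and `knn_predict` and `self_train` are hypothetical names.

```python
import math
from collections import Counter

def knn_predict(labeled, x, k):
    """Return (label, confidence); confidence is the share of the k nearest
    labeled neighbours that agree with the winning label."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
    label, count = Counter(y for _, y in nearest).most_common(1)[0]
    return label, count / len(nearest)

def self_train(labeled, unlabeled, k=3, threshold=1.0, max_rounds=10):
    """Generic self-training: move confidently classified points from the
    unlabeled pool into the labeled set, one round at a time."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            label, conf = knn_predict(labeled, x, k)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing passed the threshold; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return labeled, pool
```

A strict threshold adds fewer but safer pseudo-labels, which limits the risk of reinforcing early mistakes.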

Efficient iterative Semi-Supervised Classification on manifold

Article: Proceedings - IEEE International Conference on Data Mining, ICDM; 2011, Pages 228-235; 15504786 (ISSN); 9780769544090 (ISBN); Farajtabar, M; Rabiee, H. R; Shaban, A; Soltani Farani, A; National Science Foundation (NSF); University of Technology Sydney; Google; Alberta Ingenuity Centre for Machine Learning; IBM Research; Sharif University of Technology

Abstract

Semi-Supervised Learning (SSL) has become a topic of recent research that effectively addresses the problem of limited labeled data. Many SSL methods have been developed based on the manifold assumption; among them, Local and Global Consistency (LGC) is a popular method. The problem with most of these algorithms, and with LGC in particular, is that their naive implementations do not scale well with the size of the data. Time and memory limitations are the major problems faced in large-scale settings. In this paper, we provide theoretical bounds on gradient descent and, to overcome the aforementioned problems, propose a new approximate Newton's method. Moreover, convergence...
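
For reference, the standard LGC solution is proportional to (I - αS)^{-1}Y with S = D^{-1/2} W D^{-1/2}. The plain fixed-point iteration below avoids the inverse but still costs O(n²c) per sweep on a dense graph, which illustrates the scaling problem described above. This is the textbook LGC iteration, not the approximate Newton method the paper proposes; `lgc`, `alpha` and `iters` are illustrative names and defaults.

```python
import math

def lgc(W, Y, alpha=0.99, iters=1000):
    """Local and Global Consistency via the fixed-point iteration
    F <- alpha * S F + (1 - alpha) * Y, with S = D^{-1/2} W D^{-1/2}.

    W: symmetric n x n weight matrix (nested lists, zero diagonal)
    Y: n x c initial labels (one-hot rows for labeled points, zeros otherwise)
    Returns the predicted class index for every point.
    """
    n, c = len(W), len(Y[0])
    d = [sum(row) for row in W]  # node degrees
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # Each sweep touches every (i, j) pair: O(n^2 c) work.
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(n))
              + (1 - alpha) * Y[i][k] for k in range(c)] for i in range(n)]
    return [max(range(c), key=lambda k: F[i][k]) for i in range(n)]
```

On a four-node path whose end nodes carry different labels, the two interior nodes take the label of the nearer end.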

Isograph: Neighbourhood graph construction based on geodesic distance for semi-supervised learning

Article: Proceedings - IEEE International Conference on Data Mining, ICDM, 11 December 2011 through 14 December 2011; December 2011, Pages 191-200; 15504786 (ISSN); 9780769544083 (ISBN); Ghazvininejad, M; Mahdieh, M; Rabiee, H. R; Roshan, P. K; Rohban, M. H; Sharif University of Technology

2011

Abstract

Semi-supervised learning based on manifolds has been the focus of extensive research in recent years. Appropriate neighbourhood graph construction is a key component of a successful semi-supervised classification method. Previous graph construction methods fail when pairs of data points have a small Euclidean distance but are far apart over the manifold. To overcome this problem, we start with an arbitrary neighbourhood graph and iteratively update the edge weights using estimates of the geodesic distances between points. Moreover, we provide theoretical bounds on the values of the estimated geodesic distances. Experimental results on real-world data show significant...
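
The failure case described above is usually handled by approximating geodesic distances with shortest paths on a neighbourhood graph, as in Isomap. The sketch below shows only that standard approximation (k-NN graph with Euclidean edge lengths, then Dijkstra from every source); Isograph's iterative re-weighting of edges and its bounds are not reproduced, and `geodesic_distances` and `k` are illustrative names.

```python
import heapq
import math

def geodesic_distances(points, k):
    """Isomap-style geodesic estimates: shortest-path lengths on a symmetric
    k-NN graph with Euclidean edge lengths. Unreachable pairs get math.inf."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i][j] = adj[j][i] = math.dist(points[i], points[j])  # undirected
    D = []
    for s in range(n):  # Dijkstra from every source node
        dist = [math.inf] * n
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                if du + w < dist[v]:
                    dist[v] = du + w
                    heapq.heappush(heap, (dist[v], v))
        D.append(dist)
    return D
```

On a U-shaped point set, the two arm tips are close in Euclidean distance, but the shortest-path estimate follows the curve around the bend, which is exactly the discrepancy the abstract describes.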

Active learning from positive and unlabeled data

Article: Proceedings - IEEE International Conference on Data Mining, ICDM, 11 December 2011 through 11 December 2011; December 2011, Pages 244-250; 15504786 (ISSN); 9780769544090 (ISBN); Ghasemi, A; Rabiee, H. R; Fadaee, M; Manzuri, M. T; Rohban, M. H; Sharif University of Technology

2011

Abstract

In recent years, active learning has evolved into a popular paradigm for utilizing user feedback to improve the accuracy of learning algorithms. Active learning works by selecting the most informative sample among the unlabeled data and querying the user for its label. Many different methods, such as uncertainty sampling and minimum risk sampling, have been used to select the most informative sample. Although many active learning algorithms have been proposed so far, most of them work with binary or multi-class classification problems and therefore cannot be applied to problems in which only samples from one class as well as a set of unlabeled data are...
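
Uncertainty sampling, one of the selection strategies named above, can be sketched in a few lines: given a model's predicted class probabilities for each unlabeled sample, query the sample whose prediction has the highest entropy. This is the generic binary/multi-class strategy the abstract contrasts with, not the paper's positive-unlabeled method; the function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(pool_probs):
    """Uncertainty sampling: index of the unlabeled sample whose predicted
    class distribution has the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))
```

It needs probability estimates for every class, which is why it does not transfer directly to the setting where only one class has labeled examples.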

HMM based semi-supervised learning for activity recognition

Article: SAGAware'11 - Proceedings of the 2011 International Workshop on Situation Activity and Goal Awareness, 18 September 2011 through 18 September 2011, Beijing; September 2011, Pages 95-99; 9781450309264 (ISBN); Ghazvininejad, M; Rabiee, H. R; Pourdamghani, N; Khanipour, P; Sharif University of Technology

2011

Abstract

In this paper, we introduce a novel method for human activity recognition that benefits from the structure and sequential properties of the test data as well as the training data. In the training phase, we obtain a fraction of the data labels at constant time intervals and use them in a semi-supervised graph-based method for recognizing the user's activities. In this phase, we use label propagation on a k-nearest neighbor graph to calculate the probability that each unlabeled sample belongs to each class. We then use these probabilities to train an HMM in which each hidden state corresponds to one activity class. These probabilities are used to learn the transition probabilities...
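
The abstract says the propagated class probabilities are used to learn the HMM's transition probabilities but is truncated before the details. One simple way to do this, assumed here rather than taken from the paper, is to accumulate soft transition counts p_t(i)·p_{t+1}(j) between consecutive time steps and normalise each row. `soft_transition_matrix` is a hypothetical helper, not Baum-Welch.

```python
def soft_transition_matrix(posteriors):
    """Estimate HMM transition probabilities from per-time-step class
    probabilities (e.g. label propagation output).

    posteriors: list over time of lists over classes, each row summing to 1.
    Soft counts p_t(i) * p_{t+1}(j) are accumulated, then row-normalised.
    """
    c = len(posteriors[0])
    counts = [[0.0] * c for _ in range(c)]
    for prev, nxt in zip(posteriors, posteriors[1:]):
        for i in range(c):
            for j in range(c):
                counts[i][j] += prev[i] * nxt[j]
    A = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row if a class never occurs.
        A.append([v / total for v in row] if total > 0 else [1.0 / c] * c)
    return A
```

With hard (one-hot) probabilities this reduces to counting observed class-to-class transitions in the sequence.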

Manifold coarse graining for online semi-supervised learning

Article: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5 September 2011 through 9 September 2011; Volume 6911 LNAI, Issue PART 1, September 2011, Pages 391-406; 03029743 (ISSN); 9783642237799 (ISBN); Farajtabar, M; Shaban, A; Rabiee, H. R; Rohban, M. H; Sharif University of Technology

2011

Abstract

When labeled data are insufficient, Semi-Supervised Learning (SSL) methods utilize unlabeled data to enhance classification. Recently, many SSL methods have been developed in batch mode based on the manifold assumption. However, when data arrive sequentially and in large quantities, both computation and storage limitations become a bottleneck. In this paper, we present a new semi-supervised coarse graining (CG) algorithm that reduces the number of data points required to preserve the manifold structure. First, an equivalent formulation of Label Propagation (LP) is derived. Then a novel spectral view of the Harmonic Solution (HS) is proposed. Finally an algorithm to reduce...
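
The Harmonic Solution can be computed in batch mode by the label-propagation iteration below: labeled nodes are clamped to their labels and every unlabeled node repeatedly takes the weighted average of its neighbours. This sketch covers only the batch baseline, whose per-sweep cost over all n points motivates coarse graining; it does not implement the paper's CG algorithm, and `harmonic_solution` and `iters` are illustrative names.

```python
def harmonic_solution(W, labels, iters=1000):
    """Binary harmonic solution via label propagation with clamping.

    W: symmetric n x n weight matrix (nested lists); every unlabeled node
       must have at least one edge.
    labels: dict {node index: 0.0 or 1.0} for the labeled nodes
    Returns f, where f[i] is the soft score for class 1.
    """
    n = len(W)
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        # Labeled nodes stay fixed; unlabeled nodes average their neighbours.
        f = [labels[i] if i in labels
             else sum(W[i][j] * f[j] for j in range(n)) / sum(W[i])
             for i in range(n)]
    return f
```

On a five-node path with the two ends labeled 0 and 1, the scores settle at 0, 0.25, 0.5, 0.75 and 1: the harmonic function interpolates linearly along the chain.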

Regularization from the Machine Learning Point of View

M.Sc. Thesis: Sharif University of Technology; Ghaemi, Mohammad Sajjad (Author); Daneshgar, Amir (Supervisor)

Abstract

In traditional machine learning approaches to classification, one uses only a labeled set to train the classifier. Labeled instances, however, are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile, unlabeled data may be relatively easy to collect, but there have been few ways to use them. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. Semi-supervised learning therefore requires less human effort and can give higher accuracy. Formally, this intuition corresponds to estimating a label function f on the graph so that it...
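
As general background (the abstract is truncated before the formal statement, so this is the standard formulation rather than necessarily the thesis's own), graph-based regularization typically seeks a label function f over the n graph nodes that stays close to the labels y_i of the first ℓ (labeled) nodes while varying smoothly across edges of weight w_ij, with λ > 0 trading off the two terms:

```latex
\min_{f \in \mathbb{R}^{n}} \; \sum_{i=1}^{\ell} (f_i - y_i)^2 \;+\; \lambda\, f^{\top} L f,
\qquad
f^{\top} L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\,(f_i - f_j)^2,
\qquad
L = D - W,\quad D_{ii} = \sum_{j=1}^{n} w_{ij}.
```

The second identity shows why the Laplacian term acts as a smoothness penalty: it is small exactly when strongly connected nodes receive similar values.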

Semi-supervised Learning and its Application to Image Categorization

M.Sc. Thesis: Sharif University of Technology; Farajtabar, Mehrdad (Author); Rabiee, Hamid Reza (Supervisor)

Abstract

Traditional methods for data classification use only the labeled data. However, in most applications, labeling the unlabeled data is expensive, time-consuming and requires expert knowledge. To overcome these problems, Semi-supervised Learning (SSL) methods, which aim to effectively address the problem of limited labeled data, have become an area of recent research. One of the recently introduced SSL approaches is classification based on the geometric structure of the data, namely the data manifold. In this approach, unlabeled data are used to recover the underlying structure of the data. The common assumption is that despite being represented in a high dimensional space, data...

Persian Statistical Natural Language Understanding Based on Partially Annotated Corpus

M.Sc. Thesis: Sharif University of Technology; Jabbari, Fattaneh (Author); Sameti, Hossein (Supervisor)

Abstract

The spoken language understanding (SLU) unit is one of the most important parts of a spoken dialogue system. Its input is the output of the speech recognition unit, and its main function is to extract semantic information from the input utterances. There are two main types of approaches to this task: rule-based and data-driven. Data-driven approaches are currently of more interest because they are more flexible and robust than rule-based approaches. Their main drawback is that they need a large amount of fully annotated or, in some cases, treebank data. Preparing such data is time-consuming and expensive. The goal of this thesis is...

A Semi-Supervised Ensemble Learning Algorithm for Nonstationary Data Streams Classification

M.Sc. Thesis: Sharif University of Technology; Hosseini, Mohammad Javad (Author); Beigy, Hamid (Supervisor)

Abstract

Recent advances in storage and processing have made automatic information gathering possible, which in turn leads to fast and continuous flows of data. Data produced and stored in this way are called data streams. Data streams have many applications, such as processing financial transactions, recording data from various sensors, or collecting data through web services. They are produced at high speed, in large volumes and with much dynamism, and they have some unique properties that make them applicable to precise modeling of many real data mining applications. The main challenge of data streams is the occurrence of concept drift, which can be of four types: sudden, gradual,...

Data Labelling Using Manifold-Based Semi-Supervised Learning in Multispectral Remote Sensing

M.Sc. Thesis: Sharif University of Technology; Khajenezhad, Ahmad (Author); Rabiee, Hamid Reza (Supervisor); Safari, Mohammad Ali (Co-Advisor)

Abstract

Classification of hyperspectral remote sensing images is a challenging problem because of the small number of labeled pixels, the high dimensionality of the data and the large number of pixels. In this context, semi-supervised learning can improve classification accuracy by extracting information from the distribution of all the labeled and unlabeled data. Among semi-supervised methods, manifold-based algorithms have been used frequently in recent years. In most previous works, manifolds are constructed from the spectral representation of the data, while the spatial dependency of pixel labels is an important property of images in remote sensing applications. In this thesis, after...
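
One simple way to bring the spatial dependency of pixel labels into a manifold graph, assumed here for illustration rather than taken from the thesis, is to connect each pixel to its 4-connected spatial neighbours and weight each edge by the spectral similarity of the two pixels. `spatial_spectral_graph` and `sigma` are hypothetical names; in practice such edges would typically be combined with purely spectral neighbours.

```python
import math

def spatial_spectral_graph(image, sigma):
    """Edges between 4-connected pixels, weighted by spectral similarity.

    image: rows x cols grid of per-pixel spectral vectors (tuples)
    sigma: Gaussian bandwidth on spectral distance (hypothetical parameter)
    Returns {(p, q): weight} with p < q as flat (row-major) pixel indices.
    """
    rows, cols = len(image), len(image[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            # Right and down neighbours cover each 4-neighbour pair once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    d = math.dist(image[r][c], image[r2][c2])
                    edges[(r * cols + c, r2 * cols + c2)] = math.exp(
                        -d * d / (2 * sigma ** 2))
    return edges
```

Spatially adjacent pixels with similar spectra get strong edges, so label information propagates within homogeneous regions and weakly across boundaries.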

Fault Detection and Smart Monitoring of Industrial Fans Based on Vibration Signals

M.Sc. Thesis: Sharif University of Technology; Moeeni, Hamed (Author); Manzuri Shalmani, Mohammad Taghi (Supervisor)

Abstract

Data-oriented smart monitoring of industrial machinery includes approaches for fault detection and prognosis that rely only on non-stationary signals sampled from sensors, not on a physical model of the machinery or on expert knowledge. Fault detection is the task of determining the present state of the machinery from past data, whereas prognosis focuses on predicting its future state from past data. Most research in this area is based on supervised algorithms, but in many applications labeling data is expensive. In this thesis, some approaches for semi-supervised diagnosis based on Markov random walks and K-NN have been implemented, and some improvements to K-NN have...

Unsupervised Domain Adaptation via Representation Learning

M.Sc. Thesis: Sharif University of Technology; Gheisary, Marzieh (Author); Soleymani, Mahdieh (Supervisor)

Abstract

Existing learning methods usually assume that training and test data follow the same distribution, but this is not always true, and in such cases the performance of these methods on the test data can be severely degraded. We often have sufficient labeled training data from a source domain but wish to learn a classifier that performs well on a target domain with a different distribution and no labeled training data. In this thesis, we study the problem of unsupervised domain adaptation, where no labeled data in the target domain are available. We propose a framework that finds a new representation for both the source and the target domain in which the distance between these...
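
The abstract is truncated before naming the distance used between the source and target domains. A common choice in representation-learning approaches to domain adaptation is the Maximum Mean Discrepancy (MMD); the sketch below, an assumption rather than the thesis's stated criterion, computes the biased empirical estimate of squared MMD with a Gaussian kernel. `mmd2`, `gaussian_kernel` and `sigma` are illustrative names.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased empirical squared MMD between two samples:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_k(A, B):
        return sum(gaussian_kernel(a, b, sigma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)
```

A learned representation would be trained so that this value is small when computed on the mapped source and target samples, while the source labels remain predictable.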

Automatic image annotation using semi-supervised generative modeling

Article: Pattern Recognition; Volume 48, Issue 1, January 2015, Pages 174-188; 00313203 (ISSN); Amiri, S. H; Jamzad, M; Sharif University of Technology

Elsevier Ltd 2015

Abstract

Image annotation approaches need an annotated dataset to learn a model of the relation between images and words. Unfortunately, preparing a labeled dataset is highly time-consuming and expensive. In this work, we describe the development of an annotation system within a semi-supervised learning framework that, by incorporating unlabeled images into the training phase, reduces the system's demand for labeled images. Our approach constructs a generative model for each semantic class in two main steps. First, based on the Gamma distribution, a generative model is constructed for each semantic class using the labeled images in that class. The second step incorporates the unlabeled images by using a modified EM...

Incremental evolving domain adaptation

Article: IEEE Transactions on Knowledge and Data Engineering; Volume 28, Issue 8, 2016, Pages 2128-2141; 10414347 (ISSN); Bitarafan, A; Soleymani Baghshah, M; Gheisari, M; Sharif University of Technology

IEEE Computer Society

Abstract

Almost all existing domain adaptation methods assume that all test data belong to a single stationary target distribution. However, in many real-world applications, data arrive sequentially and the data distribution evolves continuously. In this paper, we tackle the recently introduced problem of adaptation to a continuously evolving target domain. We assume that the available source-domain data are labeled, whereas the target-domain examples may be unlabeled and arrive sequentially. Moreover, the distribution of the target domain can evolve continuously over time. We propose the Evolving Domain Adaptation (EDA) method that first finds a new feature space...

Combining Supervised and Semi-Supervised Learning in the Design of a New Identifier for NPPs Transients

Article: IEEE Transactions on Nuclear Science; Volume 63, Issue 3, 2016, Pages 1882-1888; 00189499 (ISSN); Moshkbar Bakhshayesh, K; Ghofrani, M. B; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2016

Abstract

This study introduces a new identifier for nuclear power plant (NPP) transients. The proposed identifier works in two steps. First, the transient is identified by the previously developed supervised classifier that combines an ARIMA model and the EBP algorithm. In the second step, the patterns of unknown transients are fed to an identifier based on semi-supervised learning (SSL). The transductive support vector machine (TSVM), used as the semi-supervised algorithm, is trained on the labeled transient data to predict some of the unlabeled data. The labeled and newly predicted data are then used to train the TSVM for another portion of the unlabeled data. Training and prediction continue until...

An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams

Article: Knowledge and Information Systems; Volume 46, Issue 3, 2016, Pages 567-597; 02191377 (ISSN); Hosseini, M. J; Gholipour, A; Beigy, H; Sharif University of Technology

Springer-Verlag London Ltd

Abstract

Recent advances in storage and processing have made automatic information gathering possible, which in turn leads to fast and continuous flows of data. Data produced and stored in this way are called data streams. Data streams are produced in large volumes and with much dynamism, and they have some unique properties that make them applicable to modeling many real data mining applications. The main challenge of streaming data is the occurrence of concept drift. In addition, given the cost of labeling instances, it is often assumed that only a small fraction of instances are labeled. In this paper, we propose an ensemble algorithm to classify instances of non-stationary...