Speech synthesis based on Gaussian conditional random fields

Article, Communications in Computer and Information Science; Vol. 427, 2014, pp. 183-193; Khorram, S; Bahmaninezhad, F; Sameti, H; Sharif University of Technology

Abstract

Hidden Markov Model (HMM)-based synthesis (HTS) has recently been confirmed to be the most effective method in generating natural speech. However, it lacks adequate context generalization when the training data is limited. As a solution, the current study provides a new context-dependent speech modeling framework based on Gaussian Conditional Random Field (GCRF) theory. By applying this model, an innovative speech synthesis system has been developed, which can be viewed as an extension of the Context-Dependent Hidden Semi-Markov Model (CD-HSMM). A novel Viterbi decoder along with a stochastic gradient ascent algorithm was applied to train model parameters. Also, a fast and efficient parameter...

RGB-D scene segmentation with conditional random field

Article, 2014 6th Conference on Information and Knowledge Technology, IKT 2014; 2014, pp. 134-139; ISBN: 9781479956609; Nasab, S. E; Kasaei, S; Sanaei, E; Sharif University of Technology

Abstract

Segmenting a scene into its constituent parts is a challenging task. In this paper, a graphical model is used for this task. Methods based on geometrical derivatives, such as curvature and normals, often perform poorly on geometrically complex architecture and lead to over-segmentation or even failure. The proposed segmentation method contains two steps. First, region growing based on curvature, normals, and color is used to grow regions. The resulting segmented cloud is used as the unary potential in the graphical model. A fully connected Conditional Random Field with Gaussian kernels for the pairwise potentials is then used to correct this segmentation. Gaussian kernels are based on...

Discriminative spoken language understanding using statistical machine translation alignment models

Article, Communications in Computer and Information Science; Vol. 427, Sep 2014, pp. 194-202; ISSN: 18650929; ISBN: 9783319108490; Aliannejadi, M; Khadivi, S; Ghidary, S. S; Bokaei, M. H; Sharif University of Technology

Abstract

In this paper, we study the discriminative modeling of Spoken Language Understanding (SLU) using Conditional Random Fields (CRF) and Statistical Machine Translation (SMT) alignment models. Previous discriminative approaches to SLU have depended on n-gram features. Other previous works have used SMT alignment models to predict the output labels. We have used SMT alignment models to align the abstract labels and trained a CRF to predict the labels. We show that the state transition features improve the performance. Furthermore, we have compared the proposed method with two baseline approaches: Hidden Vector States (HVS) and baseline-CRF. The results show that for the F-measure the proposed...

Community detection using diffusion information

Article, ACM Transactions on Knowledge Discovery from Data; Volume 12, Issue 2, 2018; 15564681 (ISSN); Ramezani, M; Khodadadi, A; Rabiee, H. R; Sharif University of Technology

Association for Computing Machinery 2018

Abstract

Community detection in social networks has become a popular topic of research during the last decade. There exist a variety of algorithms for modularizing the network graph into different communities. However, they mostly assume that partial or complete information about the network graph is available, which is not feasible in many cases. In this article, we focus on detecting communities by exploiting their diffusion information. To this end, we utilize Conditional Random Fields (CRF) to discover the community structures. The proposed method, community diffusion (CoDi), does not require any prior knowledge about the network structure or specific properties of communities. Furthermore, in...

Community Detection in Social Networks by Using Information from Diffusion Network

M.Sc. Thesis, Sharif University of Technology; Ramezani, Maryam (Author); Rabiee, Hamid Reza (Supervisor)

Abstract

Nowadays, Online Social Networks (OSNs) play an important role in the exchange of information among people. Some previous studies indicate that diffusion behavior and network structure are tightly related. Community structure is one of the most important features of OSNs. Access to the whole network topology is a necessary and prevalent requirement for most community detection methods, so limited access to the full or partial topology can decrease their accuracy. Using traceable information over the diffusion network is a solution to surmount this difficulty. In this work, we are concerned with community detection using only the diffusion information, while unlike the previous...

Management of Classifiers Pool in Data Stream Classification Using Probabilistic Graphical Models

M.Sc. Thesis, Sharif University of Technology; Talebi, Hesamoddin (Author); Beigy, Hamid (Supervisor)

Abstract

Concept drift is a common situation in data streams in which the distribution that generates the data changes over time for various reasons, such as environmental changes. This phenomenon strongly challenges the classification process. Recent studies on keeping a pool of classifiers, each modeling one of the concepts, have achieved promising results. Storing used classifiers in a pool enables us to exploit prior knowledge of concepts when they recur. Most of the methods presented so far introduce a similarity measure between current and past concepts and select the closest stored concept as the current one. These methods don't consider possible relations and dependencies between...
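
Illustration (not taken from the thesis): the baseline strategy mentioned in the abstract, selecting the stored concept closest to the current one, can be sketched as below. The summary-vector representation of a concept, the Euclidean distance, and the names `closest_concept` and `pool` are all assumptions made for this sketch; the abstract does not specify the similarity measure.

```python
import math


def closest_concept(pool, current):
    """Return the name of the stored concept whose summary vector is nearest to `current`.

    `pool` maps a concept name to a summary vector (a hypothetical representation,
    for example per-feature means of the data seen under that concept).
    """
    def distance(a, b):
        # Euclidean distance, used here only as a stand-in similarity measure.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(pool, key=lambda name: distance(pool[name], current))


# Two stored concepts and a summary of the most recent window of the stream.
pool = {"concept_a": [0.9, 0.1], "concept_b": [0.1, 0.8]}
selected = closest_concept(pool, [0.2, 0.7])
```

The classifier stored for `selected` would then be reused for the current window. As the abstract notes, this kind of selection treats each stored concept in isolation.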

Persian Aspect-based Sentiment Analysis Using Learning Methods

M.Sc. Thesis, Sharif University of Technology; Sabeti, Behnam (Author); Ghassem Sani, Gholamreza (Supervisor)

Abstract

As digital content grows rapidly due to the internet, user reviews about different topics such as product quality can be used as a rich source to check and analyze product quality and performance. Automatic methods are widely used to extract this information because of the massive amount of available resources. Sentiment analysis is one of the important fields in natural language processing, which uses a combination of learning and rule-based methods to extract subjective information from documents. Aspect-based sentiment analysis deals with sentiment analysis based on each aspect of the product. It consists of two main steps: first, aspects should be extracted from the reviews and...

Isoform Function Prediction Using Deep Neural Network

M.Sc. Thesis, Sharif University of Technology; Ghazanfari, Sara (Author); Motahari, Abolfazl (Supervisor); Soleymani, Mahdieh (Supervisor)

Abstract

Isoforms are mRNAs that are produced from the same gene site in a phenomenon called alternative splicing. Studies have shown that more than 95% of multiexon genes in humans undergo alternative splicing. Although there are few changes in the mRNA sequence, they may have a systematic effect on cell function and regulation. It is widely reported that isoforms of a gene have distinct or even contrasting functions. Most studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of gene function studies, there is little information about isoforms' functionalities. Recently, some computational methods based on Multiple Instance...

Learning strengths and weaknesses of classifiers for RGB-D semantic segmentation

Article, 9th Iranian Conference on Machine Vision and Image Processing, 18 November 2015 through 19 November 2015; Volume 2016-February, 2015, Pages 176-179; 21666776 (ISSN); 9781467385398 (ISBN); Fooladgar, F; Kasaei, S; Sharif University of Technology

IEEE Computer Society

Abstract

3D scene understanding is an open challenge in the field of computer vision. Most of the focus is on 2D methods, in which the semantic labeling of each RGB pixel is considered. In this paper, however, the 3D semantic labeling of RGB-D images is considered. In the proposed method, to extract meaningful features, a superpixel generation algorithm is applied to the RGB image to segment it into a set of disjoint pixels. After that, a set of three powerful classifiers is utilized to semantically label each superpixel. The probability outputs of these classifiers are concatenated to form a novel feature vector for each superpixel. Consequently, to analyze the strengths...
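
Illustration (not taken from the paper): the concatenation step described in the abstract, where the probability outputs of several classifiers become one feature vector per superpixel, might look like the sketch below. The function name `stack_probabilities`, the number of classes, and the probability values are hypothetical.

```python
def stack_probabilities(per_classifier_probs):
    """Concatenate class-probability vectors from several classifiers into one feature vector.

    `per_classifier_probs` holds one list of class probabilities per classifier,
    all computed for the same superpixel.
    """
    features = []
    for probs in per_classifier_probs:
        features.extend(probs)
    return features


# Three classifiers scoring one superpixel over two classes (made-up values).
outputs = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
feature_vector = stack_probabilities(outputs)
```

With three classifiers and C classes, each superpixel gets a feature vector of length 3C, which a later stage can use to learn where each classifier tends to be reliable.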

Multiple human 3D pose estimation from multiview images

Article, Multimedia Tools and Applications; 2017, Pages 1-29; 13807501 (ISSN); Ershadi Nasab, S; Noury, E; Kasaei, S; Sanaei, E; Sharif University of Technology

Abstract

Multiple human 3D pose estimation is a challenging task, mainly because of large variations in the scale and pose of humans, fast motions, multiple persons in the scene, and an arbitrary number of visible body parts due to occlusion or truncation. Some of these ambiguities can be resolved by using multiview images, because more evidence of body parts is available in multiple views. In this work, a novel method for multiple human 3D pose estimation using evidence in multiview images is proposed. The proposed method utilizes a fully connected pairwise conditional random field that contains two types of pairwise terms. The first pairwise term encodes the spatial...

Creating a corpus for automatic punctuation prediction in Persian texts

Article, 2017 25th Iranian Conference on Electrical Engineering, ICEE 2017, 2 May 2017 through 4 May 2017; 2017, Pages 1537-1542; 9781509059638 (ISBN); Hosseini, S. M; Sameti, H; Sharif University of Technology

Abstract

We present a novel corpus for automatic punctuation prediction in Persian texts. Punctuation prediction is an important task in automatic speech recognition (ASR). The output of ASR systems is typically a raw sequence of words with no punctuation marks; this makes the text difficult or even impossible to make sense of for humans and also for any text processing unit. In this work, we have assembled a state-of-the-art Persian corpus to train and test a punctuation prediction model. To the best of our knowledge, this is the first corpus specifically designed for punctuation prediction in Persian texts. The corpus is a modification of a manually part-of-speech (POS) tagged Persian one,...

Multiple human 3D pose estimation from multiview images

Article, Multimedia Tools and Applications; Volume 77, Issue 12, June 2018, Pages 15573-15601; 13807501 (ISSN); Ershadi Nasab, S; Noury, E; Kasaei, S; Sanaei, E; Sharif University of Technology

Springer New York LLC 2018

Abstract

Multiple human 3D pose estimation is a challenging task, mainly because of large variations in the scale and pose of humans, fast motions, multiple persons in the scene, and an arbitrary number of visible body parts due to occlusion or truncation. Some of these ambiguities can be resolved by using multiview images, because more evidence of body parts is available in multiple views. In this work, a novel method for multiple human 3D pose estimation using evidence in multiview images is proposed. The proposed method utilizes a fully connected pairwise conditional random field that contains two types of pairwise terms. The first pairwise term encodes the spatial...

Compressed Domain Moving Object Detection Based on CRF

Article, IEEE Transactions on Circuits and Systems for Video Technology; Volume 30, Issue 3, 2020, Pages 674-684; Alizadeh, M; Sharifkhani, M; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2020

Abstract

This paper presents a novel, accurate moving object detection method based on the conditional random field (CRF) for High Efficiency Video Coding (HEVC)/H.265 compressed-domain video sequences. For each block, the number of consumed bits, the motion vectors (MVs), and the partitioning modes are extracted from the compressed bitstream. After removing outlier MVs, compensating MVs are assigned to the I-blocks based on their neighboring blocks. The information, such as MV, partitioning mode, and bit consumption, is used in the potential functions of a CRF model, which is updated for every frame to detect the objects. Then, a number of standard test video sequences are used to verify the...

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science (M.Sc.) in Computer Engineering, Artificial Intelligence

M.Sc. Thesis, Sharif University of Technology; Hosseini, Mohammad Saleh (Author); Sameti, Hossein (Supervisor)

Abstract

Punctuation marks constitute an important part of a text in every language. Omitting them makes the text ambiguous. The output text of an automatic speech recognition (ASR) system is typically a raw sequence of words containing no punctuation marks. This makes the text difficult or even impossible to make sense of for humans, as well as for any further text processing tasks. The goal of this thesis is to perform automatic punctuation insertion in Persian texts lacking punctuation marks. To the best of our knowledge, this is the first work done in this context for the Persian language. For this purpose, firstly, we assembled a state-of-the-art corpus to train and...

Improving the Training Process of Understanding Unit in Spoken Dialog Systems Using Active Learning Methods

M.Sc. Thesis, Sharif University of Technology; Hadian, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis aims to reduce the need for labeled data in the SLU domain by means of active learning methods. This need is due to the lack of labeled datasets for Spoken Language Understanding (SLU) in the Persian language and fairly high labeling costs. Active learning methods enable the learner to choose the most informative instances to be labeled and used for training, and avoid labeling uninformative or redundant instances. For modeling the SLU system, several statistical models, namely MLN (Markov Logic Networks), CRF (Conditional Random Fields), HMM (Hidden Markov Model), and HVS (Hidden Vector State), were reviewed, and finally CRF was chosen for its superior performance. The...
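
Illustration (not taken from the thesis): one common way to "choose the most informative instances" is least-confidence uncertainty sampling, sketched below. The abstract does not say which selection criterion the thesis uses, so the criterion, the function name `least_confident`, and the probability values are assumptions.

```python
def least_confident(probabilities, k):
    """Return the indices of the k instances whose top predicted probability is lowest.

    `probabilities` holds one list of class probabilities per unlabeled instance,
    as produced by the current model.
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:k]


# Model outputs for three unlabeled utterances over two labels (made-up values).
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]
to_label = least_confident(probs, 2)
```

The instances in `to_label` would be sent for annotation first, while the confidently predicted instance is left unlabeled.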

Semantic segmentation of RGB-D images using 3D and local neighbouring features

Article, 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015, 23 November 2015 through 25 November 2015; 2015; 9781467367950 (ISBN); Fooladgar, F; Kasaei, S; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2015

Abstract

3D scene understanding is one of the most important problems in the field of computer vision. Although considerable attention has been devoted to the 2D scene understanding problem in the past decades, now, with the development of depth sensors (like Microsoft Kinect), 3D scene understanding has become a very challenging task. Traditionally, the scene understanding problem was considered as the semantic labeling of each image pixel. Semantic labeling of RGB-D images has not attained success comparable to that of RGB semantic labeling, due to the lack of a challenging dataset. With the introduction of an RGB-D dataset called NYU-V2, it became possible to propose a novel method to...

Improving Speech Signal Models for Statistical Parametric Speech Synthesis

Ph.D. Dissertation, Sharif University of Technology; Khorram, Soheil (Author); Sameti, Hossein (Supervisor)

Abstract

Statistical parametric speech synthesis (SPSS) has dominated the speech synthesis research area over the last decade, due to its remarkable advantages such as high intelligibility and flexibility. Decision tree-clustered context-dependent hidden semi-Markov models are typically used in SPSS to represent probability densities of acoustic features given contextual factors. This research addresses four major limitations of this decision tree-based structure: (a) the decision tree structure lacks adequate context generalization; (b) it is unable to express complex context dependencies; (c) parameters generated from this structure exhibit sudden transitions between adjacent states; (d) this...
