
Keyword Spotting in Continuous Speech Based on Hidden Markov Model

M.Sc. Thesis, Sharif University of Technology; Tavanaei, Amirhossein (Author); Sameti, Hossein (Supervisor)

Abstract

In this thesis we describe keyword spotting in continuous speech based on hidden Markov models. The aim of keyword spotting is to detect the specified keywords and discard the rest of the speech stream, using a network of keyword models and a garbage model. Phoneme recognition is the basis of this work, and we obtain appropriate feature vectors and models for phonemes. The two main parts of a keyword spotter are the keyword models and the filler model, connected by a network grammar. The Viterbi algorithm recognizes keywords and non-keywords using the network grammar. Each keyword model is created by concatenating phoneme HMMs. In experiments keyword models with one skip in states of HMMs...
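As a rough illustration of the decoding step mentioned in the abstract above, the following is a minimal log-domain Viterbi sketch over a toy two-state network. It is not the thesis's system: the single `filler` and `keyword` states, the discrete observation symbols, and all probabilities are assumptions chosen for illustration, whereas a real keyword model would be a chain of phoneme HMM states.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for obs under a discrete HMM.

    All tables hold log-probabilities. The state names and table layout are
    hypothetical; they are not taken from the thesis.
    """
    if not obs:
        raise ValueError("obs must contain at least one observation")
    # Initialise with the start score plus the first emission score.
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this frame.
            best_prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Frames labeled `keyword` in the returned path would correspond to detected keyword regions, and frames labeled `filler` to the discarded speech.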


Telephony Text-Independent Speaker Verification in Total Variability Space

M.Sc. Thesis, Sharif University of Technology; Mirian, Alireza (Author); Sameti, Hossein (Supervisor)

Abstract

Given two speech segments, the task of speaker verification is to determine whether or not both were uttered by the same person. Most recent approaches to speaker verification are based on the Total Variability Space, which results from applying factor analysis to the GMM mean supervector space. The representation of a speech segment of arbitrary duration in this space is called an i-vector.
In this thesis, the basics of speaker verification are first described and i-vector approaches are explained in more detail. Then, a method for improving the accuracy of Cosine Similarity Scoring is proposed, which normalizes the raw score using the score of the test utterance against a model- and...
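For orientation, raw cosine similarity scoring between two i-vectors can be sketched as below. This shows only the unnormalized score; the normalization proposed in the thesis is not reproduced, since the abstract is truncated. The function names and the default `threshold` are assumptions for illustration.

```python
import math

def cosine_score(w_enroll, w_test):
    """Cosine similarity between two i-vectors given as equal-length sequences."""
    if len(w_enroll) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm_e = math.sqrt(sum(a * a for a in w_enroll))
    norm_t = math.sqrt(sum(b * b for b in w_test))
    if norm_e == 0.0 or norm_t == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm_e * norm_t)

def verify(w_enroll, w_test, threshold=0.5):
    """Accept the trial when the raw cosine score reaches the threshold.

    The threshold value is a placeholder; in practice it would be tuned on
    development data.
    """
    return cosine_score(w_enroll, w_test) >= threshold
```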


Speaker Verification using Limited Enrollment Data

M.Sc. Thesis, Sharif University of Technology; Kalantari, Elaheh (Author); Sameti, Hossein (Supervisor)

Abstract

In this thesis, we investigate speaker verification as a biometric technology for verifying a person's claimed identity. Text-dependent speaker verification systems are preferred in commercial and security applications, and these systems perform better in limited-data conditions because they rely on prior knowledge about speakers, who are assumed to be cooperative. The limited amount of enrollment data is a major concern of this thesis. Speaker-dependent model construction and channel variability issues in telephone-based text-dependent speaker verification applications are surveyed. Due to the lack of an appropriate database for the task, we collected a database which is referred to as text-prompt...


Robust Speaker Verification in Total Variability Space

M.Sc. Thesis, Sharif University of Technology; La’l Mohammadi, Mahnoosh (Author); Sameti, Hossein (Supervisor)

Abstract

Our study concerns speaker verification systems. Given a speech segment and a claimed identity, these systems must decide whether the claimant is genuine or an impostor. Our main focus is on making the speaker verification system robust when training data is limited. With limited training data, the accuracy of speaker verification systems drops drastically. Our main purpose is to examine this problem in depth and to present methods for addressing it. Recently, methods such as PLDA and distance metric learning have been applied in text-independent speaker verification to counter the limited-data problem. One of the important cases in which...


Improving Robustness of Speaker Verification Systems Against Non-Identity Information

Ph.D. Dissertation, Sharif University of Technology; Zeinali, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

Speaker verification, as a biometric method, aims to verify the identity of a person from characteristics of their voice. It faces many challenges, such as voice imitation (spoofing), use of recorded voice, high sensitivity to convolutive distortions caused by the channel, and a large performance degradation for short-duration utterances. The aim of this thesis is to propose methods for reducing the effects of non-identity information, especially the channel, and to address the problems of new methods in text-dependent speaker verification with very short utterances. The i-vector has been the best speaker modeling method in recent years, but it doesn’t result in good...


A Soft Spectrographic Mask Estimation for Speech Recognition

M.Sc. Thesis, Sharif University of Technology; Esmaeelzadeh, Vahid (Author); Sameti, Hossein (Supervisor)

Abstract

Nowadays, robustness against various noises is a major challenge for Automatic Speech Recognition (ASR) systems. This thesis focuses on missing-feature speech recognition approaches for achieving robust ASR. In these approaches, low-SNR regions of a spectrogram are considered to be “missing” or “unreliable” and are removed from the spectrogram. Noise compensation is carried out either by estimating the missing regions from the remaining regions in some manner prior to recognition, or by performing recognition directly on incomplete spectrograms. These techniques clearly require a “spectrographic mask” which accurately labels the reliable and unreliable regions...
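To make the idea of a spectrographic mask concrete, here is a small sketch that contrasts a hard (binary) mask with a soft one for a single time-frequency cell. It is an assumption-laden illustration, not the estimator developed in the thesis: the SNR estimate from a known noise power, the sigmoid mapping, and the `threshold_db` and `slope` values are all hypothetical.

```python
import math

def local_snr_db(noisy_power, noise_power, floor=1e-10):
    """Estimate per-cell SNR in dB, taking speech power as noisy minus noise."""
    speech = max(noisy_power - noise_power, floor)
    return 10.0 * math.log10(speech / max(noise_power, floor))

def hard_mask(snr_db, threshold_db=0.0):
    """Binary mask: 1.0 marks a reliable cell, 0.0 an unreliable one."""
    return 1.0 if snr_db > threshold_db else 0.0

def soft_mask(snr_db, threshold_db=0.0, slope=0.5):
    """Soft mask in (0, 1): a sigmoid of the distance from the threshold."""
    # Clamp the exponent so extreme SNR values cannot overflow math.exp.
    x = max(min(slope * (snr_db - threshold_db), 50.0), -50.0)
    return 1.0 / (1.0 + math.exp(-x))
```

A soft mask expresses how reliable a cell is rather than forcing a yes/no decision, which is the distinction the thesis title points to.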


Robust Speech Recognition Based on Data Compensation and MDT Methods

M.Sc. Thesis, Sharif University of Technology; BabaAli, Bagher (Author); Sameti, Hossein (Supervisor)

Abstract

Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques to reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, correlation can be expected...
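A basic form of spectral subtraction can be sketched as follows, assuming per-bin power spectra for one frame and a separately obtained noise estimate. The abstract does not say which SS variant the thesis uses, so the over-subtraction factor `alpha` and spectral floor `beta` below are illustrative assumptions.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Subtract a scaled noise estimate from each frequency bin of one frame.

    alpha scales the noise estimate (over-subtraction factor) and beta sets a
    floor relative to the noisy power, which keeps the result non-negative.
    Both defaults are placeholders, not values from the thesis.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [max(y - alpha * n, beta * y) for y, n in zip(noisy_power, noise_power)]
```

For example, `spectral_subtraction([4.0, 1.0], [1.0, 2.0])` subtracts normally in the first bin and falls back to the floor in the second, where the noise estimate exceeds the noisy power.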


Improving Speech Signal Models for Statistical Parametric Speech Synthesis

Ph.D. Dissertation, Sharif University of Technology; Khorram, Soheil (Author); Sameti, Hossein (Supervisor)

Abstract

Statistical parametric speech synthesis (SPSS) has dominated speech synthesis research over the last decade, owing to advantages such as high intelligibility and flexibility. Decision tree-clustered context-dependent hidden semi-Markov models are typically used in SPSS to represent probability densities of acoustic features given contextual factors. This research addresses four major limitations of this decision tree-based structure: (a) The decision tree structure lacks adequate context generalization; (b) It is unable to express complex context dependencies; (c) Parameters generated from this structure exhibit sudden transitions between adjacent states; (d) This...


Training-Based Speech Enhancement Using Non-Gaussian Distributions

M.Sc. Thesis, Sharif University of Technology; Golrasan, Elham (Author); Sameti, Hossein (Supervisor)

Abstract

Statistical approaches (purely statistical and model-based) are the most efficient methods in single-channel speech enhancement. Despite this efficiency, speech enhancement remains a challenging problem. Recent research proposes that univariate non-Gaussian distributions are more appropriate for the speech signal in different domains. Based on these univariate distributions, statistical approaches have been modified and better results have consequently been reported. The purpose of this thesis is speech enhancement based on hidden Markov models using multivariate non-Gaussian distributions. The results of speech enhancement algorithm based on hidden Markov model in DCT and DFT domains...


Normalization of Non-standard Texts for Persian Language Using Neural Networks

M.Sc. Thesis, Sharif University of Technology; Seyyedi, Javad (Author); Sameti, Hossein (Supervisor)

Abstract

The purpose of this research is to normalize non-standard Persian texts. We propose a method for transforming texts with any non-standard structure into a formal, standard form. One of the major complications of text normalization is the large variety of non-standard structures, and the fact that this diversity cannot be captured by a single structural pattern. Furthermore, the concept of text normalization has different definitions in different situations, and each of these settings needs a distinct normalization method. Supervised learning methods are not suitable for normalization due to the variety of both standard and non-standard texts as well as the absence of...


Large Vocabulary Isolated Word Recognition Using Neural Networks

M.Sc. Thesis, Sharif University of Technology; Hajitabar, Alireza (Author); Sameti, Hossein (Supervisor)

Abstract

Speech recognition is an important topic in speech processing. In this thesis, we perform Isolated Word Recognition (IWR) on a large-vocabulary dataset. Previous work on large-vocabulary IWR has used Hidden Markov Models, Gaussian Mixture Models, and hybrid methods for this purpose, but our approach is based on Deep Neural Networks (DNNs). DNNs have recently shown excellent performance in various voice and image processing applications. A key factor in speech recognition is the availability of appropriate datasets. There has been no acceptable speech corpus in the Persian language for isolated word recognition before this work. In addition, Persian IWR systems reported so far are quite...


Speaker Diarization in Adverse Conditions

M.Sc. Thesis, Sharif University of Technology; Mohammadi, Hamid Reza (Author); Sameti, Hossein (Supervisor)

Abstract

The goal of a speaker diarization system is to detect the number of speakers in a conversation and to assign each segment of the conversation to one of the speakers. In these systems it is assumed that the identity of the speakers is completely unknown. Speaker diarization systems usually operate in offline mode: the system assumes that it has the whole conversation at hand before it starts processing. This method is effective for applications like spoken document retrieval, but it is not applicable to speech/speaker recognition systems that require online operation. In this dissertation, an online speaker diarization system is implemented. This...


Improving the Training Process of Understanding Unit in Spoken Dialog Systems Using Active Learning Methods

M.Sc. Thesis, Sharif University of Technology; Hadian, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis aims to reduce the need for labeled data in the Spoken Language Understanding (SLU) domain by means of active learning methods. This need arises from the lack of labeled SLU datasets in the Persian language and from fairly high labeling costs. Active learning methods enable the learner to choose the most informative instances to be labeled and used for training, and avoid labeling uninformative or redundant instances. For modeling the SLU system, several statistical models, namely MLN (Markov Logic Networks), CRF (Conditional Random Fields), HMM (Hidden Markov Model) and HVS (Hidden Vector State), were reviewed, and finally CRF was chosen for its superior performance. The...
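One common way to pick "the most informative instances" is least-confidence uncertainty sampling, sketched below. The abstract does not state which selection strategy the thesis uses, so this particular criterion, the function name, and the input layout are assumptions for illustration only.

```python
def select_most_uncertain(posteriors, k):
    """Return the ids of the k unlabeled instances the model is least sure about.

    posteriors maps an instance id to the model's class probabilities for it
    (a hypothetical layout). Confidence is taken as the largest probability,
    so the lowest-confidence instances are chosen for labeling first.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(posteriors, key=lambda i: max(posteriors[i]))
    return ranked[:k]
```

In an active learning loop, the selected instances would be sent to an annotator, added to the training set, and the model retrained before the next selection round.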


Text-Independent Speaker Identification in Large Population Applications

M.Sc. Thesis, Sharif University of Technology; Zeinali, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

Human speech conveys much information, such as semantic content, emotion, and even speaker identity. This thesis addresses text-independent speaker identification (SI) in large-population applications. Identification (test) time has become one of the most important issues in recent real-time systems. Identification time depends on the cost of computing the likelihood between test features and registered speaker models. For real-time SI applications, the system must identify an unknown speaker quickly, so conventional SI methods cannot be used. The main goal of this thesis is to propose several methods that reduce identification time without any loss of identification...


Sound Symbolism Analysis: A Motor Theory Approach

M.Sc. Thesis, Sharif University of Technology; Sepehrifar, Makan (Author); Sameti, Hossein (Supervisor)

Abstract

The “Motor Theory” of speech perception discusses the role of motor competence in speech perception. Motor competence is accessed by simulating the gestures that produce the heard voice signals. The simulation is carried out by mirror neurons, which share similar feelings between sender and receiver. Empathy, arising from the use of mirror neurons, probably created some sort of link between inward motor gestures and outward semantic categories at the origin of language. Some degree of linkage between sound and meaning is what we call “Sound Symbolism”. Along with describing the link between Motor Theory and Sound Symbolism, this thesis took two approaches to analyze...


Persian Speech Synthesis Using Hidden Markov Models

M.Sc. Thesis, Sharif University of Technology; Bahaadini, Sara (Author); Sameti, Hossein (Supervisor)

Abstract

Research on Persian speech synthesis systems over the last ten years has been sparse and scattered. A comprehensive framework that properly implements and adapts statistical speech synthesis methods for Persian has not yet been developed. In this thesis, recent statistical parametric speech synthesis methods, including CLUSTERGEN, traditional HMM-based speech synthesis, and its STRAIGHT version, are implemented and adapted for the Persian language. A CCR test is carried out to compare these methods with each other and with the unit selection method. Listeners score samples based on CMOS. The methods are ranked by averaging the CCR scores. The results show that STRAIGHT-based...


Language Modeling Using Recurrent Neural Networks

M.Sc. Thesis, Sharif University of Technology; Rahimi, Adel (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis examines the differences and similarities between two well-known RNN blocks, the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). It measures different aspects such as computational complexity, Word Error Rate (WER), and subjective human evaluation in the task of text generation. In the computational complexity experiment, results show that the LSTM takes more time to compute than the GRU. In the next experiment, the GRU slightly outperforms the LSTM in terms of WER, but the perplexity of the language models tested was the same. This shows that slight differences in perplexity do not drastically change the WER. That said, the results suggest that the...
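Word Error Rate, one of the metrics named in the abstract above, is conventionally the word-level edit distance between a reference and a hypothesis divided by the number of reference words. The sketch below follows that standard definition; the whitespace tokenization and function name are assumptions, and the thesis may compute the metric with different text preprocessing.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.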


Persian End-To-End Speech Recognition

M.Sc. Thesis, Sharif University of Technology; Hajipour Ghomi, Farzaneh (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis presents a Persian end-to-end speech recognition system. The input to the system is low-level features of the speech signal. Deep recurrent neural networks (RNNs), with Long Short-Term Memory (LSTM) units as the RNN building blocks, are used as the acoustic model. Continuous speech data is labeled by CTC, which is applied as the output layer of the recurrent neural network. By using the CTC objective function, the acoustic modeling problem is simplified to an RNN learning problem over pairs of speech and context-independent (CI) label sequences. A distinctive feature of this system is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables...


Using Discriminative Training Approaches for Large Vocabulary Isolated Word Recognition

M.Sc. Thesis, Sharif University of Technology; Osati, Majid (Author); Sameti, Hossein (Supervisor)

Abstract

In this study, the isolated word recognition problem is studied at large scale, and different acoustic models are used to solve it. Acoustic models based on discriminative training methods are used to compare our proposed approach with other available training methods. Acoustic models are built and trained based on HMM-GMM, HMM-subspace GMM, and HMM-DNN using different training criteria such as Maximum Mutual Information (MMI), boosted MMI, Minimum Phoneme Error (MPE), and state-level Minimum Bayes Risk (sMBR). Using these discriminative approaches improved the speech recognition systems. Boosted MMI with boosting factor of 0.3 for HMM-DNN has resulted in Word Error Rate...


Language Modeling for Persian using Recurrent Neural Networks

M.Sc. Thesis, Sharif University of Technology; Pourbagheri, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

In recent years, neural networks have been used for language modeling in natural language processing tasks. Various neural network structures have been used in these models, and recurrent neural networks (RNNs) have achieved good results in these tasks. Since RNNs are not limited to a fixed number of words when predicting the next word, they have achieved better results than feedforward networks. However, these networks have difficulty learning long sequences, and long short-term memory (LSTM) networks have been introduced to solve this problem. In this research, language models are built for the Persian language using RNN and LSTM, and are compared with n-gram-based models. For...
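Language models such as the RNN, LSTM, and n-gram models mentioned above are commonly compared by perplexity, the exponential of the average negative log-probability the model assigns to each word of a test text. The sketch below assumes those per-word probabilities are already available; the abstract is truncated before it names its evaluation metric, so treating perplexity as the comparison measure is an assumption.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test sequence from the model's per-word probabilities.

    word_probs is a hypothetical input: one probability per test word, each
    being the model's probability of that word given its history.
    """
    if not word_probs:
        raise ValueError("word_probs must not be empty")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    avg_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_prob)
```

A model that assigns every word probability 0.25 has perplexity 4, which can be read as being as uncertain as a uniform choice among four words; lower values indicate a better fit to the test text.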
