Robust Speech Recognition Based on Data Compensation and MDT Methods

, M.Sc. Thesis Sharif University of Technology BabaAli, Bagher (Author) ; Sameti, Hossein (Supervisor)

Abstract

Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, correlation can be expected...

A fast Speaker Identification method using nearest neighbor distance

, Article International Conference on Signal Processing Proceedings, ICSP ; Volume 3 , 2012 , Pages 2159-2162 ; 9781467321945 (ISBN) Zeinali, H ; Sameti, H ; Babaali, B ; Sharif University of Technology

2012

Abstract

As the number of registered speakers in Speaker Identification (SI) systems increases, the computation time for identifying an unknown speaker increases significantly. This problem arises from the simple design of conventional methods. Because of this limitation, conventional SI methods cannot be used in real-time applications. In this paper, we propose a two-step method to overcome this limitation, using a different identification method in each step. In the first step, we reduce the search space using the Nearest Neighbor method. In the second step, we identify the target speaker using the conventional GMM-based SI method. The experimental results show 3.4× speed-ups without any accuracy loss using...
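The two-step idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each speaker is summarised by a single mean vector for the nearest-neighbour shortlist, and each speaker model is a diagonal-covariance GMM given as a list of (weight, mean, variance) tuples. The names `shortlist`, `gmm_loglik`, `identify` and the parameter `k` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shortlist(test_mean, speaker_means, k):
    """Step 1 (assumed form): keep the k speakers whose mean vector is
    nearest to the mean of the test features."""
    ranked = sorted(speaker_means,
                    key=lambda s: sq_dist(test_mean, speaker_means[s]))
    return ranked[:k]


def gmm_loglik(frames, gmm):
    """Total log-likelihood of the frames under a diagonal-covariance GMM.
    gmm is a list of (weight, mean, variance) tuples (an assumption)."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(ll)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total


def identify(frames, speaker_means, speaker_gmms, k=2):
    """Step 2: score only the shortlisted speakers with their GMMs."""
    dim = len(frames[0])
    test_mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    candidates = shortlist(test_mean, speaker_means, k)
    return max(candidates, key=lambda s: gmm_loglik(frames, speaker_gmms[s]))
```

In this sketch the costly GMM scoring runs on only `k` candidates instead of every registered speaker. The value of `k` is a free parameter here: too small a shortlist can exclude the true speaker, and the sketch makes no claim about the speed-up or accuracy reported in the paper.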

Incorporating a novel confidence scoring method in a Persian spoken dialogue system

, Article SPA 2011 - Signal Processing: Algorithms, Architectures, Arrangements, and Applications - Conference Proceedings, 29 September 2011 through 30 September 2011, Poznan ; September , 2011 , Pages 74-78 ; 9781457714863 (ISBN) Sakhaee, E ; Sameti, H ; Babaali, B ; Sharif University of Technology

2011

Abstract

Reliability assessment of phonemes, syllables, words, concepts or utterances has become the key feature of Automatic Speech Recognition (ASR) engines for deciding whether to accept or reject a hypothesis. In this paper, we propose utterance-level confidence annotation based on a combination of features extracted from multiple knowledge sources in the Persian language. Experiments were conducted first to examine the performance of individual features, and then to combine them using statistical data analysis and density estimation methods to assign a confidence score to utterances. Using the data collected from a Persian spoken dialogue system, we show that combining features from independent...

Likelihood-maximizing-based multiband spectral subtraction for robust speech recognition

, Article Eurasip Journal on Advances in Signal Processing ; Volume 2009 , 2009 ; 16876172 (ISSN) Babaali, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2009

Abstract

Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, correlation can be expected...

Spectral subtraction in model distance maximizing framework for robust speech recognition

, Article 2008 9th International Conference on Signal Processing, ICSP 2008, Beijing, 26 October 2008 through 29 October 2008 ; 2008 , Pages 627-630 ; 9781424421794 (ISBN) BabaAli, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2008

Abstract

This paper presents a novel discriminative parameter calibration approach based on Model Distance Maximizing (MDM) to improve the performance of our previously proposed robustness method, spectral subtraction (SS) in a likelihood-maximizing framework. In the previous work, the spectral over-subtraction factor of SS was adjusted with the conventional ML approach, which utilizes only the true model without considering other confused models. This makes it very likely to reach a suboptimal solution. In MDM, by contrast, maximizing the dissimilarities among models could further improve the performance of our speech recognizer-based spectral subtraction method. Experimental results...

A model distance maximizing framework for speech recognizer-based speech enhancement

, Article AEU - International Journal of Electronics and Communications ; Volume 65, Issue 2 , February , 2011 , Pages 99-106 ; 14348411 (ISSN) Babaali, B ; Sameti, H ; Falk, T. H ; Sharif University of Technology

2011

Abstract

This paper presents a novel discriminative parameter calibration approach based on the model distance maximizing (MDM) framework to improve the performance of our previously proposed method based on spectral subtraction (SS) in a likelihood-maximizing framework. In the previous work, spectral over-subtraction factors were adjusted based on the conventional maximum-likelihood (ML) approach, which utilized only the true model and did not consider other confused models, and thus likely reached suboptimal solutions. In the proposed MDM framework, by contrast, improved speech recognition performance is obtained by maximizing the dissimilarities among models. Experimental results based on FARSDAT, TIMIT...
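For readers unfamiliar with the over-subtraction factor mentioned in this abstract, the sketch below shows where it enters a generic power spectral subtraction rule. It is the common textbook form with a spectral floor, not the calibration procedure of the paper: how the factor is chosen (by ML or MDM) is outside this sketch. The names `alpha` (over-subtraction factor) and `beta` (floor factor) and their default values are assumptions.

```python
def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Generic power spectral subtraction for one frame.

    noisy_power: per-bin power of the noisy speech frame
    noise_power: per-bin estimate of the noise power
    alpha:       over-subtraction factor (assumed default)
    beta:        spectral floor factor (assumed default)
    """
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        s = y - alpha * n                 # subtract a scaled noise estimate
        cleaned.append(max(s, beta * n))  # floor to avoid negative power
    return cleaned
```

A larger `alpha` removes more noise at the cost of more speech distortion, which is why the choice of this factor affects recognition performance.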

A fast two-level Speaker Identification method employing sparse representation and GMM-based methods

, Article 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012 ; 2012 , Pages 45-48 ; 9781467303828 (ISBN) Zeinali, H ; Sameti, H ; Khaki, H ; Babaali, B ; Sharif University of Technology

2012

Abstract

In large-population Speaker Identification (SI), computation time has become one of the most important issues in recent real-time systems. Test computation time depends on the cost of likelihood computation between test features and registered speaker models. For real-time applications of speaker identification, the system must identify an unknown speaker quickly; hence, conventional SI methods cannot be used. In this paper, we propose a two-step method that utilizes two different identification methods. In the first step, we use the Nearest Neighbor method to decrease the search space. In the second step, we use GMM-based SI methods to identify the target speaker. We achieved 3.5× speed-ups without...

Non-speaker information reduction from Cosine Similarity Scoring in i-vector based speaker verification

, Article Computers and Electrical Engineering ; Volume 48 , November , 2015 , Pages 226–238 ; 00457906 (ISSN) Zeinali, H ; Mirian, A ; Sameti, H ; BabaAli, B ; Sharif University of Technology

Elsevier Ltd 2015

Abstract

Cosine similarity and Probabilistic Linear Discriminant Analysis (PLDA) in i-vector space are two state-of-the-art scoring methods in the speaker verification field. While PLDA usually gives better accuracy, Cosine Similarity Scoring (CSS) remains a widely used method due to its simplicity and acceptable performance. In this domain, several channel compensation and score normalization methods have been proposed to improve performance. We investigate non-speaker information in the cosine similarity metric and propose a new approach to remove it from the decision-making process. I-vectors hold a large amount of non-speaker information such as channel effects, language, and phonetic content. This type...
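As a point of reference, plain cosine similarity scoring between an enrollment vector and a test vector can be sketched as below. This shows only the baseline score, not the non-speaker information removal proposed in the paper. The function names and the decision threshold are hypothetical, and the vectors stand in for i-vectors.

```python
import math


def cosine_score(w_enroll, w_test):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm_enroll = math.sqrt(sum(a * a for a in w_enroll))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_enroll * norm_test)


def accept(w_enroll, w_test, threshold=0.5):
    """Accept the identity claim when the score reaches the threshold.
    The threshold value is an assumption for illustration only."""
    return cosine_score(w_enroll, w_test) >= threshold
```

Because the score depends only on the angle between the vectors, any non-speaker factor that rotates an i-vector (channel, language, phonetic content) shifts the score directly, which is the effect the abstract sets out to reduce.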

Segmental HMM-based part-of-speech tagger

, Article 2010 International Conference on Audio, Language and Image Processing, ICALIP 2010, Shanghai, 23 November 2010 through 25 November 2010 ; 2010 , Pages 52-56 ; 9781424458653 (ISBN) Bokaei, M. H ; Sameti, H ; Bahrani, M ; Babaali, B ; Sharif University of Technology

2010

Abstract

This paper presents a solution to the problem of using an HMM-based POS tagger in languages where a word can comprise several tokens. The Viterbi algorithm is modified to support a segment of words within a model state. In other words, the proposed system has a built-in tokenizer which indicates word boundaries as well as the corresponding tag sequence.

Robust phoneme recognition using MLP neural networks in various domains of MFCC features

, Article 2010 5th International Symposium on Telecommunications, IST 2010, 4 December 2010 through 6 December 2010, Tehran ; 2010 , Pages 755-759 ; 9781424481835 (ISBN) Dabbaghchian, S ; Sameti, H ; Ghaemmaghami, M. P ; BabaAli, B ; Sharif University of Technology

2010

Abstract

This paper focuses on enhancing MFCC features using a set of MLP neural networks (NNs) in order to improve phoneme recognition accuracy under different noise types and SNRs. An NN can be used in different domains (between any pair of MFCC feature extraction blocks), namely the FFT, MEL, LOG, DCT and DELTA domains. Various domains have different complexities and achieve different degrees of improvement. A comparative study is done in this paper in order to find the best domain. Furthermore, a set of MLP NNs, instead of one NN, is used to handle various noise types at different SNR levels. In this case, each NN is trained with a specific noise type and SNR. The database used in the simulations is created by artificially...

Speaker phone mode classification using Gaussian mixture models

, Article SPA 2011 - Signal Processing: Algorithms, Architectures, Arrangements, and Applications - Conference Proceedings, 29 September 2011 through 30 September 2011 ; September , 2011 , Pages 112-117 ; 9781457714863 (ISBN) Eghbal Zadeh, H ; Sobhan Manesh, F ; Sameti, H ; BabaAli, B ; Sharif University of Technology

2011

Abstract

This study focuses on classifying phone speaker modes using GMM. To this end, speech data in both enabled and disabled speaker modes of cell phones and telephones were collected, processed and classified into two categories. Different GMM mixture numbers (1 to 4) and wave file sizes of 10, 20, 40 and 80 kb were tested in order to find an optimal condition for classification. The GMM method attained an 87.99% correct classification rate on test data. This classification is important for speech-enabled IVR systems [1], dialog systems and many systems in speech processing in the sense that it could help to load an optimum model for increasing system...

An efficient multi-band spectral subtraction method for robust speech recognition

, Article 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN) Safayani, M ; Sameti, H ; Babaali, B ; Manzuri Shalmani, M. T ; Sharif University of Technology

2007

Abstract

In this paper, we present a novel approach for adjusting the filter coefficients of multi-band spectral subtraction based on speech recognition system results. Currently, most speech enhancement techniques are designed according to various waveform-level criteria such as maximizing SNR or minimizing signal error. However, improvement in these criteria does not necessarily increase speech recognition performance. Speech recognition performance will increase only if these methods generate a sequence of features that maximizes or increases the likelihood of the correct transcription relative to incorrect competing hypotheses. Here we use an utterance with a known transcription and...

Nevisa, a Persian continuous speech recognition system

, Article 13th International Computer Society of Iran Computer Conference on Advances in Computer Science and Engineering, CSICC 2008, Kish Island, 9 March 2008 through 11 March 2008 ; Volume 6 CCIS , 2008 , Pages 485-492 ; 18650929 (ISSN); 3540899847 (ISBN); 9783540899846 (ISBN) Sameti, H ; Veisi, H ; Bahrani, M ; Babaali, B ; Hosseinzadeh, K ; Sharif University of Technology

2008

Abstract

In this paper, we review the Nevisa Persian speech recognition engine. Nevisa is an HMM-based, large-vocabulary, speaker-independent continuous speech recognition system. Like most successful recognition systems, it uses MFCC with some modifications as speech signal features. It also utilizes a VAD based on signal energy and zero-crossing rate. The acoustic models are trained with the maximum likelihood estimation criterion, at the core of which are the classical segmental k-means and Baum-Welch algorithms. The system is based on phoneme modeling and utilizes synchronous beam search based on a lexicon tree for decoding the acoustic utterances. Language modeling for Persian has been...
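A voice activity detector built on the two cues named in this abstract, signal energy and zero-crossing rate, can be sketched as follows. This is a hypothetical frame-level rule for illustration, not Nevisa's actual VAD: the decision logic and all three thresholds are assumptions and would need tuning on real data.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_high=0.01, energy_low=0.001, zcr_thr=0.25):
    """Assumed rule: loud frames count as speech; quieter frames count
    only if their zero-crossing rate is high (e.g. weak fricatives)."""
    energy = frame_energy(frame)
    if energy >= energy_high:
        return True
    return energy >= energy_low and zero_crossing_rate(frame) >= zcr_thr
```

The second condition reflects a common motivation for combining the two cues: unvoiced sounds can carry little energy but cross zero often, so energy alone would clip them.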

Noise reduction algorithm for robust speech recognition using MLP neural network

, Article PACIIA 2009 - 2009 2nd Asia-Pacific Conference on Computational Intelligence and Industrial Applications, 28 November 2009 through 29 November 2009 ; Volume 1 , 2009 , Pages 377-380 ; 9781424446070 (ISBN) Ghaemmaghami, M. P ; Razzazi, F ; Sameti, H ; Dabbaghchian, S ; BabaAli, B ; Sharif University of Technology

2009

Abstract

We propose an efficient and effective nonlinear feature domain noise suppression algorithm, motivated by the minimum mean square error (MMSE) optimization criterion. A Multi Layer Perceptron (MLP) neural network in the log spectral domain minimizes the difference between noisy and clean speech. By using this method as a pre-processing stage of a speech recognition system, the recognition rate in noisy environments is improved. We can extend the application of the system to different environments with different noises without re-training it. We need only to train the preprocessing stage with a small portion of noisy data which is created by artificially adding different types of noises from the...

Robust speech recognition using MLP neural network in log-spectral domain

, Article IEEE International Symposium on Signal Processing and Information Technology, ISSPIT 2009, 14 December 2009 through 16 December 2009, Ajman ; 2009 , Pages 467-472 ; 9781424459506 (ISBN) Ghaemmaghami, M. P ; Sameti, H ; Razzazi, F ; BabaAli, B ; Dabbaghchian, S ; Sharif University of Technology

2009

Abstract

In this paper, we have proposed an efficient and effective nonlinear feature domain noise suppression algorithm, motivated by the minimum mean square error (MMSE) optimization criterion. A Multi Layer Perceptron (MLP) neural network in the log spectral domain has been employed to minimize the difference between noisy and clean speech. By using this method as a pre-processing stage of a speech recognition system, the recognition rate in noisy environments has been improved. We extended the application of the system to different environments with different noises without retraining the HMM model. We trained the feature extraction stage with a small portion of noisy data which was created by...

Analyzing Airport Saturation Based on Continuous Descent Approach

, M.Sc. Thesis Sharif University of Technology Norouzi, Ramin (Author) ; Malaek, Mohamad Bagher (Supervisor)

Abstract

Airport saturation is an important issue that has long concerned specialists. One of the major applications of this subject is in airport approach problems. While current conventional approach procedures have many disadvantages, such as high fuel consumption and the resulting environmental and noise pollution, the CDA concept, a modern topic introduced as a consequence of free flight theory, has fewer such disadvantages. The subject of this study is to investigate whether the procedural approach or CDA saturates an airport sooner as the traffic flow increases. In this dissertation, using the BADA database, CDA...

Decision Matrix for a Safe Aerial Launch Release Mechanism

, M.Sc. Thesis Sharif University of Technology Salmani, Hamed (Author) ; Malaek, Mohammad Bagher (Supervisor)

Abstract

The rapid aerial launch of small satellites from a mother airplane has various applications, such as space experiments, weather monitoring, and earth monitoring for information gathering during natural disasters. In this method, the satellite and the launch vehicle are carried by the mother airplane to a defined altitude; the launch vehicle is then released and flown to insert the satellite into the desired orbit. The success of an aerial launch system depends on a set of complex elements and their interactions. In the present study, the element interactions are first investigated using a systems engineering approach based on operational scenarios; then, a new approach for...

Simulation and Analysis of the Coolant Mixing Test within the Reactor Pressure Vessel of BNPP Using ANSYS CFX 18.0

, M.Sc. Thesis Sharif University of Technology Khalvandi, Mohammad (Author) ; Ghofrani, Mohammad Bagher (Supervisor)

Abstract

Various factors, such as increasing or decreasing the heat removal from the initial circuit, or increasing the flow rate of the cooling fluid in the reactor, cause the phenomenon of coolant mixing in PWR reactors. In this project, the thermohydraulic coolant mixing test has been simulated in the pressure vessel of the Bushehr nuclear reactor. In this test, the mixing of the coolant caused by the reduction of heat removal from the primary circuit by the secondary circuit is investigated. In this case, the primary circuit temperature increases in the loop where the heat removal is reduced. The most important consequence of this event is the reactivity changes at the core of the...

TCAS Logic Improvement for Airliners Formation Flights

, M.Sc. Thesis Sharif University of Technology Dolatabadi, Shirin (Author) ; Malaek, Mohammad Bagher (Supervisor)

Abstract

This study explores the trajectory optimization of a two-aircraft formation flight, seeking to minimize fuel consumption and flight duration while considering activation of the Traffic Collision Avoidance System (TCAS) during the flight. In order to model the effects of this system, a set of codes has been developed in the MATLAB software environment, which utilizes an estimation model of the induced drag of each aircraft based on their relative positions. Also, position rotation is allowed, and the number and location of rotations are controllable by the user. The TCAS determines the allowable distances between aircraft during flight based on relative velocities and relative heading angles....

Simulation of Loop Connection of RCPs during Commissioning Test of BNPP Using RELAP 5 Code to Two or Three Operating Ones

, M.Sc. Thesis Sharif University of Technology Zeynalian, MirHadi (Author) ; Ghofrani, Mohammad Bagher (Supervisor)

Abstract

In order to ensure the safety of nuclear power plants, a set of commissioning tests based on international standards is carried out before operation. In this research, one of the Bushehr power plant commissioning tests, the loop connection test of the primary circuit cooling pumps, was simulated using the RELAP5 code. In this test, the effects of connecting the primary circuit pump loops on the thermohydraulic parameters and the entrance of test-related systems were evaluated. After simulating the systems related to the test and extracting the steady-state results, the transient results obtained were evaluated against the experimental results of the test of loop connection the...