
The combination of CMS with PMC for improving robustness of speech recognition systems

, Article 13th International Computer Society of Iran Computer Conference on Advances in Computer Science and Engineering, CSICC 2008, Kish Island, 9 March 2008 through 11 March 2008 ; Volume 6 CCIS , 2008 , Pages 825-829 ; 18650929 (ISSN); 3540899847 (ISBN); 9783540899846 (ISBN) Veisi, H ; Sameti, H ; Sharif University of Technology

2008

Abstract

This paper addresses the robustness problem of automatic speech recognition systems in real applications in the presence of noise. The PMCC algorithm is proposed for combining the parallel model combination (PMC) technique with the cepstral mean subtraction (CMS) method. The proposed algorithm brings the normalization ability of CMS into PMC, taking advantage of both methods to compensate for the effects of both additive and convolutional noise. We have also investigated vocal tract length normalization (VTLN) for speaker normalization, and maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation for speaker and acoustic adaptation. Different combinations of these methods are used to achieve robustness and make the system usable in real applications. Our evaluations are done on 4 different real noisy tasks on Nevisa recognition...
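
CMS works because a fixed channel multiplies the speech spectrum, and that multiplication becomes a constant vector added to every cepstral frame. The following minimal NumPy sketch demonstrates this property on synthetic data. It assumes cepstra are stored as a frames-by-coefficients array, and the function name is illustrative rather than taken from the paper:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra: np.ndarray) -> np.ndarray:
    """Subtract the per-utterance mean of each coefficient (cepstra: frames x coeffs)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))   # synthetic 13-dim cepstra, 200 frames
channel = rng.normal(size=(1, 13))   # hypothetical channel cepstrum (constant offset)
distorted = clean + channel          # convolution in time -> addition in the cepstral domain

print(np.allclose(cepstral_mean_subtraction(clean),
                  cepstral_mean_subtraction(distorted)))   # True
```

CMS cannot remove additive noise in the same way, because additive noise is not a constant offset in the cepstral domain. Covering that case is the role of PMC.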

Filter-bank design based on dependencies between frequency components and phoneme characteristics

, Article European Signal Processing Conference, 29 August 2011 through 2 September 2011 ; September , 2011 , Pages 2142-2145 ; 22195491 (ISSN) Mohammadi, S. H ; Sameti, H ; Tavanaei, A ; Soltani Farani, A ; Sharif University of Technology

2011

Abstract

Mel-frequency cepstral coefficients (MFCCs) are widely used for feature extraction in speech recognition systems. These features use Mel-scaled filters. A new filter-bank based on dependencies between frequency components and phoneme characteristics is proposed, with F-ratio and mutual information used to measure these dependencies. In the new filter-bank, the frequency resolution of the sub-band filters is inversely proportional to the computed dependency values. This filter-bank is used instead of the Mel-scaled filters for feature extraction. A phoneme recognition experiment on the FARSDAT Persian language database showed that features extracted using the proposed filter-bank reach higher accuracy (63.92%)...
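
F-ratio is one of the two dependency measures named in the abstract. It compares how far apart the phoneme classes are with how spread out each class is. Below is a minimal NumPy sketch of a per-frequency-bin F-ratio, assuming phoneme-labeled frames of spectral features. It uses a simple unweighted form (variance of the class means over the mean within-class variance) that may differ from the paper's exact normalization. It does not cover mutual information or the filter design step, and the function name and data are hypothetical:

```python
import numpy as np

def f_ratio(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-bin F-ratio: variance of class means / mean within-class variance.

    features: (num_frames, num_bins), e.g. log power spectra
    labels:   (num_frames,) phoneme label of each frame
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.stack([features[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / within.mean(axis=0)

rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=600)   # three synthetic "phonemes"
features = rng.normal(size=(600, 4))
features[:, 0] += 2.0 * labels          # only bin 0 depends on the phoneme
scores = f_ratio(features, labels)
print(scores.argmax())                  # 0
```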

Improving the performance of speech recognition systems using fault-tolerant techniques

, Article 2008 9th International Conference on Signal Processing, ICSP 2008, Beijing, 26 October 2008 through 29 October 2008 ; 2008 , Pages 579-582 ; 9781424421794 (ISBN) Veisi, H ; Sameti, H ; Sharif University of Technology

2008

Abstract

In this paper, the use of fault-tolerant techniques in speech recognition systems is studied and tested experimentally as a way to make these systems robust to noise. Recognizer redundancy is implemented to exploit the strengths of several recognition methods, each of which performs acceptably in a specific condition. Duplication-with-comparison and N-modular redundancy (NMR) methods are evaluated with majority and plurality voting on a telephony Persian speech-enabled IVR system. The evaluations show two promising outcomes: first, the approach improves performance considerably; second, it makes it possible to detect outputs with low confidence. © 2008 IEEE
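
The voting rules differ in how much agreement they demand. Majority voting needs more than half of the recognizers to agree, while plurality voting accepts the most frequent answer, and duplication-with-comparison accepts a result only when two recognizers agree. A minimal Python sketch, assuming each recognizer returns a single word hypothesis; the function names and example outputs are hypothetical:

```python
from collections import Counter
from typing import Optional, Sequence

def majority_vote(outputs: Sequence[str]) -> Optional[str]:
    """Hypothesis chosen by more than half of the recognizers, else None."""
    word, count = Counter(outputs).most_common(1)[0]
    return word if count > len(outputs) / 2 else None

def plurality_vote(outputs: Sequence[str]) -> Optional[str]:
    """Most frequent hypothesis; None if the top count is tied."""
    ranked = Counter(outputs).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def duplication_with_comparison(a: str, b: str) -> Optional[str]:
    """Two recognizers: accept the result only if they agree."""
    return a if a == b else None

votes = ["one", "two", "one", "three", "four"]   # hypothetical outputs of 5 recognizers
print(majority_vote(votes))    # None: "one" has 2 of 5 votes, not more than half
print(plurality_vote(votes))   # one
```

Returning `None` is one simple way to flag the low-confidence outputs mentioned in the abstract, for example so that they can be rejected.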

Coevolution of input sensors and recognition system to design a very low computation isolated word speech recognition system

, Article Scientia Iranica ; Volume 14, Issue 6 , 2007 , Pages 625-630 ; 10263098 (ISSN) Halavati, R ; Shouraki, S. B ; Sharif University of Technology

Sharif University of Technology 2007

Abstract

Appropriate sensors are a crucial necessity for the success of recognition systems. Nature has always coevolved sensors and recognition systems, and the same can be done in artificially intelligent systems. To obtain a very fast isolated word speech recognition system for a small embedded speech recognizer, an evolutionary approach has been used to evolve the required sensors and appropriate recognition structures together. The input sensors are designed and evolved with inspiration from the human auditory system, and classification is done by artificial neural networks. The resulting system is compared with a widely used speech recognition system, and the results are quite satisfactory.

An improved parallel model combination method for noisy speech recognition

, Article Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009 ; 2009 , Pages 237-242 ; 9781424454792 (ISBN) Veisi, H ; Sameti, H ; Sharif University of Technology

Abstract

In this paper, a novel method called PC-PMC is proposed to improve the performance of automatic speech recognition systems in noisy environments. The method is based on the parallel model combination (PMC) technique and uses the normalization ability of Cepstral Mean Subtraction (CMS) together with the compression and decorrelation capabilities of Principal Component Analysis (PCA). It combines the additive noise compensation of PMC with the convolutive noise removal ability of CMS and PCA. The first problem to be solved in realizing PC-PMC is that the PMC algorithm requires invertible modules in the front-end of the system, while CMS normalization is not an invertible process. Also, it is...
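
PMC combines the clean-speech and noise models in the linear spectral domain, so every front-end step between the spectrum and the cepstrum must be undone first. This is the invertibility requirement the abstract mentions, and an utterance-dependent mean subtraction has no such inverse. Below is a minimal NumPy sketch of the common log-add approximation of PMC for mean vectors. It assumes a square (untruncated) orthonormal DCT and ignores variances, and it illustrates plain PMC rather than PC-PMC. The function names and test values are hypothetical:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix. It is square, so its inverse is its transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def pmc_log_add(clean_cep: np.ndarray, noise_cep: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Log-add approximation of PMC for mean vectors only (variances ignored).

    cepstrum -> log filter-bank (inverse DCT) -> add in linear domain -> log -> cepstrum
    """
    C = dct_matrix(len(clean_cep))
    log_s = C.T @ clean_cep                              # undo the DCT (needs a square C)
    log_n = C.T @ noise_cep
    log_y = np.logaddexp(np.log(gain) + log_s, log_n)   # log(g * e^s + e^n)
    return C @ log_y

n = 8
C = dct_matrix(n)
speech = C @ np.full(n, 5.0)         # hypothetical clean mean: log-energy 5 in every band
weak_noise = C @ np.full(n, -20.0)   # hypothetical, nearly silent noise
loud_noise = C @ np.full(n, 5.0)     # hypothetical noise as strong as the speech
print(np.allclose(pmc_log_add(speech, weak_noise), speech, atol=1e-6))   # True
print(C.T @ pmc_log_add(speech, loud_noise))   # about 5 + log(2) in every band
```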

Acoustic modeling from frequency-domain representations of speech

, Article Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2 September 2018 through 6 September 2018 ; Volume 2018-September , 2018 , Pages 1596-1600 ; 2308457X (ISSN) Ghahremani, P ; Hadian, H ; Lv, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology

International Speech Communication Association 2018

Abstract

In recent years, different studies have proposed new methods for DNN-based feature extraction and for joint acoustic model training and feature learning from the raw waveform for large vocabulary speech recognition. However, conventional pre-processed features such as MFCC and PLP are still preferred in state-of-the-art speech recognition systems, as they are perceived to be more robust. Besides, the raw waveform methods, most of which are based on the time-domain signal, do not significantly outperform the conventional methods. In this paper, we propose a frequency-domain feature-learning layer that allows acoustic model training directly from the waveform. The main distinctions from...

The effect of phase information in speech enhancement and speech recognition

, Article 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012, 2 July 2012 through 5 July 2012 ; 2012 , Pages 1446-1447 ; 9781467303828 (ISBN) Langarani, M. S. E ; Veisi, H ; Sameti, H ; Sharif University of Technology

2012

Abstract

The majority of speech enhancement methods perform noise removal in the spectral domain and construct the enhanced speech signal from the estimated magnitude of the clean speech and the phase of the noisy speech. In this paper, we show that incorporating phase information into the enhancement process improves the quality and intelligibility of the speech signal. In our investigations, the minimum mean-square error short-time spectral amplitude (MMSE-STSA) and MMSE log-spectral amplitude (MMSE-LSA) methods are used to estimate the magnitude spectrum of the speech signal. Six classes of experiments show that taking the phase information into account improves the overall SNR and PESQ measures. In...
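
The conventional reconstruction described in the abstract pairs an estimated magnitude with the noisy phase. The sketch below isolates the cost of that choice on one synthetic frame, using the oracle clean magnitude as a stand-in for an MMSE-STSA or MMSE-LSA estimate. It is a single-FFT illustration with assumed signal parameters and a hypothetical helper name, not the paper's experimental setup:

```python
import numpy as np

def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Overall SNR (dB) of an estimate relative to the reference signal."""
    error = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2)))

rng = np.random.default_rng(2)
n = 512
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.11 * t)
noisy = clean + 0.5 * rng.normal(size=n)

S = np.fft.rfft(clean)
Y = np.fft.rfft(noisy)
magnitude = np.abs(S)   # oracle clean magnitude, standing in for an MMSE-STSA/LSA estimate

# Conventional reconstruction: estimated magnitude + noisy phase
with_noisy_phase = np.fft.irfft(magnitude * np.exp(1j * np.angle(Y)), n=n)
# Same magnitude with the clean phase (not available in practice)
with_clean_phase = np.fft.irfft(magnitude * np.exp(1j * np.angle(S)), n=n)

print(f"noisy phase: {snr_db(clean, with_noisy_phase):.1f} dB")
print(f"max error with clean phase: {np.max(np.abs(with_clean_phase - clean)):.1e}")
```

Even with a perfect magnitude, the noisy phase alone leaves a residual error. Phase-aware enhancement aims to close that gap.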

A novel approach to HMM-based speech recognition systems using particle swarm optimization

, Article Mathematical and Computer Modelling ; Volume 52, Issue 11-12 , 2010 , Pages 1910-1920 ; 08957177 (ISSN) Najkar, N ; Razzazi, F ; Sameti, H ; Sharif University of Technology

2010

Abstract

The core of HMM-based speech recognition systems is the Viterbi algorithm, which uses dynamic programming to find the best alignment between the input speech and a given speech model. In this paper, dynamic programming is replaced by a search method based on the particle swarm optimization (PSO) algorithm. The main idea is to generate an initial population of segmentation vectors in the solution search space and to improve the location of the segments with an update algorithm. Several methods are introduced and evaluated for representing particles and their corresponding movement structures. In addition, two segmentation strategies are explored. The first...
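
For reference, the dynamic-programming baseline that the PSO search replaces can be sketched as a standard log-domain Viterbi decoder. The NumPy code below assumes frame log-likelihoods are already computed for every state. The toy three-state left-to-right model and its probabilities are hypothetical:

```python
import numpy as np

def viterbi(log_trans: np.ndarray, log_emit: np.ndarray, log_init: np.ndarray):
    """Most likely HMM state sequence, found by dynamic programming.

    log_trans: (S, S), log_trans[i, j] = log P(next state j | state i)
    log_emit:  (T, S) log-likelihood of each frame under each state
    log_init:  (S,) log initial-state probabilities
    Returns (best_path, best_log_score).
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # rows: previous state, cols: next state
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # follow back-pointers to frame 0
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(delta.max())

with np.errstate(divide="ignore"):               # log(0) -> -inf for forbidden moves
    trans = np.log(np.array([[0.5, 0.5, 0.0],
                             [0.0, 0.5, 0.5],
                             [0.0, 0.0, 1.0]]))
    init = np.log(np.array([1.0, 0.0, 0.0]))
emit = np.log(np.array([[0.8, 0.1, 0.1]] * 2 + [[0.1, 0.8, 0.1]] * 2 + [[0.1, 0.1, 0.8]] * 2))
path, score = viterbi(trans, emit, init)
print(path)   # [0, 0, 1, 1, 2, 2]
```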

Noise reduction algorithm for robust speech recognition using MLP neural network

, Article PACIIA 2009 - 2009 2nd Asia-Pacific Conference on Computational Intelligence and Industrial Applications, 28 November 2009 through 29 November 2009 ; Volume 1 , 2009 , Pages 377-380 ; 9781424446070 (ISBN) Ghaemmaghami, M. P ; Razzazi, F ; Sameti, H ; Dabbaghchian, S ; BabaAli, B ; Sharif University of Technology

Abstract

We propose an efficient and effective nonlinear feature-domain noise suppression algorithm motivated by the minimum mean square error (MMSE) optimization criterion. A multilayer perceptron (MLP) neural network in the log-spectral domain minimizes the difference between noisy and clean speech. Using this method as a pre-processing stage of a speech recognition system improves the recognition rate in noisy environments. The system can be extended to different environments with different noises without re-training it; only the preprocessing stage needs to be trained, with a small portion of noisy data created by artificially adding different types of noises from the...
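
The pre-processing stage described above learns a mapping from noisy to clean log-spectra under a mean-square-error criterion. Below is a minimal NumPy sketch of that idea. It assumes a single tanh hidden layer, full-batch gradient descent, and synthetic training pairs built by adding noise in the linear power domain, which mirrors the artificially noised training data in the abstract. The network size, learning rate, and function name are hypothetical:

```python
import numpy as np

def train_log_spectral_mlp(noisy, clean, n_hidden=32, lr=0.1, steps=1000, seed=0):
    """Train a one-hidden-layer MLP mapping noisy to clean log-spectra (MSE loss).

    Full-batch gradient descent; returns (params, initial_loss, final_loss).
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = noisy.shape[1], clean.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_out)); b2 = np.zeros(n_out)

    def loss(W1, b1, W2, b2):
        pred = np.tanh(noisy @ W1 + b1) @ W2 + b2
        return float(np.mean((pred - clean) ** 2))

    initial = loss(W1, b1, W2, b2)
    for _ in range(steps):
        h = np.tanh(noisy @ W1 + b1)
        pred = h @ W2 + b2
        g_pred = 2.0 * (pred - clean) / pred.size   # d(MSE)/d(pred)
        g_W2, g_b2 = h.T @ g_pred, g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)      # back-propagate through tanh
        g_W1, g_b1 = noisy.T @ g_h, g_h.sum(axis=0)
        W1 -= lr * g_W1; b1 -= lr * g_b1            # all gradients computed before updating
        W2 -= lr * g_W2; b2 -= lr * g_b2
    return (W1, b1, W2, b2), initial, loss(W1, b1, W2, b2)

rng = np.random.default_rng(4)
clean = rng.normal(0.0, 1.0, size=(2000, 8))   # synthetic clean log-spectra
noise = rng.normal(-0.5, 0.3, size=(2000, 8))  # synthetic noise log-spectra
noisy = np.logaddexp(clean, noise)             # powers add in the linear domain
params, initial_loss, final_loss = train_log_spectral_mlp(noisy, clean)
print(initial_loss, final_loss)
```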

Robust speech recognition using MLP neural network in log-spectral domain

, Article IEEE International Symposium on Signal Processing and Information Technology, ISSPIT 2009, 14 December 2009 through 16 December 2009, Ajman ; 2009 , Pages 467-472 ; 9781424459506 (ISBN) Ghaemmaghami, M. P ; Sameti, H ; Razzazi, F ; BabaAli, B ; Dabbaghchian, S ; Sharif University of Technology

Abstract

In this paper, we have proposed an efficient and effective nonlinear feature-domain noise suppression algorithm motivated by the minimum mean square error (MMSE) optimization criterion. A multilayer perceptron (MLP) neural network in the log-spectral domain has been employed to minimize the difference between noisy and clean speech. Using this method as a pre-processing stage of a speech recognition system has improved the recognition rate in noisy environments. We extended the application of the system to different environments with different noises without retraining the HMM model. We trained the feature extraction stage with a small portion of noisy data which was created by...

A novel approach to HMM-based speech recognition system using particle swarm optimization

, Article BIC-TA 2009 - Proceedings, 2009 4th International Conference on Bio-Inspired Computing: Theories and Applications, 16 October 2009 through 19 October 2009 ; 2009 , Pages 296-301 ; 9781424438655 (ISBN) Najkar, N ; Razzazi, F ; Sameti, H ; Sharif University of Technology

Abstract

The core of HMM-based speech recognition systems is the Viterbi algorithm, which uses dynamic programming to find the best alignment between the input speech and a given speech model. In this paper, dynamic programming is replaced by a search method based on the particle swarm optimization algorithm. The main idea is to generate an initial population of segmentation vectors in the solution search space and to improve the location of the segments with an update algorithm. Two methods are introduced for representing each particle and its movement structure. The results show that the effect of these factors is noticeable in finding the global optimum while...
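
The PSO idea can be sketched with a generic global-best PSO whose particles are segmentation vectors, that is, the frame indices at which a left-to-right model moves to its next state. This is a minimal illustration under several assumptions: particles are continuous and rounded to integer boundaries, the score ignores transition probabilities, and the inertia and acceleration constants (`w`, `c1`, `c2`) are arbitrary defaults. It is not either of the particle representations proposed in the paper, and all function names are hypothetical:

```python
import itertools
import numpy as np

def segmentation_score(bounds, log_emit):
    """Sum of frame log-likelihoods when state s covers frames edges[s]..edges[s+1]-1."""
    T, S = log_emit.shape
    edges = [0, *bounds, T]
    if any(a >= b for a, b in zip(edges, edges[1:])):
        return -np.inf                                  # each state needs >= 1 frame
    return float(sum(log_emit[edges[s]:edges[s + 1], s].sum() for s in range(S)))

def decode(position, T):
    """Map a continuous particle position to sorted integer state boundaries."""
    return sorted(int(b) for b in np.clip(np.rint(position), 1, T - 1))

def pso_segment(log_emit, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over segmentation vectors; returns (boundaries, score)."""
    T, S = log_emit.shape
    rng = np.random.default_rng(seed)
    pos = np.sort(rng.uniform(1, T - 1, size=(n_particles, S - 1)), axis=1)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_score = np.array([segmentation_score(decode(p, T), log_emit) for p in pos])
    g = int(pbest_score.argmax())
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        for i, p in enumerate(pos):
            score = segmentation_score(decode(p, T), log_emit)
            if score > pbest_score[i]:                  # update personal best
                pbest[i], pbest_score[i] = p.copy(), score
                if score > gbest_score:                 # update global best
                    gbest, gbest_score = p.copy(), score
    return decode(gbest, T), float(gbest_score)

rng = np.random.default_rng(3)
T = 12
log_emit = np.log(rng.uniform(0.05, 1.0, size=(T, 3)))   # synthetic 3-state likelihoods
exhaustive = max(segmentation_score(list(b), log_emit)
                 for b in itertools.combinations(range(1, T), 2))
bounds, score = pso_segment(log_emit)
print(bounds, round(score, 3), round(exhaustive, 3))
```

The exhaustive search is exactly optimal here, so the PSO score can match it but never exceed it. On realistic utterance lengths such an exhaustive check is infeasible, and Viterbi's dynamic programming is the exact alternative the PSO approach competes with.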

Likelihood-maximizing-based multiband spectral subtraction for robust speech recognition

, Article Eurasip Journal on Advances in Signal Processing ; Volume 2009 , 2009 ; 16876172 (ISSN) Babaali, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2009

Abstract

Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed for improving the quality of speech signals as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, while speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and the acoustic model trained on clean speech. Nevertheless, correlation can be expected...
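
Spectral subtraction removes an estimate of the noise power from each noisy power spectrum and floors the result so it stays positive. The multiband variant uses a separate over-subtraction factor per frequency band. Below is a minimal NumPy sketch, assuming a known noise power estimate and band edges that cover every bin. The band edges, factors, floor, and function name are hypothetical fixed values. The paper instead chooses its parameters in a likelihood-maximizing framework, which this sketch does not implement:

```python
import numpy as np

def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, alphas, beta=0.01):
    """Power spectral subtraction with one over-subtraction factor per band.

    noisy_power: (num_frames, num_bins); noise_power: (num_bins,) or same shape
    band_edges:  bin indices delimiting the bands, e.g. [0, 32, 96, 257]
    alphas:      one over-subtraction factor per band
    beta:        spectral floor, as a fraction of the noisy power
    """
    noise_power = np.broadcast_to(noise_power, noisy_power.shape)
    enhanced = np.empty_like(noisy_power)
    for (lo, hi), alpha in zip(zip(band_edges[:-1], band_edges[1:]), alphas):
        diff = noisy_power[:, lo:hi] - alpha * noise_power[:, lo:hi]
        enhanced[:, lo:hi] = np.maximum(diff, beta * noisy_power[:, lo:hi])
    return enhanced

rng = np.random.default_rng(5)
noisy = rng.uniform(0.5, 2.0, size=(10, 8))   # synthetic noisy power spectra
noise = np.full(8, 0.4)                       # hypothetical noise power estimate
out = multiband_spectral_subtraction(noisy, noise, band_edges=[0, 4, 8], alphas=[1.0, 2.0])
print(out.round(3))
```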

A time warping speech recognition system based on particle swarm optimization

, Article 2nd Asia International Conference on Modelling and Simulation, AMS 2008, Kuala Lumpur, 13 May 2008 through 15 May 2008 ; 2008 , Pages 585-590 ; 9780769531366 (ISBN) Rategh, S ; Razzazi, F ; Rahmani, A. M ; Gharan, S. O ; Sharif University of Technology

2008

Abstract

In this paper, dynamic programming alignment is replaced by a particle swarm optimization (PSO) procedure in the time warping problem. The basic PSO is too slow to be applied to speech recognition. To achieve higher performance, we introduce a PSO Inspired Time Warping algorithm (PTW), derived from the PSO methodology, that significantly increases the computational performance of time warping when aligning long, massive data sets. As the main enhancement, PTW defines a pruning strategy with an add-in controlling threshold, which considerably reduces recognition time while keeping the system accuracy comparable to that of DTW.
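
PTW's pruning rule is not detailed in the abstract, so the sketch below shows only the DTW baseline with a generic beam-style threshold. After each row of the accumulated-cost matrix (except the last), cells that exceed the best cell in that row by more than `prune_threshold` are discarded. The threshold semantics and the function name are assumptions for illustration; this is not PTW:

```python
import numpy as np

def dtw(x: np.ndarray, y: np.ndarray, prune_threshold=None) -> float:
    """DTW distance between feature sequences x (N, D) and y (M, D).

    With prune_threshold set, cells whose accumulated cost exceeds the best
    cost in the same row by more than the threshold are discarded.
    """
    n, m = len(x), len(y)
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
        if prune_threshold is not None and i < n:   # keep the last row so the end is reachable
            row = acc[i, 1:]                        # view into acc
            row[row > row.min() + prune_threshold] = np.inf
    return float(acc[n, m])

rng = np.random.default_rng(6)
a = rng.normal(size=(20, 3))   # synthetic feature sequences
b = rng.normal(size=(25, 3))
print(dtw(a, b), dtw(a, b, prune_threshold=0.5))
```

Pruning can only remove paths, so the pruned distance is never smaller than the exact one. This sketch still visits every cell for clarity; a faster implementation would skip pruned cells.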

Spectral subtraction in likelihood-maximizing framework for robust speech recognition

, Article INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association, Brisbane, QLD, 22 September 2008 through 26 September 2008 ; December , 2008 , Pages 980-983 ; 19909772 (ISSN) Baba Ali, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2008

Abstract

Spectral Subtraction (SS), a speech enhancement technique, was originally designed for improving the quality of speech signals as judged by human listeners. It usually improves the quality and intelligibility of speech signals, while speech recognition systems need compensation techniques capable of reducing the mismatch between the noisy speech features and the clean models. This paper proposes a novel approach for solving this problem by considering SS and the speech recognizer as two interconnected components sharing the common goal of improved speech recognition accuracy. The experimental evaluations on a real recorded database and the TIMIT database show that the proposed method can...

CAPTCHA for children

, Article 2008 IEEE International Conference on System of Systems Engineering, SoSE 2008, Monterey, CA, 2 June 2008 through 4 June 2008 ; 2008 ; 9781424421732 (ISBN) Shirali Shahreza, S ; Shirali Shahreza, M ; Sharif University of Technology

2008

Abstract

Some websites need to distinguish between human users and computer programs, a task known as CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). CAPTCHA methods are mainly based on the weak points of OCR systems, and using them is unpleasant for human users. In this paper, a method is presented for distinguishing between human users and computer programs based on choosing an object shown on the screen. In this method, some objects are chosen randomly and pictures of these objects are downloaded from the Internet. Then, after applying some effects such as rotation, all of the pictures are shown on the screen. Then we ask the...

Introducing a framework to create telephony speech databases from direct ones

, Article 14th International Conference on Systems Signals and Image Processing, IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services, EC-SIPMCS 2007, Maribor, 27 June 2007 through 30 June 2007 ; November , 2007 , Pages 327-330 ; 9789612480295 (ISBN) Momtazi, S ; Sameti, H ; Vaisipour, S ; Tefagh, M ; Sharif University of Technology

2007

Abstract

A comprehensive speech database is one of the important tools for developing speech recognition systems, and such databases are necessary for telephony recognition, too. Although adequate databases for direct speech recognizers exist, there is no appropriate database for telephony speech recognizers. Most methods suggested for solving this problem either build new databases, which consumes much time and many resources, or use a filter that simulates circuit-switch behavior to transform direct databases into telephony ones; in the latter case, the resulting databases differ considerably from real telephony databases. In this paper we introduce a framework for creating telephony speech...
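
The filter-based baseline that the abstract contrasts with can be sketched as band-limiting wideband audio to the nominal 300-3400 Hz telephone band, resampling to 8 kHz, and applying 8-bit mu-law companding. This minimal NumPy sketch assumes a 16 kHz input scaled to [-1, 1] and uses an ideal FFT-domain filter. It applies the continuous mu-law formula rather than the segmented G.711 tables. It illustrates the simple transformation the authors consider insufficient, not their proposed framework, and the function name is hypothetical:

```python
import numpy as np

def simulate_telephone_channel(signal, fs=16000, low_hz=300.0, high_hz=3400.0, mu=255):
    """Crude telephone-channel simulation: band-limit, resample to 8 kHz, mu-law 8-bit.

    Assumes fs is an integer multiple of 8000 and the signal is scaled to [-1, 1].
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0   # ideal band-pass filter
    band_limited = np.fft.irfft(spectrum, n=len(signal))
    narrow = band_limited[:: fs // 8000]                    # no energy above 4 kHz, so no aliasing
    # mu-law compress, quantize to 256 levels, expand
    x = np.clip(narrow, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    quantized = np.round((compressed + 1.0) / 2.0 * 255.0) / 255.0 * 2.0 - 1.0
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(mu)) / mu

fs = 16000
t = np.arange(fs) / fs                                  # one second of audio
in_band = simulate_telephone_channel(0.5 * np.sin(2 * np.pi * 1000 * t))
out_of_band = simulate_telephone_channel(0.5 * np.sin(2 * np.pi * 6000 * t))
print(len(in_band), np.std(in_band), np.std(out_of_band))
```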