
Noise and speaker robustness in a Persian continuous speech recognition system

Article 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN) Veisi, H ; Sameti, H ; Sharif University of Technology

2007

Abstract

In this paper, VTLN speaker normalization and the MLLR and MAP adaptation methods are investigated in a Persian HMM-based, speaker-independent, large-vocabulary continuous speech recognition system. Speaker and environmental noise robustness are achieved in real-world applications for this system. A search-based method is used in VTLN to find speaker-relative warping factors. The warping factors are applied to the signal's spectrum to normalize the effect of vocal tract length (VTL) variation between speakers. In the MLLR framework, Gaussian mean and covariance transformations in global and full adaptation are evaluated. In this method, regression-tree-based adaptation in a batch-supervised fashion is used. Also the standard...
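As a rough illustration of the search-based warping-factor selection mentioned in this abstract, the sketch below runs a grid search over candidate factors and keeps the one with the highest score. The piecewise-linear warping function, the 0.88 to 1.12 search range, the 0.02 step, and the toy scoring function are all assumptions made for illustration; the abstract does not specify them, and in a real recognizer the score would typically be the acoustic likelihood of the warped features under the HMMs.

```python
def warp_frequency(freq_hz, alpha, nyquist_hz=8000.0, cutoff_ratio=0.85):
    """Piecewise-linear warp (assumed form): scale by alpha below a cutoff,
    then join linearly so that the Nyquist frequency maps to itself."""
    cutoff = cutoff_ratio * nyquist_hz / max(alpha, 1.0)
    if freq_hz <= cutoff:
        return alpha * freq_hz
    # Straight segment from (cutoff, alpha * cutoff) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * cutoff) / (nyquist_hz - cutoff)
    return alpha * cutoff + slope * (freq_hz - cutoff)


def search_warp_factor(score_fn, low=0.88, high=1.12, step=0.02):
    """Grid search: return the candidate warping factor with the best score."""
    n_steps = int(round((high - low) / step))
    candidates = [round(low + i * step, 4) for i in range(n_steps + 1)]
    return max(candidates, key=score_fn)


# Toy stand-in for a likelihood: how closely a speaker's spectral peak
# (hypothetically at 520 Hz) lands on a reference peak at 500 Hz after warping.
def toy_score(alpha, speaker_peak_hz=520.0, reference_peak_hz=500.0):
    return -abs(warp_frequency(speaker_peak_hz, alpha) - reference_peak_hz)


best_alpha = search_warp_factor(toy_score)
print("selected warping factor:", best_alpha)
```

The grid search is exhaustive, which is affordable here because only a handful of candidate factors are scored per speaker.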

Localized CAPTCHA for illiterate people

Article 2007 International Conference on Intelligent and Advanced Systems, ICIAS 2007, Kuala Lumpur, 25 November 2007 through 28 November 2007 ; 2007, Pages 675-679 ; 1424413559 (ISBN); 9781424413553 (ISBN) Shirali Shahreza, M. H ; Shirali Shahreza, M ; Sharif University of Technology

2007

Abstract

Nowadays, many daily human activities such as education, commerce, and conversation are carried out through the Internet. In cases such as website registration, some hackers write programs that create automatic false enrolments, which waste the website's resources and may even stop the entire website from working. Therefore, it is necessary to tell human users apart from computer programs; methods for doing so are known as CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). CAPTCHA methods are mainly based on the weak points of OCR (Optical Character Recognition) systems, but such methods are undesirable to human users. So the Non-OCR-Based CAPTCHA methods are...

Compensation of channel and noise distortions combining maximum likelihood based spectral subtraction and Normalization

Article 2007 IEEE International Conference on Signal Processing and Communications, ICSPC 2007, Dubai, 14 November 2007 through 27 November 2007 ; 2007, Pages 508-511 ; 9781424412365 (ISBN) Safayani, M ; Babaali, B ; Manzuri Shalmani, M. T ; Sameti, H ; Khaleghi, S ; Sharif University of Technology

2007

Abstract

Channel distortion may dramatically degrade speech recognition performance in a distant-talking environment. In their recent work [1], the authors proposed a novel spectral subtraction method, which they named maximum likelihood based spectral subtraction (MLBSS). They indicated that recognition performance could be improved dramatically by adjusting filter parameters based on recognition results. Previous results show the effectiveness of this method in dealing with additive distortion. In this paper we propose an approach for increasing the robustness of this method against channel distortion in a distant-talking environment. We add Cepstral Mean Normalization (CMN) in designing the MLBSS filter and show that by...
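For readers unfamiliar with Cepstral Mean Normalization, the following minimal sketch shows the basic operation: subtracting the per-coefficient mean over an utterance. A stationary channel acts as a convolution in time, which becomes an additive constant in the cepstral domain, so removing the mean cancels it. This shows plain per-utterance CMN only; how the paper integrates CMN into the MLBSS filter design is not stated in the abstract, and the frame values and channel offset below are made up for illustration.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    `frames` is a list of equal-length cepstral vectors (lists of floats).
    """
    if not frames:
        return []
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Hypothetical two-coefficient cepstra for three frames of clean speech.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# A fixed channel adds the same offset to every frame in the cepstral domain.
channel_offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
print(normalized_clean)
print(normalized_distorted)
```

After normalization the clean and channel-distorted utterances yield the same features up to floating-point rounding, which is why CMN is a common remedy for stationary channel effects. It does not address additive noise, which is the role of spectral subtraction.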

Introducing a framework to create telephony speech databases from direct ones

Article 14th International Conference on Systems Signals and Image Processing, IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services, EC-SIPMCS 2007, Maribor, 27 June 2007 through 30 June 2007 ; November 2007, Pages 327-330 ; 9789612480295 (ISBN) Momtazi, S ; Sameti, H ; Vaisipour, S ; Tefagh, M ; Sharif University of Technology

2007

Abstract

A comprehensive speech database is one of the important tools for developing speech recognition systems; such databases are necessary for telephony recognition, too. Although adequate databases for direct speech recognizers exist, there is no appropriate database for telephony speech recognizers. Most methods suggested for solving this problem either build new databases, which tends to consume much time and many resources, or use a filter that simulates circuit-switch behavior to transform direct databases into telephony ones; in the latter case, the resulting databases differ in many ways from real telephony databases. In this paper we introduce a framework for creating telephony speech...

Recognition of human speech phonemes using a novel fuzzy approach

Article Applied Soft Computing Journal ; Volume 7, Issue 3, 2007, Pages 828-839 ; 15684946 (ISSN) Halavati, R ; Bagheri Shouraki, S ; Harati Zadeh, S ; Sharif University of Technology

2007

Abstract

Recognition of human speech has long been a hot topic among artificial intelligence and signal processing researchers. Most current approaches to this problem are based on extracting precise features of the voice signal and trying to make the most of them through heavy computation. But this focus on signal details has resulted in too much sensitivity to noise and, as a result, the need for complex noise detection and removal algorithms, which creates a trade-off between fast and noise-robust recognition. This paper presents a novel approach to speech recognition using fuzzy modeling and decision making that ignores noise instead of detecting and removing it. To do so, the speech spectrogram...

Fuzzy classification by multi-layer averaging: An application in speech recognition

Article 3rd International Conference on Informatics in Control, Automation and Robotics, ICINCO 2006, Setubal, 1 August 2006 through 5 August 2006 ; Volume SPSMC, 2006, Pages 122-126 ; 9728865619 (ISBN); 9789728865610 (ISBN) Alemzadeh, M ; Shouraki, S. B ; Halavati, R ; Sharif University of Technology

2006

Abstract

This paper introduces a simple, fast, space-efficient linear method for a general pattern recognition problem. The presented algorithm can find the closest match for a given sample among a number of samples that have already been introduced to the system. The use of averaging and fuzzy numbers in this method suggests that it may be a noise-resistant recognition process. As a test bed, a spoken-word recognition problem has been posed to this algorithm. The test data contain clean and noisy samples, and the results have been compared to those of a widely used speech recognition method, HMM.

A novel approach to very fast and noise robust, isolated word speech recognition

Article 18th International Conference on Pattern Recognition, ICPR 2006, Hong Kong, 20 August 2006 through 24 August 2006 ; Volume 3, 2006, Pages 190-193 ; 10514651 (ISSN); 0769525210 (ISBN); 9780769525211 (ISBN) Halavati, R ; Bagheri Shouraki, S ; Tajik, H ; Cholakian, A ; Razaghpour, M ; Sharif University of Technology

2006

Abstract

A novel, very lightweight approach to isolated word speech recognition is introduced. The approach uses a new simplistic feature set and a neural network recognition system. The algorithm's main processing requirements are FFT computation and a simple neural network comparison, making the method a suitable solution for low-price embedded devices. The proposed method is tested on single-speaker and multiple-speaker test sets, and the results are compared with a widely used speech recognition approach, showing very fast recognition and a quite good recognition rate. © 2006 IEEE

A novel fuzzy approach to speech recognition

Article Proceedings - HIS'04: 4th International Conference on Hybrid Intelligent Systems, Kitakyushu, 5 December 2004 through 8 December 2004 ; 2005, Pages 340-345 ; 0769522912 (ISBN) Halavati, R ; Shouraki, S.B ; Eshraghi, M ; Alemzadeh, M ; Ziaie, P ; Ishikawa M ; Hashimoto S ; Paprzycki M ; Barakova E ; Yoshida K ; Koppen M ; Corne D.M ; Abraham A ; Sharif University of Technology

2005

Abstract

This paper presents a novel approach to speech recognition using fuzzy modeling. The task begins with conversion of the speech spectrogram into a linguistic description based on arbitrary colors and lengths. Phonemes are also described using these fuzzy measures, and recognition is done by normal fuzzy reasoning, while a genetic algorithm optimizes the phoneme definitions so as to classify samples into the correct phonemes. The method is tested on a standard speech database and the results are presented. © 2005 IEEE

Evolution of speech recognizer agents by artificial life

Article Wec 05: Fourth World Enformatika Conference, Istanbul, 24 June 2005 through 26 June 2005 ; Volume 6, 2005, Pages 237-240 ; 9759845857 (ISBN) Halavati, R ; Bagheri Shouraki, S ; Harati Zadeh, S ; Lucas, C ; Ardil C ; Sharif University of Technology

2005

Abstract

Artificial Life can be used as an agent training approach in large state spaces. This paper presents an artificial life method to increase the training speed of some speech recognizer agents which were previously trained by genetic algorithms. Using this approach, vertical training (genetic mutations and selection) is combined with horizontal training (individual learning through reinforcement learning), resulting in much faster evolution than a simple genetic algorithm. The approach is tested, and a comparison with GA cases on a standard speech database is presented. COPYRIGHT © ENFORMATIKA

An interactive tool for extracting human knowledge in speech recognition

Article WSEAS Transactions on Computers ; Volume 4, Issue 2, 2005, Pages 276-279 ; 11092750 (ISSN) Ghiathi, S. K. A ; Bagheri Shouraki, S ; Sharif University of Technology

2005

Abstract

Conventional features for speech recognition have not been evaluated in terms of their importance in human speech recognition. In this paper, a method for extracting important features in an interactive process is introduced. This method can be used as an aid for experts in an ASR expert system. As an application of the method, it is shown how an expert might find the distinguishing features between "m" and "n". As another use, it is illustrated how the method could be used to check the sufficiency of information in the quantized filter-bank for speech recognition.

A robust voice activity detection based on wavelet transform

Article 2nd International Conference on Electrical Engineering, ICEE, Lahore, 25 March 2008 through 26 March 2008 ; 2008 ; 9781424422937 (ISBN) Aghajani, K ; Manzuri, M. T ; Karami, M ; Tayebi, H ; Sharif University of Technology

2008

Abstract

Voice activity detection is an important step in some speech processing systems, such as speech recognition, speech enhancement, noise estimation, and speech compression. In this paper, a new voice activity detection algorithm based on the wavelet transform is proposed. In this algorithm we use the energy in each sub-band and extract a feature vector from these values by two methods. Experimental results demonstrate an advantage over other VAD methods. ©2008 IEEE
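To make the idea of wavelet sub-band energies concrete, here is a minimal sketch that decomposes a frame with the Haar wavelet and computes the energy in each sub-band. The choice of the Haar wavelet, the three decomposition levels, and the threshold-based decision rule in `is_speech` are assumptions for illustration only; the abstract does not name the wavelet or describe the paper's two feature extraction methods.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def subband_energies(frame, levels=3):
    """Energy of each detail band, followed by the final approximation band.

    The frame length is assumed to be a multiple of 2 ** levels.
    """
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def is_speech(frame, noise_floor, ratio=3.0, levels=3):
    """Hypothetical rule: flag speech if any sub-band energy exceeds
    `ratio` times the corresponding noise-floor energy."""
    energies = subband_energies(frame, levels)
    return any(e > ratio * n for e, n in zip(energies, noise_floor))


# Toy frames: a low-level alternating signal as "noise", a sinusoid as "speech".
quiet_frame = [0.01 * (-1) ** i for i in range(8)]
loud_frame = [math.sin(0.9 * i) for i in range(8)]

noise_floor = subband_energies(quiet_frame)
print(is_speech(loud_frame, noise_floor), is_speech(quiet_frame, noise_floor))
```

Because the Haar transform used here is orthonormal, the sub-band energies sum to the total frame energy, so the decomposition redistributes energy across frequency bands without losing any.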