
HMM-based persian speech synthesis using limited adaptation data

Article, International Conference on Signal Processing Proceedings, ICSP; Volume 1, 2012, Pages 585-589; ISBN 9781467321945; Bahmaninezhad, F; Sameti, H; Khorram, S; Sharif University of Technology

2012

Abstract

Speech synthesis systems developed for the Persian language so far require large speech corpora to synthesize the voices of several target speakers. Synthesizing speech from a small amount of data is therefore essential for Persian. Speaker adaptation lets a speech synthesis system generate speech of remarkable quality even when only limited data from the target speaker are available. Here we apply this method to Persian for the first time. This paper describes speaker adaptation based on Hidden Markov Models (HMMs) in a Persian speech synthesis system for the FARsi Speech DATabase (FARSDAT). To this end, we prepared the whole FARSDAT, then for...
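
Speaker adaptation starts from models trained on other speakers and moves them toward the target speaker's limited data. The abstract does not say which adaptation algorithm the paper uses. The sketch below illustrates one well-known technique, MAP re-estimation of a single Gaussian mean, as an assumption only. `map_adapt_mean` and the prior weight `tau` are hypothetical names.

```python
from typing import List, Sequence

def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   tau: float = 10.0) -> List[float]:
    """MAP estimate of a Gaussian mean from a few adaptation frames.

    prior_mean: mean of the speaker-independent (average-voice) Gaussian.
    frames: target-speaker feature vectors aligned to this Gaussian.
    tau: hypothetical prior weight; larger values trust the prior more.
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation frames.
    totals = [sum(f[d] for f in frames) for d in range(dim)]
    # mu_map = (tau * mu_0 + sum_t x_t) / (tau + n)
    return [(tau * prior_mean[d] + totals[d]) / (tau + n) for d in range(dim)]
```

With ten frames and `tau=10`, the estimate lands halfway between the prior mean and the sample mean. As the number of frames grows, it approaches the sample mean.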

Spectral subtraction in model distance maximizing framework for robust speech recognition

Article, 2008 9th International Conference on Signal Processing, ICSP 2008, Beijing, 26 October 2008 through 29 October 2008; 2008, Pages 627-630; ISBN 9781424421794; BabaAli, B; Sameti, H; Safayani, M; Sharif University of Technology

2008

Abstract

This paper presents a novel discriminative parameter calibration approach based on Model Distance Maximizing (MDM). It improves the performance of our previously proposed robustness method, spectral subtraction (SS) in a likelihood-maximizing framework. In the previous work, the spectral over-subtraction factor of SS was adjusted with the conventional maximum-likelihood (ML) approach. That approach uses only the true model and ignores other, confusable models, which makes a suboptimal solution very likely. MDM instead maximizes the dissimilarities among models, which further improves the performance of our speech-recognizer-based spectral subtraction method. Experimental results...
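
The abstract names an over-subtraction factor but gives no formula. The sketch below assumes the widely used power-domain over-subtraction rule with a spectral floor, which may differ from the authors' exact variant. `spectral_subtract` and the default `beta` are hypothetical.

```python
from typing import List, Sequence

def spectral_subtract(noisy_power: Sequence[float],
                      noise_power: Sequence[float],
                      alpha: float = 2.0,
                      beta: float = 0.01) -> List[float]:
    """Power spectral subtraction for one frame.

    alpha: over-subtraction factor (the parameter being calibrated).
    beta: spectral floor as a fraction of the noise estimate
          (hypothetical default) that keeps every bin positive.
    """
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        # Subtract an over-estimated noise power, then apply the floor.
        enhanced.append(max(y - alpha * n, beta * n))
    return enhanced
```

A larger `alpha` removes more noise but distorts the speech spectrum more. Choosing it well is the calibration problem that the ML and MDM criteria address.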

Speaker Adaptation in HMM-Based Persian Speech Synthesis

M.Sc. Thesis, Sharif University of Technology; Bahmaninezhad, Fahimeh (Author); Sameti, Hossein (Supervisor)

Abstract

Text-to-speech synthesis, one of the key technologies in speech processing, is a technique for generating a speech signal from arbitrary text. The generated speech carries a target speaker's voice characteristics and can reflect various speaking styles and emotional expressions. Statistical parametric speech synthesis has recently been shown to be very effective at generating acceptable synthetic speech. This study therefore focuses on one instance of these techniques, hidden Markov model (HMM)-based speech synthesis. In text-to-speech systems, it is desirable to synthesize high-quality speech from a small amount of speech data; this goal can be achieved by employing a speaker adaptation framework and...
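
Transform-based speaker adaptation estimates a transform shared across Gaussians, so it also adapts models that received no adaptation data. The sketch below is a deliberately simplified illustration, not the thesis's method. It estimates a single bias vector shared by all state means, which amounts to an MLLR-style transform with the matrix fixed to identity. It assumes shared covariances and a fixed (hard) state alignment. The state labels and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def estimate_shared_bias(means: Dict[str, Sequence[float]],
                         aligned: Sequence[Tuple[str, Sequence[float]]]) -> List[float]:
    """ML estimate of one bias vector shared by every Gaussian mean.

    Assumes all Gaussians share one covariance and each adaptation frame is
    hard-aligned to a state; the ML bias is then the mean residual.
    """
    dim = len(next(iter(means.values())))
    totals = [0.0] * dim
    for state, frame in aligned:
        for d in range(dim):
            totals[d] += frame[d] - means[state][d]
    return [t / len(aligned) for t in totals]

def adapt_means(means: Dict[str, Sequence[float]],
                bias: Sequence[float]) -> Dict[str, List[float]]:
    # Shift every mean, including states that received no adaptation frames.
    return {state: [m + b for m, b in zip(mu, bias)] for state, mu in means.items()}
```

For example, if state `c` has no adaptation frames, it is still shifted by the bias estimated from states `a` and `b`.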


Filter-bank design based on dependencies between frequency components and phoneme characteristics

Article, European Signal Processing Conference, 29 August 2011 through 2 September 2011; September 2011, Pages 2142-2145; ISSN 22195491; Mohammadi, S. H; Sameti, H; Tavanaei, A; Soltani Farani, A; Sharif University of Technology

2011

Abstract

Mel-frequency cepstral coefficients (MFCCs) are widely used for feature extraction in speech recognition systems. These features use Mel-scaled filters. A new filter-bank based on dependencies between frequency components and phoneme characteristics is proposed, with F-ratio and mutual information used to measure these dependencies. In the new filter-bank, the frequency resolution of the sub-band filters is inversely proportional to the computed dependency values. This filter-bank replaces the Mel-scaled filters for feature extraction. A phoneme recognition experiment on the FARSDAT Persian language database showed that features extracted using the proposed filter-bank reach higher accuracy (63.92%)...
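
The F-ratio measures, per frequency component, how well phoneme classes separate. The sketch below assumes the common definition: the variance of the class means divided by the average within-class variance. It also assumes spectral frames are already grouped by phoneme label. `f_ratio` is a hypothetical helper. Mapping the scores to filter bandwidths is omitted because the abstract does not give that rule.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence

def f_ratio(frames_by_phoneme: Dict[str, Sequence[Sequence[float]]]) -> List[float]:
    """F-ratio of each frequency bin across phoneme classes.

    F = variance of the per-class means / average within-class variance.
    frames_by_phoneme maps a phoneme label to spectral frames (one value per bin).
    """
    classes = list(frames_by_phoneme.values())
    n_bins = len(classes[0][0])
    ratios = []
    for k in range(n_bins):
        # Values of bin k, grouped by phoneme class.
        per_class = [[frame[k] for frame in frames] for frames in classes]
        between = pvariance([mean(values) for values in per_class])
        within = mean(pvariance(values) for values in per_class)
        ratios.append(between / within)  # raises ZeroDivisionError if within == 0
    return ratios
```

A bin whose class means coincide scores 0. A bin whose class means are far apart relative to the spread within each class scores high.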

A model distance maximizing framework for speech recognizer-based speech enhancement

Article, AEU - International Journal of Electronics and Communications; Volume 65, Issue 2, February 2011, Pages 99-106; ISSN 14348411; Babaali, B; Sameti, H; Falk, T. H; Sharif University of Technology

2011

Abstract

This paper presents a novel discriminative parameter calibration approach based on the model distance maximizing (MDM) framework. It improves the performance of our previously proposed method based on spectral subtraction (SS) in a likelihood-maximizing framework. In the previous work, spectral over-subtraction factors were adjusted with the conventional maximum-likelihood (ML) approach. That approach used only the true model and did not consider other confusable models, so it likely reached suboptimal solutions. In the proposed MDM framework, by contrast, improved speech recognition performance is obtained by maximizing the dissimilarities among models. Experimental results based on FARSDAT, TIMIT...

Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition

Article, 2011 International Symposium on Artificial Intelligence and Signal Processing, AISP 2011, 15 June 2011 through 16 June 2011; June 2011, Pages 138-140; ISBN 9781424498345; Tavanaei, A; Manzuri, M. T; Sameti, H; Sharif University of Technology

2011

Abstract

In this paper, we use a feature vector of Mel-Frequency Discrete Wavelet Coefficients (MFDWCs) to recognize spoken phonemes in the Persian language. Wavelets are used in feature extraction to benefit from their multiresolution analysis and their localization in both the time and frequency domains. The MFDWCs are obtained by applying the Discrete Wavelet Transform (DWT) to the Mel-scaled log filter-bank energies of a speech frame. The feature vectors are used for HMM-based phoneme recognition on a portion of the FARSDAT Persian language database, consisting of 35 hours of recorded data for training and 15 hours for testing. We evaluate the performance of the new features for clean speech...
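
The MFDWC step can be sketched as a logarithm followed by a DWT of the Mel filter-bank energies, with the DWT taking the place of the DCT used for MFCCs. The abstract does not name the wavelet or the number of decomposition levels. This sketch therefore assumes an orthonormal Haar wavelet, a full decomposition, and a power-of-two number of Mel filters. `haar_dwt` and `mfdwc` are hypothetical names.

```python
import math
from typing import List, Sequence

def haar_dwt(signal: Sequence[float]) -> List[float]:
    """Full multilevel orthonormal Haar DWT; len(signal) must be a power of two.

    Returns [approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details: List[float] = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level_details = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = level_details + details  # coarser levels go first
    return approx + details

def mfdwc(mel_energies: Sequence[float]) -> List[float]:
    # Log-compress the Mel filter-bank energies, then apply the DWT.
    return haar_dwt([math.log(e) for e in mel_energies])
```

The orthonormal Haar transform preserves energy, so the sum of squared coefficients equals that of the log energies. A flat log spectrum yields a single nonzero (approximation) coefficient.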

Introducing a framework to create telephony speech databases from direct ones

Article, 14th International Conference on Systems, Signals and Image Processing, IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services, EC-SIPMCS 2007, Maribor, 27 June 2007 through 30 June 2007; November 2007, Pages 327-330; ISBN 9789612480295; Momtazi, S; Sameti, H; Vaisipour, S; Tefagh, M; Sharif University of Technology

2007

Abstract

A comprehensive speech database is an important tool for developing speech recognition systems, including telephony speech recognizers. Adequate databases exist for direct speech recognizers, but there is no appropriate database for telephony speech recognizers. Most proposed solutions to this problem fall into two groups. Some build new databases, which consumes much time and many resources. Others use a filter that simulates circuit-switch behavior to transform direct databases into telephony ones, but the resulting databases differ considerably from real telephony databases. In this paper we introduce a framework for creating telephony speech...
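
The abstract contrasts its framework with the simpler approach of filtering direct speech to imitate the telephone channel. For reference, the sketch below shows only that filtering baseline, not the proposed framework. It applies a Hamming-windowed-sinc band-pass filter over the conventional 300-3400 Hz telephone band and then decimates to 8 kHz. Filter length, window choice, and all function names are assumptions.

```python
import cmath
import math
from typing import List, Sequence

def _ideal_lowpass(fc: float, fs: float, t: float) -> float:
    # Ideal low-pass impulse response with cutoff fc (Hz), at offset t samples.
    if t == 0:
        return 2 * fc / fs
    return math.sin(2 * math.pi * fc * t / fs) / (math.pi * t)

def bandpass_fir(low_hz: float, high_hz: float, fs: float, taps: int = 101) -> List[float]:
    """Hamming-windowed band-pass FIR built as the difference of two low-pass filters."""
    mid = (taps - 1) / 2
    coeffs = []
    for n in range(taps):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        t = n - mid
        coeffs.append((_ideal_lowpass(high_hz, fs, t) - _ideal_lowpass(low_hz, fs, t)) * window)
    return coeffs

def magnitude_response(h: Sequence[float], freq_hz: float, fs: float) -> float:
    # |H(e^{jw})| evaluated at a single frequency.
    w = 2 * math.pi * freq_hz / fs
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))

def fir_filter(h: Sequence[float], x: Sequence[float]) -> List[float]:
    # Causal direct-form convolution, output truncated to len(x).
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]

def simulate_telephone(x: Sequence[float], fs: int = 16000) -> List[float]:
    """Band-limit to 300-3400 Hz, then keep every (fs // 8000)-th sample."""
    if fs % 8000:
        raise ValueError("fs must be a multiple of 8000")
    h = bandpass_fir(300.0, 3400.0, fs)
    return fir_filter(h, x)[:: fs // 8000]
```

A filter like this models only the band limitation of the channel, not effects such as codec distortion or line noise.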