
Speech Enhancement Based on Statistical Methods

Veisi, Hadi | 2011

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 42318 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sameti, Hossein
  7. Abstract:
  8. This thesis focuses on single-channel speech enhancement using a hidden Markov model (HMM) with a minimum mean square error (MMSE) estimator, and proposes an HMM-based speech enhancement method in the Mel-frequency domain. The MMSE estimator amounts to filtering the noisy signal with a weighted sum of filters, in which accurate estimation of the filter values and the filter weights is the main challenge. Cepstral-domain modeling for speech enhancement is motivated by the accurate filter selection it allows. In the proposed framework, Mel-frequency spectral (MFS) and Mel-frequency cepstral (MFC) features are studied and tested experimentally. In addition to the spectrum estimator, magnitude spectrum, log-magnitude spectrum and power spectrum estimators are studied and evaluated in the HMM-based speech enhancement system. However, estimating the clean speech waveform from a noisy signal in the proposed framework requires an inversion from the Mel-frequency domain to the spectral domain, which introduces distortion artifacts into the estimated filter values. To reduce the corrupting effects of the inversion, a parallel cepstral and spectral (PCS) modeling method is proposed. This method models the signal concurrently in the cepstral and spectral domains: the cepstral-domain models are used to estimate the filter weights, and the spectral-domain models are used to construct the noise reduction filters. Furthermore, an HMM-based voice activity detector (VAD) with an accurate hangover mechanism is proposed and integrated into the enhancement system. The proposed VAD detects the non-speech segments of the signal so that the noise statistics can be updated. It uses a two-state HMM whose speech-presence and speech-absence states are constructed from the noisy-speech and noise HMMs, respectively. The proposed VAD provides robust detection of speech segments in the presence of noise.
As another contribution of this thesis, a new algebraic gain compensation technique is introduced to reduce the effect of the energy mismatch between the trained models and the noisy speech signal. The performance of the proposed methods is comprehensively evaluated on three speech databases (TIMIT, Farsdat and Grid) in the presence of six noise types (white, Volvo, machine gun, babble, office and F16) at five SNR levels (10, 5, 0, -5 and -10 dB). The results are compared with several established speech enhancement methods, in particular auto-regressive HMM-based speech enhancement (AR-HMM). The experimental results confirm the superiority of the proposed methods over the reference methods, particularly for non-stationary noises. Averaged over all noise types and all SNR levels, the overall SNR improvement and PESQ improvement of the best proposed method are 4.07 dB and 0.61 MOS, respectively, which are 1.47 dB and 0.2 MOS higher than those of AR-HMM.
  9. Keywords:
  10. Hidden Markov Model ; Speech Enhancement ; Statistical Methods ; Cepstral Domain ; Parallel Cepstral-Spectral (PCS) Modeling ; Voice Activity Detector (VAD)
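
The abstract describes the MMSE estimator as a weighted sum of filters applied to the noisy signal. The following is a minimal illustrative sketch of that idea for a single frame, not the dissertation's implementation. It assumes that each HMM state contributes a Wiener-type filter built from a per-state clean-speech power spectrum and a noise power spectrum, and that the filter weights are state posteriors obtained by normalizing per-state log-likelihoods. The function names, the Wiener form of the filters, and the toy numbers are all assumptions made for illustration.

```python
import math


def posterior_weights(log_likelihoods):
    """Turn per-state log-likelihoods into weights that sum to 1 (assumed weighting)."""
    peak = max(log_likelihoods)  # subtract the maximum for numerical stability
    exps = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def wiener_filter(speech_psd, noise_psd):
    """Per-frequency gain S / (S + N); assumes S + N > 0 in every bin."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


def mmse_enhance(noisy_spectrum, state_speech_psds, noise_psd, log_likelihoods):
    """Filter one noisy frame with a weighted sum of per-state filters."""
    weights = posterior_weights(log_likelihoods)
    filters = [wiener_filter(psd, noise_psd) for psd in state_speech_psds]
    enhanced = []
    for f, y in enumerate(noisy_spectrum):
        # Combined gain in bin f is the posterior-weighted sum of the state gains.
        gain = sum(w * h[f] for w, h in zip(weights, filters))
        enhanced.append(gain * y)
    return enhanced


if __name__ == "__main__":
    # Toy frame with two frequency bins and two hypothetical speech states.
    frame = mmse_enhance(
        noisy_spectrum=[2.0, 2.0],
        state_speech_psds=[[1.0, 3.0], [3.0, 1.0]],
        noise_psd=[1.0, 1.0],
        log_likelihoods=[0.0, 0.0],
    )
    print(frame)
```

In this sketch the two challenges named in the abstract map onto two inputs: the quality of the per-state power spectra determines the filter values, and the quality of the log-likelihoods determines the filter weights.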
