Speech Enhancement Using Deep Neural Networks in Non-stationary Noise Environment
Hosseini, Ehsan | 2019
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 51646 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Ghaem Maghami, Shahrokh
- Abstract:
- Before any operation is performed on a speech signal, the environmental noise it carries must be properly removed. This noise removal process is called speech enhancement. Many studies have been conducted on ways to enhance speech signals, and among the existing methods, statistical methods have proven superior to the others. The main challenge for every noise removal method is that most environmental noises are non-stationary, so better ways to remove them are still being sought. Following the advent of deep neural networks and their successful results in areas such as machine learning, this thesis presents a method for removing non-stationary environmental noise from a speech signal using deep neural networks, which gives better results than conventional statistical methods. The thesis introduces a system based on fully-connected and convolutional neural networks that uses the magnitude and phase of the log-power spectra, together with the MFC coefficients of the noisy speech, to eliminate noise from the received speech signal. To arrive at a suitable structure, LAR and LSF coefficients are also studied along with several different neural network structures, and the best structure is introduced. The proposed methods are evaluated on the TIMIT corpus in the presence of nine different types of noise at three SNR levels, using the PESQ, STOI, and LSD measures, and are compared with a well-known deep neural network structure. With the methods described in the thesis, the final structure outperforms the reference structure by an average of 0.16, 0.02, and 0.12 in PESQ, STOI, and LSD, respectively.
- Keywords:
- Speech Enhancement ; Noise Removing ; Non-Stationary Vibrations ; Deep Neural Networks ; Mel-Frequency Cepstrum (MFC) Coefficient ; Line Spectral Frequency (LSF) Coefficient ; Log Area Ratio (LAR) Coefficient
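The abstract names log-power spectra as an input feature and LSD as one of the evaluation measures. The following is a minimal NumPy sketch of both, not the thesis's implementation: the function names, the frame length (512), hop size (256), Hann window, and the particular LSD formula (mean over frames of the RMS difference across frequency bins, in dB) are all assumptions chosen for illustration, and other definitions of LSD are in use.

```python
import numpy as np


def log_power_spectra(signal, frame_len=512, hop=256, eps=1e-10):
    """Return (log-power spectra, phase), one row per frame.

    Assumed framing: Hann window, frame_len samples, hop-sample shift.
    eps avoids log(0) for empty bins.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    lps = np.log(np.abs(spectrum) ** 2 + eps)  # natural-log power per bin
    phase = np.angle(spectrum)
    return lps, phase


def log_spectral_distance(lps_ref, lps_est):
    """One common LSD definition (assumed here), in dB.

    Inputs are natural-log power spectra of equal shape (frames x bins).
    """
    to_db = 10.0 / np.log(10.0)  # converts natural-log power to dB
    diff_db = to_db * (lps_ref - lps_est)
    return float(np.mean(np.sqrt(np.mean(diff_db ** 2, axis=1))))
```

For example, calling `log_power_spectra` on a clean signal and on an enhanced signal, then passing the two log-power arrays to `log_spectral_distance`, gives a single distance value where lower means the enhanced spectrum is closer to the clean one.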