Speech Enhancement Using Diffusion Probabilistic Models in Frequency Domain
Jafaei, Omid | 2025
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 58156 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Kazemi, Reza; Ghaemmaghami, Shahrokh
- Abstract:
- In everyday life and in modern telecommunications, background noise often degrades the quality and intelligibility of speech signals. Speech enhancement methods have been developed to address this problem. Recently, deep learning, and deep generative models in particular, has gained attention for this task. At the forefront are diffusion models, which generalize well and perform strongly. A major drawback of these models, however, is their high computational cost. In this thesis, we aim to reduce the training and sampling time of diffusion models for speech enhancement. Our approach replaces standard diffusion models with latent diffusion models, so that the short-time Fourier transform (STFT) representation of the noisy signal is enhanced in a latent space. This is the first application of this idea in this context. With this approach, the model's sampling time for a fixed dataset size fell from approximately 2 hours to 20 minutes, an 83% reduction, while the perceptual speech quality score of the enhanced signals dropped by only about 5% (from 2.44 to 2.32). In addition, compared with similar research, the enhanced samples generated by the proposed model achieve a better signal-to-noise ratio.
- Keywords:
- Speech Enhancement ; Neural Network ; Diffusion Model ; Signal to Noise Ratio ; Latent Diffusion Model
