Acoustic modeling from frequency-domain representations of speech

Ghahremani, P ; Hadian, H ; Lv, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology

Ghahremani, P ; Sharif University of Technology | 2018

Type of Document: Article
DOI: 10.21437/Interspeech.2018-1453
Publisher: International Speech Communication Association, 2018
Abstract:
In recent years, different studies have proposed new methods for DNN-based feature extraction and joint acoustic model training and feature learning from raw waveform for large vocabulary speech recognition. However, conventional pre-processed methods such as MFCC and PLP are still preferred in the state-of-the-art speech recognition systems as they are perceived to be more robust. Besides, the raw waveform methods - most of which are based on the time-domain signal - do not significantly outperform the conventional methods. In this paper, we propose a frequency-domain feature-learning layer which can allow acoustic model training directly from the waveform. The main distinctions from previous works are a new normalization block and a short-range constraint on the filter weights. The proposed setup achieves consistent performance improvements compared to the baseline MFCC and log-Mel features as well as other proposed time and frequency domain setups on different LVCSR tasks. Finally, based on the learned filters in our feature-learning layer, we propose a new set of analytic filters using polynomial approximation, which outperforms log-Mel filters significantly while being equally fast. © 2018 International Speech Communication Association. All rights reserved
Keywords:
Acoustic modeling ; Filter bank learning ; Frequency domain analysis ; Polynomial approximation ; Speech ; Speech communication ; Time domain analysis ; Acoustic model ; Acoustic model trainings ; Consistent performance ; Conventional methods ; Frequency-domain representations ; Large vocabulary speech recognition ; Speech recognition systems ; Time and frequency domains ; Speech recognition
Source: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2 September 2018 through 6 September 2018 ; Volume 2018-September, 2018, Pages 1596-1600 ; 2308-457X (ISSN)
URL: https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1453.html
