Acoustic modeling from frequency-domain representations of speech

Ghahremani, P ; Sharif University of Technology | 2018

  1. Type of Document: Article
  2. DOI: 10.21437/Interspeech.2018-1453
  3. Publisher: International Speech Communication Association, 2018
  4. Abstract: In recent years, different studies have proposed new methods for DNN-based feature extraction and joint acoustic model training and feature learning from raw waveform for large vocabulary speech recognition. However, conventional pre-processed methods such as MFCC and PLP are still preferred in the state-of-the-art speech recognition systems as they are perceived to be more robust. Besides, the raw waveform methods, most of which are based on the time-domain signal, do not significantly outperform the conventional methods. In this paper, we propose a frequency-domain feature-learning layer which can allow acoustic model training directly from the waveform. The main distinctions from previous works are a new normalization block and a short-range constraint on the filter weights. The proposed setup achieves consistent performance improvements compared to the baseline MFCC and log-Mel features as well as other proposed time and frequency domain setups on different LVCSR tasks. Finally, based on the learned filters in our feature-learning layer, we propose a new set of analytic filters using polynomial approximation, which outperforms log-Mel filters significantly while being equally fast. © 2018 International Speech Communication Association. All rights reserved.
  5. Keywords: Acoustic modeling ; Filter bank learning ; Frequency domain analysis ; Polynomial approximation ; Speech ; Speech communication ; Time domain analysis ; Acoustic model ; Acoustic model trainings ; Consistent performance ; Conventional methods ; Frequency-domain representations ; Large vocabulary speech recognition ; Speech recognition systems ; Time and frequency domains ; Speech recognition
  6. Source: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2 September 2018 through 6 September 2018 ; Volume 2018-September, 2018, Pages 1596-1600 ; 2308457X (ISSN)
  7. URL: https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1453.html
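
Illustrative sketch (not taken from the paper): the abstract describes a frequency-domain filterbank layer with a short-range constraint on the filter weights and a normalization block, but gives no implementation details. The Python below shows one way such a pipeline could look for a single frame. Everything in it is an assumption made for illustration: the function names, the rectangular band mask standing in for the short-range constraint, the use of absolute values to keep weights non-negative, and the log plus mean-subtraction standing in for the normalization block. The all-ones weights are a placeholder for weights that would be learned jointly with the acoustic model.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one real frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_mask(num_bins, center, half_width):
    """Assumed short-range constraint: 1 inside the band, 0 outside."""
    return [1.0 if abs(b - center) <= half_width else 0.0
            for b in range(num_bins)]


def filterbank_features(frame, weights, centers, half_width, eps=1e-10):
    """Masked, non-negative filters on the power spectrum, then log and
    per-frame mean subtraction (an assumed stand-in for normalization)."""
    power = power_spectrum(frame)
    feats = []
    for w, c in zip(weights, centers):
        mask = band_mask(len(power), c, half_width)
        energy = sum(abs(wi) * m * p for wi, m, p in zip(w, mask, power))
        feats.append(math.log(energy + eps))
    mean = sum(feats) / len(feats)
    return [f - mean for f in feats]


# Toy usage: a pure tone at DFT bin 4 of a 32-sample frame.
frame = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
num_bins = 32 // 2 + 1
centers = [4, 8, 12]  # hypothetical filter center bins
weights = [[1.0] * num_bins for _ in centers]  # placeholder for learned weights
features = filterbank_features(frame, weights, centers, half_width=2)
```

In this toy case only the first filter's band covers bin 4, so its feature is the largest of the three. A practical version would use an FFT library and many overlapping frames rather than a naive DFT on one frame.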