End-to-end speech recognition using lattice-free MMI

Hadian, H ; Sharif University of Technology | 2018

  1. Type of Document: Article
  2. DOI: 10.21437/Interspeech.2018-1423
  3. Publisher: International Speech Communication Association, 2018
  4. Abstract: We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees. We use full biphones to enable context-dependent modeling without trees, and show that our end-to-end LF-MMI approach can achieve comparable results to regular LF-MMI on well-known large vocabulary tasks. We also compare with other end-to-end methods such as CTC in character-based and lexicon-free settings and show 5 to 25 percent relative reduction in word error rates on different large vocabulary tasks while using significantly smaller models. © 2018 International Speech Communication Association. All rights reserved.
  5. Keywords: Automatic speech recognition ; End-to-end ; Flat-start ; Hidden Markov model ; Lattice-free MMI ; Decision trees ; Forestry ; Hidden Markov models ; Speech communication ; Stereophonic broadcasting ; Trellis codes ; Vocabulary control ; Context dependent modeling ; End to end ; Large vocabulary ; Lattice-free ; Maximum mutual information ; Objective functions ; Relative reduction ; Speech recognition
  6. Source: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, 2 September 2018 through 6 September 2018 ; Volume 2018-September, 2018, Pages 12-16 ; 2308-457X (ISSN)
  7. URL: https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1423.html