Phone duration modeling for LVCSR using neural networks

Hadian, H ; Sharif University of Technology | 2017

  1. Type of Document: Article
  2. DOI: 10.21437/Interspeech.2017-1686
  3. Publisher: International Speech Communication Association , 2017
  4. Abstract:
  5. We describe our work on incorporating probabilities of phone durations, learned by a neural net, into an ASR system. Phone durations are incorporated via lattice rescoring. The input features are derived from the phone identities of a context window of phones, plus the durations of preceding phones within that window. Unlike some previous work, our network outputs the probability of different durations (in frames) directly, up to a fixed limit. We evaluate this method on several large vocabulary tasks, and while we consistently see improvements in Word Error Rates, the improvements are smaller when the lattices are generated with neural-net-based acoustic models. Copyright © 2017 ISCA
  6. Keywords:
  7. Automatic speech recognition ; Neural networks ; Phone duration models ; Speech communication ; Speech recognition ; Telephone sets ; Acoustic model ; Context window ; Duration models ; Large vocabulary ; Lattice rescoring ; Phone duration modeling ; Reproducible results
  8. Source: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 August 2017 through 24 August 2017 ; Volume 2017-August , 2017 , Pages 518-522 ; 2308457X (ISSN)
  9. URL: http://www.proceedings.com/36411.html
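
Illustrative sketch (not from the paper): the abstract describes a model that takes phone identities from a context window plus the durations of the preceding phones, outputs a probability for each duration in frames up to a fixed limit, and is applied by rescoring. The Python below is a minimal sketch of that idea under stated assumptions. Every name in it (make_features, duration_log_probs, rescore), the window size, the duration limit, the rescoring weight, and the single linear layer standing in for the neural net are hypothetical choices made for illustration. It rescores a plain list of candidate paths rather than a real lattice.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def make_features(phones, durations, pos, num_phones, left=2, right=2, max_dur=50):
    """Features for the phone at `pos`: one-hot identities of every phone in the
    context window, plus scaled durations of the preceding phones in the window.
    Window size and scaling are assumptions, not taken from the paper."""
    feats = []
    for offset in range(-left, right + 1):
        i = pos + offset
        if 0 <= i < len(phones):
            feats.append(one_hot(phones[i], num_phones))
        else:
            feats.append(np.zeros(num_phones))  # padding outside the utterance
    for offset in range(-left, 0):
        i = pos + offset
        d = durations[i] if i >= 0 else 0
        feats.append(np.array([min(d, max_dur) / max_dur]))
    return np.concatenate(feats)

def log_softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    return z - np.log(np.exp(z).sum())

def duration_log_probs(features, W, b):
    """One linear layer plus softmax, a stand-in for the neural net.
    Output bin k-1 holds duration k frames; the last bin absorbs
    every duration at or above the limit."""
    return log_softmax(W @ features + b)

def path_duration_score(phones, durations, W, b, num_phones, max_dur=50):
    """Sum of log P(duration | context) over the phones of one path.
    Durations are in frames and assumed to be at least 1."""
    total = 0.0
    for pos, d in enumerate(durations):
        x = make_features(phones, durations, pos, num_phones, max_dur=max_dur)
        total += duration_log_probs(x, W, b)[min(d, max_dur) - 1]
    return total

def rescore(paths, W, b, num_phones, weight=0.1, max_dur=50):
    """Return the index of the best path after adding the weighted duration
    score to each path's existing (acoustic + language model) log score."""
    totals = [
        p["score"] + weight * path_duration_score(
            p["phones"], p["durations"], W, b, num_phones, max_dur=max_dur)
        for p in paths
    ]
    return int(np.argmax(totals))
```

With trained weights, `rescore` would be called on the candidate paths of each utterance; the weight on the duration score would normally be tuned on held-out data.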