
Flat-start single-stage discriminatively trained HMM-based models for ASR

Article ; IEEE/ACM Transactions on Audio, Speech, and Language Processing ; Volume 26, Issue 11, 2018, Pages 1949-1961 ; 23299290 (ISSN) ; Hadian, H ; Sameti, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2018

Abstract

In recent years, end-to-end approaches to automatic speech recognition have received considerable attention as they are much faster in terms of preparing resources. However, conventional multistage approaches, which rely on a pipeline of training hidden Markov model-Gaussian mixture model (HMM-GMM) systems and tree-building steps, still give state-of-the-art results on most databases. In this study, we investigate flat-start one-stage training of neural networks using the lattice-free maximum mutual information (LF-MMI) objective function with HMMs for large vocabulary continuous speech recognition. We thoroughly look into different issues that arise in such a setup and propose a standalone system, which achieves...

Improving LF-MMI using unconstrained supervisions for ASR

Article ; 2018 IEEE Spoken Language Technology Workshop, SLT 2018, 18 December 2018 through 21 December 2018 ; 2019, Pages 43-47 ; 9781538643341 (ISBN) ; Hadian, H ; Povey, D ; Sameti, H ; Trmal, J ; Khudanpur, S ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2019

Abstract

We present our work on improving the numerator graph for discriminative training using the lattice-free maximum mutual information (LF-MMI) criterion. Specifically, we propose a scheme for creating unconstrained numerator graphs by removing time constraints from the baseline numerator graphs. This leads to much smaller graphs and therefore faster preparation of training supervisions. By testing the proposed unconstrained supervisions using factorized time-delay neural network (TDNN) models, we observe 0.5% to 2.6% relative improvement over the state-of-the-art word error rates on various large-vocabulary speech recognition databases. © 2018 IEEE
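
The improvement figures in the abstract above are relative, not absolute, reductions in word error rate (WER). As a minimal sketch of that arithmetic, the helper below computes a relative reduction from two WER values. The function name and the sample WERs are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), for illustration only; not results from the paper.
baseline = 10.0
improved = 9.8
print(f"{relative_wer_improvement(baseline, improved):.1f}% relative")
```

With these assumed values, an absolute drop of 0.2 WER points from a 10.0% baseline is a 2% relative improvement, which is why relative figures look larger than the underlying absolute change.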

End-to-end speech recognition using lattice-free MMI

Article ; 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, 2 September 2018 through 6 September 2018 ; Volume 2018-September, 2018, Pages 12-16 ; 2308457X (ISSN) ; Hadian, H ; Sameti, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology

International Speech Communication Association 2018

Abstract

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees. We use full biphones to enable context-dependent modeling without trees, and show that our end-to-end LF-MMI approach can achieve comparable results to regular LF-MMI on well-known large vocabulary tasks. We also compare with other end-to-end methods such as CTC in character-based and lexicon-free settings and show 5 to 25 percent...