HMM-based phrase-independent i-vector extractor for text-dependent speaker verification
Zeinali, H.; Sharif University of Technology | 2017
- Type of Document: Article
- DOI: 10.1109/TASLP.2017.2694708
- Publisher: Institute of Electrical and Electronics Engineers Inc., 2017
- Abstract:
- The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) used to train the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization, which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between i-vectors, normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC-based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets. © 2014 IEEE
- Keywords:
- Bottleneck features; DNN; Hidden Markov model (HMM); I-vector; Text-dependent speaker verification; Character recognition; Deep neural networks; Gaussian distribution; Hidden Markov models; Markov processes; Trellis codes; Vectors; Gaussian Mixture Model; I vectors; Speaker recognition; Speaker verification; Text-independent speaker verification; Universal background model; Within-class covariance; Speech recognition
- Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing; Volume 25, Issue 7, 2017, Pages 1421-1435; ISSN 2329-9290
- URL: https://ieeexplore.ieee.org/document/7902120
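The abstract describes the final scoring step: cosine similarity between i-vectors, normalized with phrase-dependent s-norm. The sketch below illustrates only that step, under stated assumptions: i-vectors are already extracted (HMM-based UBM) and preconditioned (regularized WCCN), neither of which is shown. The function names and the cohort layout (a dict mapping each phrase to impostor i-vectors of that phrase) are hypothetical. The s-norm used here is the common symmetric form that averages the z-normalized (enrollment-side) and t-normalized (test-side) scores; the paper's exact cohort selection may differ.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def s_norm(enroll, test, cohort):
    """Symmetric normalization of cosine(enroll, test) against an impostor cohort."""
    raw = cosine(enroll, test)
    # Enrollment side: how the enrollment i-vector scores against impostors.
    e_scores = [cosine(enroll, c) for c in cohort]
    # Test side: how the test i-vector scores against the same impostors.
    t_scores = [cosine(test, c) for c in cohort]
    z = (raw - mean(e_scores)) / pstdev(e_scores)
    t = (raw - mean(t_scores)) / pstdev(t_scores)
    return 0.5 * (z + t)


def phrase_dependent_s_norm(enroll, test, phrase, cohorts_by_phrase):
    """Hypothetical helper: restrict the cohort to utterances of the trial's phrase."""
    return s_norm(enroll, test, cohorts_by_phrase[phrase])
```

Using one cohort per phrase means the normalization statistics reflect score distributions for that specific pass-phrase, which is the intent of a phrase-dependent normalization when phrases differ in length and content.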