Telephony Text-Independent Speaker Verification in Total Variability Space
Mirian, Alireza | 2014
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 46319 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Sameti, Hossein
- Abstract:
- Given two speech segments, the task of speaker verification is to determine whether both were uttered by the same person. Most recent approaches to speaker verification are based on the Total Variability Space, which results from applying factor analysis to the GMM mean supervector space. The representation of a speech segment of arbitrary duration in this space is called an i-vector.
In this thesis, the basics of speaker verification are first described and i-vector approaches are explained in more detail. Then, a method for improving the accuracy of Cosine Similarity Scoring (CSS) is proposed, which normalizes the raw score using the scores of the test utterance against a model- and test-dependent cohort set. The results show 18.5% and 10% improvement in EER and 19.5% and 15% improvement in MinDCF over the CSS baseline system and the second-best score-domain compensation method, respectively. Furthermore, fusing these normalized scores with PLDA scores improves the accuracy of PLDA.
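As a rough illustration of cohort-based normalization of a cosine score, the sketch below may help. It is not the thesis's method: the abstract does not state how the cohort is chosen or how the normalization is computed, so the selection rule in `select_cohort` (summed similarity to model and test), the z-score form, and all function names and the parameter `k` are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def cosine_score(a, b):
    """Cosine similarity between two non-zero i-vectors (sequences of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_cohort(model_iv, test_iv, impostor_ivs, k):
    """ASSUMED rule: keep the k impostor i-vectors with the highest
    summed cosine similarity to the model and the test i-vectors."""
    ranked = sorted(
        impostor_ivs,
        key=lambda c: cosine_score(model_iv, c) + cosine_score(test_iv, c),
        reverse=True,
    )
    return ranked[:k]


def normalized_score(model_iv, test_iv, impostor_ivs, k=3):
    """Z-normalize the raw cosine score with the scores of the test
    i-vector against the selected cohort (ASSUMED normalization form)."""
    raw = cosine_score(model_iv, test_iv)
    cohort = select_cohort(model_iv, test_iv, impostor_ivs, k)
    cohort_scores = [cosine_score(test_iv, c) for c in cohort]
    mu = mean(cohort_scores)
    # Floor the spread so a degenerate cohort does not divide by zero.
    sigma = max(pstdev(cohort_scores), 1e-8)
    return (raw - mu) / sigma
```

Because the cohort depends on both the model and the test i-vector, it has to be reselected for every trial, which is the main cost of this kind of scheme compared with a fixed cohort.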
The next part of this thesis discusses the drastic accuracy reduction under limited data and duration mismatch, and explains the effect of phonetic-content mismatch on this degradation. Then, a method is proposed for increasing the phonetic match when train and test durations differ, which is a common condition in real-world applications. Based on the fact that UBM Gaussians model different acoustic classes, this method selects only those frames of the longer (train) segment whose most probable generating Gaussians are among the ones most likely to have produced the test data. In this way, the problem of phonetic-content mismatch is explicitly avoided. Another method is proposed in which the longer segment is divided into segments the size of the shorter one, and the final i-vector is generated by combining the i-vectors extracted from these smaller segments. Results show that for tests of 2 s and 5 s duration, the proposed method yields a 10% accuracy improvement.
- Keywords:
- Text-Independent Speaker Verification ; Identity Vector (I-Vector) ; Cosine Similarity Scoring ; Duration Mismatch ; Phonetic-content Mismatch
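The two duration-mismatch ideas described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation. The Gaussians "most probable to produce the test data" are approximated here by counting how often each UBM component is the top-scoring one over the short segment's frames. The i-vectors of the sub-segments are combined by a plain average, and a trailing remainder shorter than the chunk length is dropped. The abstract specifies none of these choices, and all function and parameter names are hypothetical.

```python
from collections import Counter


def top_gaussian(posteriors):
    """Index of the most probable UBM component for one frame."""
    return max(range(len(posteriors)), key=lambda c: posteriors[c])


def select_matching_frames(long_posts, short_posts, n_top):
    """Indices of frames in the longer segment whose top Gaussian is among
    the n_top Gaussians that most often win in the shorter segment
    (ASSUMED ranking criterion)."""
    counts = Counter(top_gaussian(p) for p in short_posts)
    allowed = {g for g, _ in counts.most_common(n_top)}
    return [i for i, p in enumerate(long_posts) if top_gaussian(p) in allowed]


def split_into_chunks(frames, chunk_len):
    """Cut the longer segment into consecutive chunks of chunk_len frames,
    dropping a shorter trailing remainder (ASSUMED handling)."""
    n_chunks = len(frames) // chunk_len
    return [frames[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


def combine_ivectors(ivectors):
    """Combine per-chunk i-vectors by element-wise averaging (ASSUMED rule)."""
    n = len(ivectors)
    return [sum(dim) / n for dim in zip(*ivectors)]
```

In both cases the per-frame posteriors and the per-chunk i-vectors are assumed to come from an existing UBM and i-vector extractor, which are outside the scope of this sketch.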