Improving Robustness of Speaker Verification Systems Against Non-Identity Information

Zeinali, Hossein | 2017

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 50277 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sameti, Hossein
  7. Abstract:
  8. Speaker verification, a biometric method, aims to verify the identity of a person from the characteristics of their voice. It faces many challenges, such as voice imitation (spoofing), the use of recorded voice, high sensitivity to convolutive distortions introduced by the channel, and a large performance degradation for short-duration utterances. The aim of this thesis is to propose methods for reducing the effects of non-identity information, especially the channel, and to develop new methods for text-dependent speaker verification with very short utterances. The i-vector has been the best speaker modeling method in recent years, but it does not perform well in text-dependent mode. Likewise, the best method for reducing channel effects is probabilistic linear discriminant analysis, but it cannot be used in short-duration scenarios, especially in text-dependent applications. Experiments show that the i-vector contains a large amount of non-identity information that degrades its performance, and that the effects of this information must be reduced to achieve the best performance. To improve the poor performance of the i-vector in text-dependent speaker verification, the hidden Markov model is used in a way that allows an i-vector extractor to be trained in a phrase-independent manner. To reduce the effects of non-identity information, regularized methods are proposed along with phrase-dependent score normalization, which obtained the best results for text-dependent speaker verification using i-vectors. Next, a deep neural network is proposed to improve the performance of the hidden Markov model, as well as the performance of i-vectors obtained from the Gaussian mixture model. For this purpose, a two-level bottleneck neural network with large overlapping input features is used. The bottleneck features extracted from this network, along with the resulting frame alignment, yielded considerable improvements in almost all experiments. The final system based on the proposed methods is shown to have the best reported performance on both evaluation databases, achieving more than 50 percent relative error reduction on the main database. For the text-independent mode, a new method is proposed to reduce non-identity information, which resulted in a performance improvement. Furthermore, two new methods for impostor set selection are proposed based on this method and are shown to be more efficient than existing ones. Finally, another method is proposed to reduce the effect of language mismatch in the training data using nuisance attribute projection; combined with the other proposed methods, it yielded acceptable results in the NIST Speaker Recognition Evaluation 2016 compared to other participants.
  9. Keywords:
  10. Speaker Verification ; Hidden Markov Model ; Deep Neural Networks ; Identity Vector (I-Vector) ; Regularization ; Bottleneck ; Non-Identity Information
