Improving Robustness of Speaker Verification Systems Against Non-Identity Information

Zeinali, Hossein; Sameti, Hossein

Zeinali, Hossein | 2017

Type of Document: Ph.D. Dissertation
Language: Farsi
Document No: 50277 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Speaker verification, a type of biometric method, aims to verify a person's identity from the characteristics of their voice. It faces many challenges, such as voice imitation (spoofing), the use of recorded speech, high sensitivity to convolutive distortions caused by the channel, and large performance degradation for short-duration utterances. The aim of this thesis is to propose methods for reducing the effects of non-identity information, especially the channel, and to address the problems that recent methods face in text-dependent speaker verification with very short utterances. The i-vector has been the best speaker modeling method in recent years, but it does not perform well in the text-dependent mode. On the other hand, the best method for reducing channel effects is probabilistic linear discriminant analysis (PLDA), but it cannot be used in short-duration scenarios, especially in text-dependent applications. Experiments show that the i-vector contains a large amount of non-identity information that degrades its performance, and the effects of this information must be reduced to achieve the best performance. To improve the poor performance of the i-vector in text-dependent speaker verification, a hidden Markov model (HMM) is proposed in a way that allows the i-vector extractor to be trained in a phrase-independent manner. To reduce the effects of non-identity information, regularized methods are proposed along with phrase-dependent score normalization; together, these obtained the best results for text-dependent speaker verification using i-vectors. Next, a deep neural network (DNN) is proposed to improve the performance of the HMM-based method, as well as the performance of i-vectors obtained from the Gaussian mixture model (GMM). For this purpose, a two-level bottleneck neural network with a large, overlapping input feature context is used.
The bottleneck features extracted from this network, together with the resulting frame alignment, yielded considerable improvements in almost all experiments. The final system based on the proposed methods achieved the best reported performance on both evaluation databases, with a relative error reduction of more than 50 percent on the main database. For the text-independent mode, a new method is proposed to reduce non-identity information, which improved performance. Furthermore, two new methods for impostor set selection are proposed based on this method and are shown to be more efficient than existing ones. Finally, another method is proposed to reduce the effect of language mismatch in the training data using nuisance attribute projection (NAP); combined with the other proposed methods, it yielded acceptable results in the NIST 2016 Speaker Recognition Evaluation compared to other participants.
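Several back-end components named in the abstract and outline (nuisance attribute projection, cosine distance scoring, and score normalization such as S-norm) have standard textbook forms. The sketch below shows those generic forms in plain Python on toy vectors. It is an illustration under stated assumptions, not the thesis's implementation, and it does not reproduce the proposed regularized, phrase-dependent, or modified cosine-distance variants. The function names `cosine_score`, `nap_direction`, `nap_project`, and `s_norm` are hypothetical, and the NAP part assumes exactly two nuisance groups (for example, two languages), the case in which the between-group scatter has a single direction.

```python
import math
from statistics import mean, stdev

# Hypothetical helper names: generic textbook techniques, not the thesis code.


def cosine_score(a, b):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nap_direction(group_a, group_b):
    """Unit vector along the difference of two group means.

    Assumption: only two nuisance groups. Their between-group scatter then
    has rank one, and this vector is its only eigenvector, so one-dimensional
    NAP removes exactly this direction.
    """
    dim = len(group_a[0])
    mean_a = [mean(v[i] for v in group_a) for i in range(dim)]
    mean_b = [mean(v[i] for v in group_b) for i in range(dim)]
    diff = [x - y for x, y in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def nap_project(v, u):
    """Apply P = I - u u^T: remove the component of v along unit vector u."""
    proj = sum(x * y for x, y in zip(v, u))
    return [x - proj * y for x, y in zip(v, u)]


def s_norm(raw, enroll, test, cohort):
    """Symmetric normalization: average of the Z-norm and T-norm statistics.

    Enrollment and test vectors are each scored against an impostor cohort,
    and the raw score is standardized with each set of statistics.
    """
    enroll_scores = [cosine_score(enroll, c) for c in cohort]
    test_scores = [cosine_score(test, c) for c in cohort]
    z = (raw - mean(enroll_scores)) / stdev(enroll_scores)
    t = (raw - mean(test_scores)) / stdev(test_scores)
    return 0.5 * (z + t)
```

For example, with language groups `[[1, 0, 1], [1, 0, 3]]` and `[[-1, 0, 1], [-1, 0, 3]]`, the nuisance direction is `[1.0, 0.0, 0.0]`, and projecting `[3, 4, 0]` gives `[0.0, 4.0, 0.0]`: the language-dependent component is removed and the rest of the vector is kept. General NAP instead removes the top-k eigenvectors of a nuisance scatter matrix, and the impostor cohort passed to `s_norm` is where impostor set selection methods would plug in.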
Keywords:
Speaker Verification ; Hidden Markov Model ; Deep Neural Networks ; Identity Vector (I-Vector) ; Regularization ; Bottleneck ; Non-Identity Information

Table of Contents

List of Figures
List of Tables
List of Abbreviations
List of Symbols
Preface
- Biometric Methods
- Speaker Verification
- History
- Non-Identity Information
- Applications of Speaker Recognition
- Goals and Contributions of This Thesis
- Thesis Structure
Review of Related Theory
- Introduction
- Speaker Verification
  - Categories of Speaker Recognition
  - Stages of a Speaker Verification System
  - Components of a Speaker Verification System
- Speaker Modeling Methods
  - Methods Based on the Gaussian Mixture Model and Universal Background Model
  - Methods Based on Mean Supervectors and Support Vector Machines
  - Methods Based on Joint Factor Analysis
  - The i-Vector Method in the Total Variability Space
- Channel Compensation Methods
  - Within-Class Covariance Normalization
  - Nuisance Attribute Projection
  - Linear Discriminant Analysis
  - Probabilistic Linear Discriminant Analysis
- Scoring in i-Vector-Based Methods
  - Cosine Distance Scoring
  - Scoring in PLDA
- Score Normalization
  - Zero Normalization (Z-norm)
  - Test Normalization (T-norm)
  - Zero-Dependent Test Normalization (ZT-norm)
  - Symmetric Normalization (S-norm)
- Impostor Set Selection
  - Offline Impostor Set Selection
  - Online Impostor Set Selection
- Evaluation Metrics
- Trial Types in Text-Dependent Mode
- Summary
Review of Previous Work
- Introduction
- Text-Dependent Speaker Verification
- Text-Independent Speaker Verification
- Deep Neural Networks in Speaker Verification
- Summary
Using Hidden Markov Models
- Introduction
- Proposed Method
  - Using the Hidden Markov Model
  - Using Regularized Linear Discriminant Analysis
  - Using Regularized Within-Class Covariance Normalization
  - Hidden Markov Model Alignment at Test Time
- Experimental Setup
  - Data
  - Features
  - Model Parameters
- Experiments and Results
  - Feature Comparison
  - Comparison of Gaussian Mixture Model and Hidden Markov Model Alignment
  - Effect of i-Vector Dimension on Performance
  - Comparison of Regularized Methods with Conventional Channel Compensation Methods
  - Comparison with Other Methods
  - Effect of Using Other Training Data
  - Comparison with Conventional Methods on the RedDots Database
- Conclusion
Using Deep Neural Networks
- Introduction
- Method Description
  - Two-Level Bottleneck Neural Network
  - Bottleneck Features
  - Feature Vector Alignment Methods
- Experimental Setup
  - Data
  - Features
  - Model Parameters
- Experiments and Results
  - Comparison of Four Alignment Methods
  - Comparison of 8 kHz and 16 kHz Networks
  - Effect of the Number of Outputs on Network Performance
  - Results of Combining Different Methods
  - Comparison of Methods After Removing Problematic Utterances
  - Comparison of Speed and Memory Requirements
  - Comparison of Results in the RedDots Challenge with Other Participants
- Summary
Improving the Text-Independent System
- Introduction
- Reducing the Effect of Non-Identity Information in Cosine Distance
  - Motivation
  - Reducing the Effect of Non-Identity Information
  - Experimental Conditions
  - Results
- Impostor Set Selection
  - Test-Specific Impostor Set Selection
  - Hybrid Impostor Set Selection
  - Experimental Conditions
  - Results
- Reducing the Effect of Language Mismatch
  - Reducing Language Effects Using Nuisance Attribute Projection
  - Experimental Conditions
  - Results
- Summary
Summary, Conclusions, and Future Work
- Summary
- Conclusions
- Contributions and Innovations
- Future Work
References
Persian-to-English Glossary
English-to-Persian Glossary
Parameter Training
- Steps for Training the i-Vector Extractor
- Steps for Training the PLDA Model Parameters
Other Research Conducted
- Random-Text Speaker Verification
  - Proposed Method
  - Experimental Setup
  - Results
  - Conclusion
- Spoken Phrase Verification Using i-Vectors
  - Spoken Phrase Verification
  - Proposed Method
  - Experimental Setup
  - Results
  - Conclusion