
Detecting Speakers in a Telephone Conversation

Soltani Farani, Ali | 2010

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 41507 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sameti, Hossein
  7. Abstract: The human speech signal conveys many levels of information, ranging from phonetic content to speaker identity and even emotional state. This thesis deals with the task of open-set speaker identification (SI) from an unconstrained telephone conversation between two speakers. The goal is to find at most two speakers, among a known set of target speakers, who best match the voice samples of the input speech; the input voice samples are not constrained to come from the target speaker set. The single-speaker problem is investigated first. The classic GMM-UBM (Gaussian mixture model with universal background model) system for text-independent SI and its adapted form are explored. The use of score-space information is advocated as a complement to the information extracted from maximum-likelihood analysis, and the Score-Space Pattern Recognizer (SSPR) is introduced to this end. Each point in the score space is a vector of the scores assigned by the target speaker models to the input sample. Our experiments on the Farsdat database show that using our SSPR as a complementary source can, on average, reduce the equal error rate (EER) of open-set SI by 22.6%, with an increase of 22% in training time. Next, the multi-speaker problem is addressed. Open-set SI from a telephone conversation is a new problem. We explore how it differs from the conventional speaker detection problem, which is defined in the speaker verification sense. Speaker clustering in the score space is proposed as a step prior to SI, and two methods are developed. The first method clusters equal-length segments of speech, while the second uses a simple speaker segmentation algorithm based on the Bayesian information criterion (BIC). Compared with the minimum EER of 16.3% on the Farsdat database for a system in which the segmentation and clustering are known a priori, our second method gives an EER of 18.9%.
  8. Keywords: Speaker Identification; Text-Independent; Score-Space Information; Speaker Segmentation; Speaker Clustering
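
The abstract describes a point in score space as the vector of scores that the target speaker models assign to an input sample. The following is a minimal sketch of that idea only, not the thesis's SSPR or its GMM-UBM implementation. It assumes 1-D single-Gaussian models as stand-ins for GMMs, an average log-likelihood ratio against the background model as the score, and a plain thresholded maximum as the open-set decision. All function names, model parameters, and the threshold are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian (a stand-in for a full GMM)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def avg_llr(frames, model, ubm):
    """Average per-frame log-likelihood ratio of a speaker model vs. the UBM."""
    total = sum(log_gauss(x, *model) - log_gauss(x, *ubm) for x in frames)
    return total / len(frames)

def score_vector(frames, speaker_models, ubm):
    """One point in score space: one score per target speaker model."""
    return [avg_llr(frames, m, ubm) for m in speaker_models]

def open_set_identify(scores, threshold):
    """Index of the best-scoring target, or None if the input is rejected
    as coming from outside the target set (assumed baseline rule)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None

# Hypothetical toy models, each given as (mean, variance).
ubm = (0.0, 4.0)
speakers = [(-2.0, 1.0), (0.0, 1.0), (2.0, 1.0)]

# Hypothetical 1-D "feature frames" from one input sample.
frames = [1.8, 2.1, 2.4, 1.9]

scores = score_vector(frames, speakers, ubm)
decision = open_set_identify(scores, threshold=0.0)
rejected = open_set_identify(scores, threshold=10.0)
```

In this toy setup the whole vector `scores`, rather than only its maximum, is what a score-space classifier would take as input. How the SSPR actually uses that vector is not specified in the abstract.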
