Detecting Speakers in a Telephone Conversation

Soltani Farani, Ali; Sameti, Hossein

Soltani Farani, Ali | 2010

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 41507 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
The human speech signal conveys many levels of information, ranging from phonetic content to speaker identity and even emotional state. This thesis deals with the task of open-set speaker identification (SI) from an unconstrained telephone conversation between two speakers. The goal is to find at most two speakers, among a known set of target speakers, that best match the voice samples of the input speech; the speakers in the input are not constrained to belong to the target speaker set. The uni-speaker problem is investigated first. The classic GMM-UBM system for text-independent SI and its adapted form are explored. The use of score-space information is advocated as a complementary source to the information extracted from maximum-likelihood analysis. The Score-Space Pattern Recognizer (SSPR) is introduced to this end. Each point in the score-space is a vector of scores assigned by the target speaker models to the input sample. Our experiments on the Farsdat database show that using our SSPR as a complementary source can, on average, reduce the EER of open-set SI by 22.6%, with an increase of 22% in training time. Next, the multi-speaker problem is addressed. Open-set SI from a telephone conversation is a new problem. We explore how it differs from the conventional speaker detection problem, which is defined in the speaker verification sense. Speaker clustering in the score-space is proposed prior to SI. Two methods are developed. The first method clusters equal-length segments of speech, while the second method uses a simple speaker segmentation algorithm based on BIC. Compared with the minimum EER of 16.3% on the Farsdat database for the system in which the segmentation and clustering are known a priori, our second method gives an EER of 18.9%.
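The score-space idea in the abstract (one score per target speaker model for a given input sample) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `gmm_avg_loglik` and `score_vector`, the use of diagonal-covariance GMMs, and the choice of an average log-likelihood ratio against the UBM as the score are all assumptions made for illustration. How the SSPR then classifies these vectors is not described in the abstract and is not attempted here.

```python
import math


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: list of feature vectors; weights: one mixture weight per component;
    means, variances: one vector per component (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        log_terms = []
        for w, mu, var in zip(weights, means, variances):
            # log of (weight * diagonal Gaussian density) for this component
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            log_terms.append(ll)
        # log-sum-exp over components, stabilised by subtracting the maximum
        m = max(log_terms)
        total += m + math.log(sum(math.exp(t - m) for t in log_terms))
    return total / len(frames)


def score_vector(frames, speaker_models, ubm):
    """One score per target speaker: average log-likelihood ratio vs. the UBM.

    The resulting list is one point in the score-space (assumed scoring rule).
    """
    ubm_ll = gmm_avg_loglik(frames, *ubm)
    return [gmm_avg_loglik(frames, *model) - ubm_ll for model in speaker_models]


# Toy usage: two single-component "speaker models" and a two-component UBM.
frames = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
speaker_a = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
speaker_b = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
ubm = ([0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])
scores = score_vector(frames, [speaker_a, speaker_b], ubm)
best = max(range(len(scores)), key=lambda i: scores[i])
```

In this toy setup the frames lie near the mean of `speaker_a`, so that model receives the highest score. In an open-set system the best score would additionally be compared with a threshold so that an input from outside the target set can be rejected.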
Keywords:
Speaker Identification ; Text-Independent ; Score-Space Information ; Speaker Segmentation ; Speaker Clustering
