Speech Activity Detection Using Deep Networks

Shahsavari, Sajad; Sameti, Hossein

M.Sc. Thesis, Sharif University of Technology. Shahsavari, Sajad (Author); Sameti, Hossein (Supervisor)

Abstract

In this paper, we introduce a new dataset for speech activity detection (SAD) and evaluate several common methods on it: GMM, ANN, and RNN. We collected the dataset with a semi-supervised approach based on subtitled movies, reaching a labeling accuracy of 95%. This semi-automatic method can help us collect large amounts of labeled audio data with very high diversity in language, speaker, and channel. We model SAD as a two-class classification task: speech versus non-speech. With GMMs, we use two separate mixtures, one to model speech and one to model non-speech. With neural networks, we use a softmax layer at the end of the network, with two neurons which represent speech and...
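The two-mixture GMM formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the diagonal covariances, the 2-D toy features, all parameter values, and the names `classify_frame`, `SPEECH_GMM`, and `NONSPEECH_GMM` are assumptions made for illustration. In practice the mixture parameters would be estimated from labeled frames (for example with EM) rather than set by hand.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the sum stable when the densities are tiny.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_frame(x, speech_gmm, nonspeech_gmm, threshold=0.0):
    """Label a frame by comparing its log-likelihood under the two mixtures."""
    llr = gmm_log_likelihood(x, speech_gmm) - gmm_log_likelihood(x, nonspeech_gmm)
    return "speech" if llr > threshold else "non-speech"


# Hypothetical hand-set parameters on 2-D toy features (not from the thesis).
SPEECH_GMM = [(0.6, [2.0, 1.0], [1.0, 1.0]), (0.4, [3.0, 2.0], [0.5, 0.5])]
NONSPEECH_GMM = [(0.5, [-2.0, -1.0], [1.0, 1.0]), (0.5, [-1.0, 0.0], [0.5, 0.5])]

frames = [[2.5, 1.5], [-1.5, -0.5]]
labels = [classify_frame(f, SPEECH_GMM, NONSPEECH_GMM) for f in frames]
print(labels)
```

The `threshold` argument is an assumed knob for trading missed speech against false alarms; a value of 0.0 corresponds to picking whichever mixture assigns the frame the higher likelihood.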
