Speech Activity Detection Using Deep Networks, M.Sc. Thesis Sharif University of Technology ; Sameti, Hossein (Supervisor)
Abstract
In this paper, we introduce a new dataset for SAD and evaluate certain common methods such as GMM, ANN, and RNN on it. We have collected our dataset in a semi-supervised approach, using subtitled movies, with a labeling accuracy of 95%. This semi-automatic method can help us collect huge amounts of labeled audio data with very high diversity in language, speaker, and channel. We model the problem of SAD as a classification task to two classes of speech and non-speech. When using GMM for this problem, we use two separate mixtures to model speech and non-speech. In the case of neural networks, we use a softmax layer at the end of the network, with two neurons which represent speech and...
Cataloging briefSpeech Activity Detection Using Deep Networks, M.Sc. Thesis Sharif University of Technology ; Sameti, Hossein (Supervisor)
Abstract
In this paper, we introduce a new dataset for SAD and evaluate certain common methods such as GMM, ANN, and RNN on it. We have collected our dataset in a semi-supervised approach, using subtitled movies, with a labeling accuracy of 95%. This semi-automatic method can help us collect huge amounts of labeled audio data with very high diversity in language, speaker, and channel. We model the problem of SAD as a classification task to two classes of speech and non-speech. When using GMM for this problem, we use two separate mixtures to model speech and non-speech. In the case of neural networks, we use a softmax layer at the end of the network, with two neurons which represent speech and...
Find in contentBookmark |
|