
ShEMO: a large-scale validated database for Persian speech emotion detection

Article Language Resources and Evaluation ; 2018 ; 1574020X (ISSN) Nezami, O. M ; Jamshid Lou, P ; Karami, M ; Sharif University of Technology

Springer Netherlands 2018

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as “substantial agreement”. We also present benchmark results...

ShEMO: a large-scale validated database for Persian speech emotion detection

Article Language Resources and Evaluation ; Volume 53, Issue 1 , 2019 ; 1574020X (ISSN) Mohamad Nezami, O ; Jamshid Lou, P ; Karami, M ; Sharif University of Technology

Springer Netherlands 2019

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as “substantial agreement”. We also present benchmark results...

Noise reduction algorithm for robust speech recognition using MLP neural network

Article PACIIA 2009 - 2009 2nd Asia-Pacific Conference on Computational Intelligence and Industrial Applications, 28 November 2009 through 29 November 2009 ; Volume 1 , 2009 , Pages 377-380 ; 9781424446070 (ISBN) Ghaemmaghami, M. P ; Razzazi, F ; Sameti, H ; Dabbaghchian, S ; BabaAli, B ; Sharif University of Technology

Abstract

We propose an efficient and effective nonlinear feature domain noise suppression algorithm, motivated by the minimum mean square error (MMSE) optimization criterion. A Multi-Layer Perceptron (MLP) neural network in the log spectral domain minimizes the difference between noisy and clean speech. By using this method as a pre-processing stage of a speech recognition system, the recognition rate in noisy environments is improved. We can extend the application of the system to different environments with different noises without re-training it. We need only to train the preprocessing stage with a small portion of noisy data which is created by artificially adding different types of noises from the...

Robust speech recognition using MLP neural network in log-spectral domain

Article IEEE International Symposium on Signal Processing and Information Technology, ISSPIT 2009, 14 December 2009 through 16 December 2009, Ajman ; 2009 , Pages 467-472 ; 9781424459506 (ISBN) Ghaemmaghami, M. P ; Sameti, H ; Razzazi, F ; BabaAli, B ; Dabbaghchian, S ; Sharif University of Technology

Abstract

In this paper, we have proposed an efficient and effective nonlinear feature domain noise suppression algorithm, motivated by the minimum mean square error (MMSE) optimization criterion. A Multi-Layer Perceptron (MLP) neural network in the log spectral domain has been employed to minimize the difference between noisy and clean speech. By using this method as a pre-processing stage of a speech recognition system, the recognition rate in noisy environments has been improved. We extended the application of the system to different environments with different noises without retraining the HMM model. We trained the feature extraction stage with a small portion of noisy data which was created by...

Introducing a framework to create telephony speech databases from direct ones

Article 14th International Conference on Systems Signals and Image Processing, IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services, EC-SIPMCS 2007, Maribor, 27 June 2007 through 30 June 2007 ; November , 2007 , Pages 327-330 ; 9789612480295 (ISBN) Momtazi, S ; Sameti, H ; Vaisipour, S ; Tefagh, M ; Sharif University of Technology

2007

Abstract

A comprehensive speech database is one of the important tools for developing speech recognition systems, and such tools are necessary for telephony recognition as well. Although adequate databases for direct speech recognizers exist, there is no appropriate database for telephony speech recognizers. Most methods suggested for solving this problem either build new databases, which tends to consume much time and many resources, or use a filter that simulates circuit-switch behavior to transform direct databases into telephony ones, in which case the resulting databases differ in many ways from real telephony databases. In this paper we introduce a framework for creating telephony speech...