Persian Speech Emotion Classification

Panahi, Shima | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 54008 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Gholampour, Iman; Movahedian, Hamid
  7. Abstract: Emotion recognition from speech signals has become one of the most popular research topics in recent years. To improve human-machine interaction, a machine must be able to understand the situation and respond accordingly, and part of this process involves understanding the user's emotional state. In recent years, various methods have been proposed to improve the performance of speech emotion recognition (SER) systems. These methods include collecting audio databases, extracting effective features from speech signals, applying feature selection algorithms, designing different classifiers, and combining classifiers. The main theme of this project is the recognition of emotions from speech on the Sharif emotional Persian database (ShEMO) and their classification using machine learning, especially deep learning methods. To check the generalizability of the models and compare the results, the system has also been tested on RAVDESS, an English database. In the SER system, after the necessary pre-processing of the audio files, various time-domain and frequency-domain features are extracted from them. The Recursive Feature Elimination (RFE) algorithm is then used to select the most effective features. These features are classified into five emotion classes (sadness, happiness, surprise, anger, and neutral). For classification, several methods, including support vector machines (SVM) and deep neural networks (convolutional neural networks), have been evaluated and compared. To increase classification accuracy, ensemble learning has been used.
     One of the main goals of this project is to create an emotion analysis system for use in a mobile phone center (the Sharif Connect system). Therefore, in another experiment, the Persian emotional database was converted into a telephone database using audio editing software (Adobe Audition). Data augmentation was then used to improve the classification accuracy of the convolutional networks on the telephone samples. Finally, the dependence of the SER system on the speakers' language, the effect of gender-dependent emotion recognition, and the system's performance in practice have also been examined.
  8. Keywords: Classification; Support Vector Machine (SVM); Deep Learning; Feature Extraction; Ensemble Learning; Speech Emotion Recognition
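
The abstract mentions combining classifiers through ensemble learning but does not say which scheme was used. As a minimal sketch, assuming simple majority (hard) voting over the five emotion classes, the combination step could look like the following. The function names, the tie-breaking rule, and the sample predictions are all hypothetical and are not taken from the thesis.

```python
from collections import Counter

# The five emotion classes named in the abstract.
EMOTIONS = ("sadness", "happiness", "surprise", "anger", "neutral")


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for one utterance.

    Ties are broken by the order of EMOTIONS, an arbitrary choice made
    for this sketch only.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in EMOTIONS:
        if counts.get(label, 0) == best:
            return label
    raise ValueError("top-voted label is not a known emotion class")


def ensemble_predict(per_model_predictions):
    """Combine per-model label lists (one list per model), utterance by utterance."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of three classifiers on four utterances.
svm_out = ["anger", "neutral", "sadness", "happiness"]
cnn_out = ["anger", "surprise", "sadness", "neutral"]
third_out = ["happiness", "surprise", "neutral", "neutral"]

combined = ensemble_predict([svm_out, cnn_out, third_out])
print(combined)
```

Hard voting is only one option. Averaging class probabilities (soft voting) or training a second-stage model on the classifier outputs (stacking) are common alternatives, and the thesis may have used any of them.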
