Speech Emotion Recognition Using Deep Learning and Frequency Features

Aftab, Arya | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 54632 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Ghaemmaghami, Shahrokh
  7. Abstract:
  8. Speech is the most natural and widely used means of communication between people, and the fastest means of communication between humans and computers. Advances have been made toward a fully natural human-computer relationship, but a major remaining challenge is the computer's inability to recognize the user's emotions. Detecting emotion in speech is therefore an important problem in speech processing, since emotion recognition can help extract meaning and improve the performance of speech recognition systems. This study first defines the emotions and materials needed to build the desired model, and then presents an efficient, lightweight model for recognizing speech emotions based on convolutional neural networks. The proposed model has low computational complexity and can be implemented on devices with low processing power, such as mobile phones and microcontrollers. The proposed method consists of three separate parts: the first prepares the MFCC features for input to the model, the second extracts high-level features with sufficient separability, and the third classifies the input data using these extracted features (a minimal code sketch of this pipeline is given after the keywords below). We evaluated the proposed model in terms of storage, amount of processing required per input, maximum memory required, ability to reduce the precision of the weights, recognition rate, dependence on input length, and different loss functions. The evaluations show that the proposed model respects the limitations of the target systems (microcontrollers). The accuracy of the proposed model on the EMODB, IEMOCAP, SAVEE, RAVDESS, EMOVO, TESS, ShEMO, and URDU databases is 92.19%, 70.23%, 84.58%, 91.63%, 82.88%, 100%, 77.63%, and 95.87%, respectively, which shows very good performance compared to human performance and the random baseline.
  9. Keywords:
  10. Speech Emotion Recognition ; Deep Learning ; Frequency Features ; Lightweight Model ; Human Computer Interaction (HCI)
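The abstract describes a three-stage pipeline (MFCC preparation, high-level feature extraction with a CNN, classification) but does not specify the architecture in detail. The following is a minimal illustrative sketch of such a pipeline, assuming 40 MFCC coefficients, a two-block CNN with global average pooling, and the librosa and PyTorch libraries; the layer sizes, class count, and file path are placeholders, not the thesis's actual design.

```python
import librosa
import torch
import torch.nn as nn

N_MFCC = 40      # assumed number of MFCC coefficients (not stated in the abstract)
N_CLASSES = 7    # placeholder: e.g. the seven emotion classes of EMODB


def extract_mfcc(path: str, sr: int = 16000) -> torch.Tensor:
    """Load a speech file and return its MFCC matrix, shape (n_mfcc, frames)."""
    signal, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    return torch.from_numpy(mfcc).float()


class LightweightSERNet(nn.Module):
    """A small CNN over the MFCC matrix: conv blocks -> global pooling -> linear classifier."""

    def __init__(self, n_classes: int = N_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling: output size does not depend on input length
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mfcc, frames)
        return self.classifier(self.features(x).flatten(1))


# Usage with a hypothetical file path:
# logits = LightweightSERNet()(extract_mfcc("clip.wav")[None, None])
```

In this sketch, global average pooling maps utterances of different lengths to a fixed-size feature vector before classification, which is one common way to obtain the input-length independence that the abstract reports evaluating.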
