ShEMO: a large-scale validated database for Persian speech emotion detection

Article: Language Resources and Evaluation ; 2018 ; 1574020X (ISSN) ; Nezami, O. M ; Jamshid Lou, P ; Karami, M ; Sharif University of Technology

Springer Netherlands 2018

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples of 87 native Persian speakers for five basic emotions, namely anger, fear, happiness, sadness and surprise, as well as the neutral state. Twelve annotators label the underlying emotional state of utterances, and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as “substantial agreement”. We also present benchmark results...
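
The labelling procedure in this abstract (majority voting over twelve annotators, agreement measured with a kappa statistic) can be illustrated with a short Python sketch. The abstract does not say which kappa variant was used; Fleiss' kappa, a common choice when more than two annotators label every item, is assumed here. The tie-handling rule in `majority_label` (ties left undecided) is also an assumption, and both function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def majority_label(votes: List[str]) -> Optional[str]:
    """Return the most frequent label, or None when the top labels are tied."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the utterance undecided (assumed policy)
    return ranked[0][0]


def fleiss_kappa(ratings: List[List[str]]) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    category_totals: Counter = Counter()
    mean_agreement = 0.0
    for votes in ratings:
        if len(votes) != n_raters:
            raise ValueError("every item must have the same number of ratings")
        counts = Counter(votes)
        category_totals.update(counts)
        # Ordered pairs of raters that gave this item the same label.
        agreeing_ordered_pairs = sum(c * c for c in counts.values()) - n_raters
        mean_agreement += agreeing_ordered_pairs / (n_raters * (n_raters - 1))
    mean_agreement /= n_items
    # Agreement expected by chance from the overall label distribution.
    total = n_items * n_raters
    chance = sum((t / total) ** 2 for t in category_totals.values())
    if chance == 1.0:
        raise ValueError("kappa is undefined when every rating uses one category")
    return (mean_agreement - chance) / (1.0 - chance)


if __name__ == "__main__":
    print(majority_label(["anger", "anger", "fear", "anger"]))  # anger
    print(fleiss_kappa([["anger"] * 3, ["fear"] * 3]))  # 1.0 (perfect agreement)
```

For ShEMO's setup, `fleiss_kappa` would receive one twelve-element list of labels per utterance; the toy calls above only show the top of the scale.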

ShEMO: a large-scale validated database for Persian speech emotion detection

Article: Language Resources and Evaluation ; Volume 53, Issue 1, 2019 ; 1574020X (ISSN) ; Mohamad Nezami, O ; Jamshid Lou, P ; Karami, M ; Sharif University of Technology

Springer Netherlands 2019

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples of 87 native Persian speakers for five basic emotions, namely anger, fear, happiness, sadness and surprise, as well as the neutral state. Twelve annotators label the underlying emotional state of utterances, and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as “substantial agreement”. We also present benchmark results...

Speaker Adaptation in HMM-Based Persian Speech Synthesis

M.Sc. Thesis: Sharif University of Technology ; Bahmaninezhad, Fahimeh (Author) ; Sameti, Hossein (Supervisor)

Abstract

Text-to-speech synthesis, one of the key technologies in speech processing, is a technique for generating a speech signal from arbitrary text with a target speaker's voice characteristics, various speaking styles and emotional expressions. Statistical parametric speech synthesis has recently been shown to be very effective in generating acceptable synthesized speech. This study therefore focuses on one instance of these techniques, hidden Markov model (HMM)-based speech synthesis. In text-to-speech systems, it is desirable to synthesize high-quality speech from a small amount of speech data; this goal can be achieved by employing a speaker adaptation framework and...
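
Speaker adaptation in HMM-based synthesis commonly works by transforming the Gaussian output distributions of an average-voice model toward the target speaker. The abstract does not name the adaptation algorithm, so the Python sketch below only illustrates the core idea behind MLLR-style mean adaptation, simplified to a single feature dimension, equal state variances and a hard frame-to-state alignment. Under those assumptions the maximum-likelihood shared transform reduces to ordinary least squares. The function names, data layout and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_mean_transform(
    state_means: Sequence[float], frames: Sequence[Tuple[int, float]]
) -> Tuple[float, float]:
    """Fit one transform mu' = a * mu + b shared by all states.

    `frames` holds (state_index, observation) pairs, i.e. a hard alignment of
    the adaptation data to the average-voice model (an assumption).
    """
    if not frames:
        raise ValueError("need at least one adaptation frame")
    xs = [state_means[s] for s, _ in frames]  # mean of the aligned state
    ys = [obs for _, obs in frames]           # observed target-speaker value
    n = len(frames)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0:
        raise ValueError("adaptation data must cover states with different means")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def adapt_means(state_means: Sequence[float], a: float, b: float) -> List[float]:
    """Apply the shared transform to every state, including unseen ones."""
    return [a * mu + b for mu in state_means]


if __name__ == "__main__":
    average_voice = [4.5, 5.0, 5.5, 6.0]  # toy per-state means
    # Toy adaptation data covering states 0-2 only; state 3 is never observed.
    adaptation = [(0, 10.0), (1, 11.0), (2, 12.0)]
    a, b = fit_mean_transform(average_voice, adaptation)
    print(a, b)  # 2.0 1.0
    print(adapt_means(average_voice, a, b))  # [10.0, 11.0, 12.0, 13.0]
```

Because the single transform is shared by all states, a state that never occurs in the adaptation data (state 3 in the toy example) is still moved toward the target speaker, which is what makes shared-transform methods attractive when only a small amount of speech is available.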

Using Structural Language Modeling in Continuous Speech Recognition Systems

M.Sc. Thesis: Sharif University of Technology ; SheikhShab, Golnar (Author) ; Sameti, Hossein (Supervisor)

Abstract

The language model is one of the most important parts of an automatic speech recognition system; it incorporates knowledge of natural language into the system to improve its accuracy. The language model conventionally used in recognition systems is the n-gram, which is usually estimated from a large corpus using the relative frequency method. An n-gram model approximates the probability of a word sequence by multiplying its n-gram probabilities and thus does not take long-distance dependencies into account. Syntactic language models are therefore of interest. In this research, after examining different syntactic language models, a method for re-estimating the n-gram model, introduced by Stolcke in 1994, was...
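
The relative-frequency estimate and the product rule described in this abstract can be made concrete with a minimal bigram (n = 2) model. The Python sketch below is illustrative only: the sentence-boundary tokens `<s>` and `</s>` and the function names are assumptions, and it applies no smoothing, whereas practical recognizers smooth or back off so that unseen n-grams do not receive zero probability.

```python
from collections import Counter
from typing import List, Tuple

Model = Tuple[Counter, Counter]


def train_bigram(sentences: List[List[str]]) -> Model:
    """Count histories and bigrams; P(w | v) = count(v, w) / count(v)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        history_counts.update(tokens[:-1])  # every token that precedes another
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts


def sentence_probability(words: List[str], model: Model) -> float:
    """Approximate P(w1..wn) as the product of bigram probabilities."""
    history_counts, bigram_counts = model
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if history_counts[prev] == 0:
            return 0.0  # unseen history: no relative-frequency estimate exists
        prob *= bigram_counts[(prev, cur)] / history_counts[prev]
    return prob


if __name__ == "__main__":
    corpus = [["open", "the", "door"], ["open", "the", "window"]]
    model = train_bigram(corpus)
    print(sentence_probability(["open", "the", "door"], model))  # 0.5
```

Each factor conditions only on the immediately preceding word, so a dependency between words further apart in the sentence never enters the product; this is the limitation that motivates syntactic language models.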

Implementation and evaluation of statistical parametric speech synthesis methods for the Persian language

Article: IEEE International Workshop on Machine Learning for Signal Processing, 18 September 2011 through 21 September 2011 ; September 2011, Pages 1-6 ; 9781457716232 (ISBN) ; Bahaadini, S ; Sameti, H ; Khorram, S ; Sharif University of Technology

2011

Abstract

Little and scattered research on Persian speech synthesis systems has been performed during the last ten years, and no comprehensive framework that properly implements and adapts statistical speech synthesis methods for Persian has been developed yet. In this paper, recent statistical parametric speech synthesis methods, including CLUSTERGEN, traditional HMM-based speech synthesis and its STRAIGHT version, are implemented and adapted for the Persian language. A CCR test is carried out to compare these methods with each other and with the unit selection method. Listeners score the samples on the CMOS scale. The methods were ranked by averaging the CCR scores. The results show that STRAIGHT-based...
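
In a CCR (comparison category rating) test, as standardized in ITU-T P.800, listeners hear a pair of samples and rate the second relative to the first on a seven-point scale from -3 (much worse) to +3 (much better); averaging such ratings gives a comparison mean opinion score (CMOS). The abstract does not describe how scores were aggregated per method, so the Python sketch below assumes one simple scheme: each rating counts in favour of the second system and, with the opposite sign, against the first, and methods are ranked by their mean. The system labels and ratings are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trial = Tuple[str, str, int]  # (first system, second system, rating of second vs first)


def rank_by_ccr(trials: List[Trial]) -> List[Tuple[str, float]]:
    """Rank systems by their mean CCR rating, best first."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for first, second, rating in trials:
        if not -3 <= rating <= 3:
            raise ValueError(f"CCR ratings must lie in -3..+3, got {rating}")
        totals[second] += rating  # credit the system that was rated
        totals[first] -= rating   # the reference system receives the opposite sign
        counts[second] += 1
        counts[first] += 1
    means = {system: totals[system] / counts[system] for system in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_trials = [("sys_b", "sys_a", 2), ("sys_c", "sys_a", 1), ("sys_c", "sys_b", 1)]
    for system, score in rank_by_ccr(toy_trials):
        print(f"{system}: {score:+.2f}")
```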

Niusha, the first Persian speech-enabled IVR platform

Article: 2010 5th International Symposium on Telecommunications, IST 2010, 4 December 2010 through 6 December 2010, Tehran ; 2010, Pages 591-595 ; 9781424481835 (ISBN) ; Bokaei, M. H ; Sameti, H ; Eghbal-Zadeh, H ; BabaAli, B ; Hosseinzadeh, K. H ; Bahrani, M ; Veisi, H ; Sanian, A ; Sharif University of Technology

2010

Abstract

This paper introduces Niusha, the first Persian speech-enabled IVR platform. The platform uses Persian speech recognition and Persian text-to-speech synthesis engines to interact with users. It is designed so that it can easily be customized for various domains, and its components can be adapted to new words.

Improving the performance of speech recognition systems using fault-tolerant techniques

Article: 2008 9th International Conference on Signal Processing, ICSP 2008, Beijing, 26 October 2008 through 29 October 2008 ; 2008, Pages 579-582 ; 9781424421794 (ISBN) ; Veisi, H ; Sameti, H ; Sharif University of Technology

2008

Abstract

In this paper, fault-tolerant techniques are studied and tested in speech recognition systems to make these systems robust to noise. Recognizer redundancy is implemented to exploit the strengths of several recognition methods, each of which performs acceptably under a specific condition. Duplication-with-comparison and NMR methods are tested with majority and plurality voting on a telephony Persian speech-enabled IVR system. The evaluations show two promising outcomes: first, the approach improves performance considerably; second, it enables us to detect outputs with low confidence. © 2008 IEEE
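
The redundancy scheme in this abstract combines the outputs of several recognizers by voting. The Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes each recognizer returns one hypothesis string; majority voting accepts a hypothesis only when more than half of the recognizers agree, plurality voting accepts the most frequent hypothesis unless the top counts are tied, and any disagreement is flagged as low confidence. With two recognizers, majority voting behaves like duplication-with-comparison (matching outputs are accepted, a mismatch is rejected); with N recognizers it corresponds to N-modular redundancy (NMR).

```python
from collections import Counter
from typing import List, Optional, Tuple


def combine_outputs(
    hypotheses: List[str], rule: str = "majority"
) -> Tuple[Optional[str], bool]:
    """Return (decision, low_confidence) for the outputs of N recognizers."""
    if not hypotheses:
        raise ValueError("need at least one recognizer output")
    ranked = Counter(hypotheses).most_common()
    top, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if rule == "majority":
        # Accept only if strictly more than half of the recognizers agree.
        decision = top if top_count > len(hypotheses) / 2 else None
    elif rule == "plurality":
        # Accept the most frequent hypothesis unless the top counts are tied.
        decision = None if tied else top
    else:
        raise ValueError(f"unknown voting rule: {rule}")
    # Any disagreement among recognizers marks the output as low confidence.
    low_confidence = top_count < len(hypotheses)
    return decision, low_confidence


if __name__ == "__main__":
    print(combine_outputs(["balance", "balance", "transfer"]))
    print(combine_outputs(["balance", "transfer"]))  # two recognizers disagree
    print(combine_outputs(["a", "a", "b", "c", "d"], rule="plurality"))
```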

Persian Speech Synthesis Using Hidden Markov Models

M.Sc. Thesis: Sharif University of Technology ; Bahaadini, Sara (Author) ; Sameti, Hossein (Supervisor)

Abstract

Little and scattered research on Persian speech synthesis systems has been performed during the last ten years, and no comprehensive framework that properly implements and adapts statistical speech synthesis methods for Persian has been developed yet. In this thesis, recent statistical parametric speech synthesis methods, including CLUSTERGEN, traditional HMM-based speech synthesis and its STRAIGHT version, are implemented and adapted for the Persian language. A CCR test is carried out to compare these methods with each other and with the unit selection method. Listeners score the samples on the CMOS scale. The methods were ranked by averaging the CCR scores. The results show that STRAIGHT-based...

HMM-based Persian speech synthesis using limited adaptation data

Article: International Conference on Signal Processing Proceedings, ICSP ; Volume 1, 2012, Pages 585-589 ; 9781467321945 (ISBN) ; Bahmaninezhad, F ; Sameti, H ; Khorram, S ; Sharif University of Technology

2012

Abstract

Speech synthesis systems developed for the Persian language so far need various large-scale speech corpora to synthesize the voices of several target speakers. Accordingly, synthesizing speech from a small amount of data appears to be essential for Persian. Using speaker adaptation in speech synthesis systems makes it possible to generate speech of remarkable quality when the speaker's data are limited. Here we apply this method to Persian for the first time. This paper describes speaker adaptation based on hidden Markov models (HMMs) in a Persian speech synthesis system for the FARsi Speech DATabase (FARSDAT). To this end, we prepared the whole FARSDAT, then for...
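
One standard way to adapt an average-voice HMM with limited data is maximum a posteriori (MAP) estimation, which interpolates between a state's prior (average-voice) mean and the target speaker's data according to how much data that state received. The abstract does not say which adaptation technique the paper uses, so this Python sketch only illustrates MAP mean adaptation under simplifying assumptions: one feature dimension, a hard frame-to-state alignment, and a hypothetical prior weight `tau` expressed in frames. Function names and toy values are hypothetical.

```python
from typing import Dict, List


def map_adapt_mean(prior_mean: float, frames: List[float], tau: float = 10.0) -> float:
    """MAP estimate of a Gaussian mean: (tau * prior + sum(x)) / (tau + N)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return (tau * prior_mean + sum(frames)) / (tau + len(frames))


def map_adapt_states(
    prior_means: Dict[int, float], aligned: Dict[int, List[float]], tau: float = 10.0
) -> Dict[int, float]:
    """Adapt every state; states with no aligned frames keep their prior mean."""
    return {
        state: map_adapt_mean(mean, aligned.get(state, []), tau)
        for state, mean in prior_means.items()
    }


if __name__ == "__main__":
    priors = {0: 5.0, 1: 5.0, 2: 5.0}  # toy average-voice means
    data = {0: [6.0] * 10, 1: [6.0] * 990}  # state 2 has no adaptation data
    print(map_adapt_states(priors, data))
```

With `tau = 10`, the lightly observed state moves halfway toward the speaker's value, the heavily observed state almost reaches it, and the unobserved state keeps its average-voice mean. That last behaviour is why MAP is often combined with a shared linear transform when adaptation data are scarce.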