
A textbook evaluation of speech acts: The case of English result series

, Article International Journal of Applied Linguistics and English Literature ; Volume 1, Issue 6 , 2012 , Pages 199-209 ; 22003592 (ISSN) Alemi, M ; Irandoost, R ; Sharif University of Technology

Australian International Academic Centre PTY LTD 2012

Abstract

The present work aimed to investigate the frequency and proportions of the speech acts of complaints and compliments in the four volumes of the course book English Result (Elementary, Pre-intermediate, Intermediate, and Upper-intermediate), by Mark Hancock and Annie McDonald (2009), published by Oxford University Press. Investigation of the two speech acts was based on complaint strategies (Olshtain and Weinbach, 1987) and compliment strategies (Wolfson and Manes, 1981). It was found that the books were rich in terms of the number of the two speech acts, but in presenting them, there were one or two dominant strategies in both cases. Chi-square analysis demonstrated...

ShEMO: a large-scale validated database for Persian speech emotion detection

, Article Language Resources and Evaluation ; 2018 ; 1574020X (ISSN) Nezami, O. M ; Jamshid Lou, P ; Karami, M ; Sharif University of Technology

Springer Netherlands 2018

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples of 87 native Persian speakers for five basic emotions (anger, fear, happiness, sadness and surprise) as well as a neutral state. Twelve annotators label the underlying emotional state of utterances, and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as “substantial agreement”. We also present benchmark results...

ShEMO: a large-scale validated database for Persian speech emotion detection

, Article Language Resources and Evaluation ; Volume 53, Issue 1 , 2019 ; 1574020X (ISSN) Mohamad Nezami, O ; Jamshid Lou, P ; Karami, M ; Sharif University of Technology

Springer Netherlands 2019

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples of 87 native Persian speakers for five basic emotions (anger, fear, happiness, sadness and surprise) as well as a neutral state. Twelve annotators label the underlying emotional state of utterances, and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as “substantial agreement”. We also present benchmark results...
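The labeling procedure in the ShEMO abstract (several annotators per utterance, majority voting, a kappa agreement score) can be illustrated with a minimal sketch. The abstract does not say which kappa variant was used; Fleiss' kappa is assumed here because it handles more than two raters. The function names and the tie-handling rule (return `None`) are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label, or None when first place is tied."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: ties are left undecided
    return ranked[0][0]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-item label lists.

    Every item must be rated by the same number of raters (at least two).
    Undefined (division by zero) if all ratings fall in one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for votes in ratings:
        counts = Counter(votes)
        totals.update(counts)
        # Fraction of rater pairs that agree on this item.
        pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += pairs / (n_raters * (n_raters - 1))
    p_observed = agreement_sum / n_items
    # Chance agreement from the overall category proportions.
    p_chance = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)
```

For example, `majority_label(["anger", "anger", "fear"])` yields `"anger"`, and a rating matrix in which all annotators agree on every item gives a kappa of 1.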

Steganography in silence intervals of speech

, Article 2008 4th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2008, Harbin, 15 August 2008 through 17 August 2008 ; 2008 , Pages 605-607 ; 9780769532783 (ISBN) Shirali Shahreza, S ; Shirali Shahreza, M ; Sharif University of Technology

2008

Abstract

This paper presents a new approach for hiding information in speech signals. In this method, the silence intervals of speech are found and the length (number of samples) of these intervals is changed to hide information. This method can be used simultaneously with other methods. © 2008 IEEE
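The idea in the abstract above, changing the length of silence intervals to carry information, can be sketched as follows. The abstract does not specify how bits map to lengths; this sketch assumes a simple parity rule (even length encodes 0, odd length encodes 1) and an amplitude threshold for detecting silence. The names `silence_runs`, `embed`, `extract` and the default `threshold` and `min_len` values are hypothetical.

```python
def silence_runs(samples, threshold=2, min_len=4):
    """Return (start, length) for each run of low-amplitude samples."""
    runs, start = [], None
    # A loud sentinel sample closes a run that reaches the end of the signal.
    for i, s in enumerate(list(samples) + [threshold + 1]):
        if abs(s) <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs


def embed(samples, bits, threshold=2, min_len=4):
    """Hide one bit per silence interval in the parity of its length."""
    runs = silence_runs(samples, threshold, min_len)
    if len(bits) > len(runs):
        raise ValueError("not enough silence intervals for the payload")
    out, pos = [], 0
    for (start, length), bit in zip(runs, bits):
        out.extend(samples[pos:start + length])
        if length % 2 != bit:
            out.append(0)  # lengthen the silence by one sample
        pos = start + length
    out.extend(samples[pos:])
    return out


def extract(samples, n_bits, threshold=2, min_len=4):
    """Read the hidden bits back from the silence interval lengths."""
    runs = silence_runs(samples, threshold, min_len)
    return [length % 2 for _, length in runs[:n_bits]]
```

Because intervals are only ever lengthened, every interval that qualified before embedding still qualifies afterwards, so the receiver finds the same intervals in the same order. The receiver is assumed to know the number of hidden bits in advance.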

Noise and speaker robustness in a Persian continuous speech recognition system

, Article 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN) Veisi, H ; Sameti, H ; Sharif University of Technology

2007

Abstract

In this paper, VTLN speaker normalization and the MLLR and MAP adaptation methods are investigated in a Persian HMM-based speaker-independent large vocabulary continuous speech recognition system. Speaker and environmental noise robustness are achieved in real-world applications for this system. A search-based method is used in VTLN to find speaker-relative warping factors. The warping factors are applied to the signal's spectrum to normalize the effect of VTL variation between speakers. In the MLLR framework, Gaussian mean and covariance transformations in global and full adaptation are tested. In this method, regression tree based adaptation in batch-supervised fashion is used. Also the standard...

CIROLS: Codec independent recovery of lost speech packets

, Article 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN) Ajorloo, H ; Manzuri Shalmani, M. T ; Aghatabar, M. M ; Sharif University of Technology

2007

Abstract

In this paper, we focus on finding an error-resilient method for discontinuity-free transmission of speech signals over the Internet. Our proposed method creates artificial correlation between speech samples, which pre-distorts the speech signal. The receiver uses this correlation to reconstruct the lost speech packets. A discrete Fourier transform (DFT)-based speech enhancement technique is designed to reduce the processing error in the recovered speech caused by the speech codecs. The SegSNR results show the superiority of our proposed method over a recently proposed speech enhancement technique. © 2007 IEEE

Likelihood-maximizing-based multiband spectral subtraction for robust speech recognition

, Article Eurasip Journal on Advances in Signal Processing ; Volume 2009 , 2009 ; 16876172 (ISSN) Babaali, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2009

Abstract

Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed for improving the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, while speech recognition systems need compensation techniques to reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, correlation can be expected...

An improved spectral subtraction speech enhancement system by using an adaptive spectral estimator

, Article Canadian Conference on Electrical and Computer Engineering 2005, Saskatoon, SK, 1 May 2005 through 4 May 2005 ; Volume 2005 , 2005 , Pages 261-264 ; 08407789 (ISSN) Ayat, S ; Manzuri, M. T ; Dianat, R ; Kabudian, J ; Sharif University of Technology

2005

Abstract

Spectral subtraction is one of the best-known and most commonly used methods for speech enhancement. The main weakness of this method is the production of an annoying artifact called musical noise. In this paper, we have reduced the musical noise and improved the quality of the enhanced speech by increasing the accuracy of the system's spectral estimator. This method is useful for speech enhancement systems in which the speech signal is degraded by stationary or near-stationary noise. Experimental results at different SNRs confirmed the better performance of our system from both objective and subjective views, which means better quality and more SNR improvement. © 2005 IEEE
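For orientation, the baseline technique that this and several other abstracts in the list build on can be sketched for a single frame. This is plain power spectral subtraction with an over-subtraction factor and a spectral floor, not the adaptive spectral estimator proposed in the paper. The function names and the default values of `alpha` and `beta` are illustrative assumptions.

```python
import math


def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames assumed to contain only noise."""
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=0.01):
    """Subtract the noise power from one frame's magnitude spectrum.

    alpha is the over-subtraction factor; beta sets a spectral floor
    (relative to the noise power) that replaces negative results.
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - alpha * n * n
        floor = beta * n * n
        enhanced.append(math.sqrt(max(power, floor)))
    return enhanced
```

Bins where the subtraction goes negative are clamped to the floor. Isolated bins that survive the clamping from frame to frame are what listeners perceive as musical noise, which is why a more accurate noise estimate matters.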

Prediction of prosodic data in Persian text-to-speech systems using recurrent neural network

, Article Electronics Letters ; Volume 39, Issue 25 , 2003 , Pages 1868-1869 ; 00135194 (ISSN) Farrokhi, A ; Ghaemmaghami, S ; Sharif University of Technology

2003

Abstract

A simplified four-layer recurrent neural network (RNN) based architecture is introduced to generate prosodic information for improving naturalness in Persian text-to-speech (TTS) systems. The proposed RNN uses the first two layers at word level and the last two layers at syllable level to provide the TTS system with major prosodic parameters, including pitch contour, energy contour, length of syllables, length and onset time of vowels, and duration of pauses. The experimental results show improved accuracy in the prediction of prosodic parameters, as compared to similar prosody generation systems of higher complexity.

Robust parsing for word lattices in continuous speech recognition systems

, Article 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN) Momtazi, S ; Sameti, H ; Fazel Zarandi, M ; Bahrani, M ; Sharif University of Technology

2007

Abstract

One of the roles of a Natural Language Processing (NLP) model in Continuous Speech Recognition (CSR) systems is to find the best sentence hypothesis by ranking all n-best sentences according to the grammar. This paper describes a robust parsing algorithm for Spoken Language Recognition (SLR) which utilizes a technique that improves the efficiency of parsing. This technique integrates grammatical and statistical approaches, and by using a best-first parsing strategy improves the accuracy of recognition. Preliminary experimental results using a Persian continuous speech recognition system show effective improvements in accuracy with little change in recognition time. The word error rate was...

High rate data hiding in speech signal

, Article SIGMAP 2007 - 2nd International Conference on Signal Processing and Multimedia Applications, Barcelona, 28 July 2007 through 31 July 2007 ; 2007 , Pages 287-292 ; 9789898111135 (ISBN) Jahangiri, E ; Ghaemmaghami, S ; Sharif University of Technology

2007

Abstract

One of the main issues with data hiding algorithms is data embedding capacity. Most data hiding methods suffer from low capacity, which can make them inappropriate for certain hiding applications. This paper presents a high-capacity data hiding method that uses encryption and the multi-band speech synthesis paradigm. In this method, an encrypted covert message is embedded in the unvoiced bands of the speech signal, which leads to a high data hiding capacity of tens of kbps in a typical digital voice file transmission scheme. The proposed method yields a new standpoint in the design of data hiding systems in the sense of three major, basically conflicting requirements in steganography, i.e....

A model distance maximizing framework for speech recognizer-based speech enhancement

, Article AEU - International Journal of Electronics and Communications ; Volume 65, Issue 2 , February , 2011 , Pages 99-106 ; 14348411 (ISSN) Babaali, B ; Sameti, H ; Falk, T. H ; Sharif University of Technology

2011

Abstract

This paper presents a novel discriminative parameter calibration approach based on the model distance maximizing (MDM) framework to improve the performance of our previously proposed method based on spectral subtraction (SS) in a likelihood-maximizing framework. In the previous work, spectral over-subtraction factors were adjusted based on the conventional maximum-likelihood (ML) approach, which utilized only the true model and did not consider other confused models, and thus likely reached suboptimal solutions. In the proposed MDM framework, by contrast, improved speech recognition performance is obtained by maximizing the dissimilarities among models. Experimental results based on FARSDAT, TIMIT...

Spectral subtraction in likelihood-maximizing framework for robust speech recognition

, Article INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association, Brisbane, QLD, 22 September 2008 through 26 September 2008 ; December , 2008 , Pages 980-983 ; 19909772 (ISSN) Baba Ali, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2008

Abstract

Spectral subtraction (SS), as a speech enhancement technique, was originally designed for improving the quality of speech signals as judged by human listeners. It usually improves the quality and intelligibility of speech signals, while speech recognition systems need compensation techniques capable of reducing the mismatch between the noisy speech features and the clean models. This paper proposes a novel approach for solving this problem by considering the SS and the speech recognizer as two interconnected components, sharing the common goal of improved speech recognition accuracy. The experimental evaluations on a real recorded database and the TIMIT database show that the proposed method can...

Combining augmented reality and speech technologies to help deaf and hard of hearing people

, Article Proceedings - 2012 14th Symposium on Virtual and Augmented Reality, SVR 2012 ; 2012 , Pages 174-181 ; 9780769547251 (ISBN) Mirzaei, M. R ; Ghorshi, S ; Mortazavi, M ; Sharif University of Technology

2012

Abstract

Augmented Reality (AR), Automatic Speech Recognition (ASR) and Text-to-Speech Synthesis (TTS) can be used to help people with disabilities. In this paper, we combine these technologies to build a new system for helping deaf people. This system can take the narrator's speech, convert it into readable text and show it directly on an AR display. To improve the accuracy of the system, we use Audio-Visual Speech Recognition (AVSR) as a backup for the ASR engine in noisy environments. In addition, we use the TTS system to make our system more usable for deaf people. The results of testing the system show that its accuracy is over 85 percent on average in different places. Also, the result of a...

Spectral subtraction in model distance maximizing framework for robust speech recognition

, Article 2008 9th International Conference on Signal Processing, ICSP 2008, Beijing, 26 October 2008 through 29 October 2008 ; 2008 , Pages 627-630 ; 9781424421794 (ISBN) BabaAli, B ; Sameti, H ; Safayani, M ; Sharif University of Technology

2008

Abstract

This paper presents a novel discriminative parameter calibration approach based on Model Distance Maximizing (MDM) to improve the performance of our previously proposed robustness method, spectral subtraction (SS) in a likelihood-maximizing framework. In the previous work, the spectral over-subtraction factor of SS was adjusted with the conventional ML approach, which utilizes only the true model without considering other confused models. This makes it very likely to reach a suboptimal solution. In MDM, by contrast, maximizing the dissimilarities among models could further improve the performance of our speech recognizer-based spectral subtraction method. Experimental results...

Parental control based on speaker class verification

, Article IEEE Transactions on Consumer Electronics ; Volume 54, Issue 3 , 2008 , Pages 1244-1251 ; 00983063 (ISSN) Shirali-Shahreza, S ; Sameti, H ; Shirali Shahreza, M ; Sharif University of Technology

2008

Abstract

Restricting children's access to materials unsuitable for them, such as violent scenes, is very important for parents. Devices such as televisions and computers therefore offer a feature named Parental Control to define the content children can access. The parental control setting must be protected from children, which is usually done with a password. In this paper, we propose a new method for distinguishing between adult users and child users based on human speech. In our proposed method, the user must say a word, and adult users are identified by processing the speech. Our current implementation has 92.5% accuracy for distinguishing adult users from children. © 2008 IEEE

LPRE: Lost speech packet recovery with enhancement

, Article 2007 IEEE International Conference on Communications, ICC'07, Glasgow, Scotland, 24 June 2007 through 28 June 2007 ; August , 2007 , Pages 1778-1783 ; 05361486 (ISSN); 1424403537 (ISBN); 9781424403530 (ISBN) Ajorloo, H ; Manzuri Shalmani, M. T ; Sharif University of Technology

2007

Abstract

In Internet telephony, loss of IP packets causes instantaneous discontinuities in the received speech. In this paper, we focus on finding an error-resilient method for this problem. Our proposed method creates artificial correlation between speech samples, which pre-distorts the speech signal. The receiver uses this correlation to reconstruct the lost speech packets. An appropriate speech enhancement technique is designed to reduce the processing error in the recovered speech caused by the speech codecs. The SegSNR results show the superiority of our proposed speech enhancement method over a recently proposed one. © 2007 IEEE

A novel fuzzy approach to speech recognition

, Article Proceedings - HIS'04: 4th International Conference on Hybrid Intelligent Systems, Kitakyushu, 5 December 2004 through 8 December 2004 ; 2005 , Pages 340-345 ; 0769522912 (ISBN) Halavati, R ; Shouraki, S.B ; Eshraghi, M ; Alemzadeh, M ; Ziaie, P ; Ishikawa M ; Hashimoto S ; Paprzycki M ; Barakova E ; Yoshida K ; Koppen M ; Corne D.M ; Abraham A ; Sharif University of Technology

2005

Abstract

This paper presents a novel approach to speech recognition using fuzzy modeling. The task begins with conversion of the speech spectrogram into a linguistic description based on arbitrary colors and lengths. Phonemes are also described using these fuzzy measures, and recognition is done by normal fuzzy reasoning, while a genetic algorithm optimizes the phoneme definitions so as to classify samples into the correct phonemes. The method is tested on a standard speech database and the results are presented. © 2005 IEEE

Average voice modeling based on unbiased decision trees

, Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Mons ; Volume 7911 LNAI , June , 2013 , Pages 89-96 ; 03029743 (ISSN) ; 9783642388460 (ISBN) Bahmaninezhad, F ; Khorram, S ; Sameti, H ; Sharif University of Technology

2013

Abstract

Speaker adaptive speech synthesis based on the Hidden Semi-Markov Model (HSMM) has been demonstrated to be highly effective when only a limited amount of speech data is available. However, this effectiveness can be increased by training the average voice model appropriately. Hence, this study presents a new method for training the average voice model. This method guarantees that data from every speaker contributes to all the leaves of the decision tree. We take into account that small training data and highly diverse contexts of training speakers are disadvantages that noticeably degrade the quality of the average voice model, and further influence the adapted model and synthetic...

Implementation and evaluation of statistical parametric speech synthesis methods for the Persian language

, Article IEEE International Workshop on Machine Learning for Signal Processing, 18 September 2011 through 21 September 2011 ; September , 2011 , Page(s): 1 - 6 ; 9781457716232 (ISBN) Bahaadini, S ; Sameti, H ; Khorram, S ; Sharif University of Technology

2011

Abstract

Research on Persian speech synthesis systems over the last ten years has been scattered and limited. A comprehensive framework that properly implements and adapts statistical speech synthesis methods for Persian has not yet been developed. In this paper, recent statistical parametric speech synthesis methods, including CLUSTERGEN, traditional HMM-based speech synthesis and its STRAIGHT version, are implemented and adapted for the Persian language. A CCR test is carried out to compare these methods with each other and with the unit selection method. Listeners score samples based on CMOS. The methods were ranked by averaging the CCR scores. The results show that STRAIGHT-based...