Search for: speech-recognition
Total 131 records

    Light-SERNet: a lightweight fully convolutional neural network for speech emotion recognition

    , Article 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 May 2022 through 27 May 2022 ; Volume 2022-May , 2022 , Pages 6912-6916 ; 15206149 (ISSN); 9781665405409 (ISBN) Aftab, A ; Morsali, A ; Ghaemmaghami, S ; Champagne, B ; Chinese and Oriental Languages Information Processing Society (COLPIS); Singapore Exhibition and Convention Bureau; The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); The Institute of Electrical and Electronics Engineers Signal Processing Society ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2022
    Abstract
    Detecting emotions directly from a speech signal plays an important role in effective human-computer interactions. Existing speech emotion recognition models require massive computational and storage resources, making them hard to implement concurrently with other machine-interactive tasks in embedded systems. In this paper, we propose an efficient and lightweight fully convolutional neural network for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps deep convolution blocks to extract high-level features, while ensuring sufficient separability.... 
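As a toy illustration of the multi-path idea, the sketch below runs one 1-D convolution branch per kernel size over the same input and stacks the resulting feature maps. It is not the Light-SERNet architecture itself, which operates on 2-D spectrogram-like features. The function names and kernels are hypothetical.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding so the output keeps the input length.

    Assumes an odd kernel length so the kernel can be centred on each sample.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(signal))]


def relu(xs: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


def parallel_paths(signal: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[List[float]]:
    """One conv + ReLU branch per kernel; the branch outputs are stacked as channels."""
    return [relu(conv1d_same(signal, kernel)) for kernel in kernels]


# Hypothetical box kernels of sizes 3 and 5 applied to a unit impulse.
maps = parallel_paths([0.0, 0.0, 1.0, 0.0, 0.0], [[1.0] * 3, [1.0] * 5])
```

The length-3 branch spreads the impulse over three samples and the length-5 branch over all five. This shows how different filter sizes capture different amounts of context from the same input.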

    Capacity bounds and detection schemes for data over voice

    , Article IEEE Transactions on Vehicular Technology ; Volume 65, Issue 11 , 2016 , Pages 8964-8977 ; 00189545 (ISSN) Kazemi, R ; Boloursaz, M ; Etemadi, S. M ; Behnia, F ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc 
    Abstract
    Cellular networks provide widespread and reliable voice communications among subscribers through mobile voice channels. These channels benefit from higher priority and availability compared with conventional cellular data communication services, such as General Packet Radio Service, Enhanced Data Rates for GSM Evolution, and High-Speed Downlink Packet Access. These properties are of major interest to applications that require transmitting small volumes of data urgently and reliably, such as an emergency call in vehicular applications. This has encouraged extensive research into making digital communication through voice channels feasible, leading to the emergence of Data over Voice (DoV)... 

    Low-complexity stochastic Generalized Belief Propagation

    , Article 2016 IEEE International Symposium on Information Theory, ISIT 2016, 10 July 2016 through 15 July 2016 ; Volume 2016-August , 2016 , Pages 785-789 ; 21578095 (ISSN) ; 9781509018062 (ISBN) Haddadpour, F ; Jafari Siavoshani, M ; Noshad, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc 
    Abstract
    The generalized belief propagation (GBP) algorithm, introduced by Yedidia et al., is an extension of the belief propagation (BP) algorithm, which is widely used in problems that involve computing exact or approximate marginals of probability distributions. In many problems, GBP has been observed to be considerably more accurate than BP. However, because its complexity is generally higher than that of BP, its application is limited in practice. In this paper, we introduce a stochastic version of GBP, called stochastic generalized belief propagation (SGBP), which can be considered an extension of the stochastic BP (SBP) algorithm introduced by Noorshams et al. They have shown... 
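GBP and SGBP themselves are not sketched here. For orientation, the following shows ordinary sum-product belief propagation on a chain, a graph on which BP is exact, and checks it against brute-force marginalization. The potentials and function names are illustrative assumptions.

```python
from itertools import product
from typing import List, Sequence


def chain_bp_marginals(unary: Sequence[Sequence[float]],
                       pairwise: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sum-product BP on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][a]    : node potential psi_i(a)
    pairwise[a][b] : edge potential psi(x_i = a, x_{i+1} = b), shared by all edges
    """
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # message into node i from its left neighbour
    bwd = [[1.0] * k for _ in range(n)]  # message into node i from its right neighbour
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[a][b] for a in range(k))
                  for b in range(k)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(pairwise[a][b] * unary[i + 1][b] * bwd[i + 1][b] for b in range(k))
                  for a in range(k)]
    marginals = []
    for i in range(n):
        belief = [fwd[i][a] * unary[i][a] * bwd[i][a] for a in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary: Sequence[Sequence[float]],
                          pairwise: Sequence[Sequence[float]]) -> List[List[float]]:
    """Exact marginals by enumerating every joint assignment (exponential cost)."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for xs in product(range(k), repeat=n):
        w = 1.0
        for i, a in enumerate(xs):
            w *= unary[i][a]
        for i in range(n - 1):
            w *= pairwise[xs[i]][xs[i + 1]]
        for i, a in enumerate(xs):
            totals[i][a] += w
    z = sum(totals[0])  # every assignment contributes to exactly one value of x_0
    return [[t / z for t in row] for row in totals]
```

BP's message passing is linear in the chain length, while enumeration grows exponentially. That gap between exact but expensive and cheap message passing is what motivates approximate and stochastic variants on loopy graphs.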

    Using ASR methods for OCR

    , Article 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 20 September 2019 through 25 September 2019 ; 2019 , Pages 663-668 ; 15205363 (ISSN); 9781728128610 (ISBN) Arora, A ; Garcia, P ; Watanabe, S ; Manohar, V ; Shao, Y ; Khudanpur, S ; Chang, C. C ; Rekabdar, B ; Babaali, B ; Povey, D ; Etter, D ; Raj, D ; Hadian, H ; Trmal, J ; Sharif University of Technology
    IEEE Computer Society  2019
    Abstract
    Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, recent DNN-HMM approaches have not been explored much for text recognition. Inspired by current work in automatic speech recognition (ASR) and machine translation, we present an open-vocabulary sub-word text recognition system. The sub-word lexicon and sub-word language model (LM) help in overcoming the challenge of recognizing out-of-vocabulary (OOV) words, and a time delay neural network (TDNN) and convolutional neural network (CNN) based DNN-HMM optical model (OM) efficiently models the sequence dependency in the... 

    A novel noise-immune, fuzzy approach to speaker-independent, isolated-word speech recognition

    , Article 2006 World Automation Congress, WAC'06, Budapest, 24 June 2006 through 26 June 2006 ; 2006 ; 1889335339 (ISBN); 9781889335339 (ISBN) Halavati, R ; Shouraki, S. B ; Razaghpour, M ; Tajik, H ; Cholakian, A ; Sharif University of Technology
    IEEE Computer Society  2006
    Abstract
    This paper presents a novel approach to isolated-word speech recognition using fuzzy modeling, specifically designed to ignore noise. The task is based on converting the speech spectrogram into a linguistic fuzzy description and comparing this representation with fuzzy linguistic descriptions of words. The method is evaluated in single-speaker and multi-speaker tests, and the results are compared with a widely used speech recognition approach, showing much higher noise resistance. Copyright - World Automation Congress (WAC) 2006  

    Localized discriminative Gaussian process latent variable model for text-dependent speaker verification

    , Article 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2016, 27 April 2016 through 29 April 2016 ; 2016 , Pages 183-188 ; 9782875870278 (ISBN) Maghsoodi, N ; Sameti, H ; Zeinali, H ; Sharif University of Technology
    i6doc.com publication  2016
    Abstract
    Utterance duration is one of the factors that affect the performance of speaker verification systems. Text-dependent speaker verification suffers from both short duration and mismatched content between enrollment and test segments. In this paper, we use the Discriminative Gaussian Process Latent Variable Model (DGPLVM) to deal with the uncertainty caused by short duration. This is the first attempt to utilize Gaussian processes for speaker verification. Also, to manage the mismatched content between enrollment and test segments, we propose the localized DGPLVM, which trains a DGPLVM for each phrase in the dataset. Experiments show a relative improvement of 27.4% in EER on RSR2015.  
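The equal error rate (EER) is the operating point at which the false-acceptance and false-rejection rates coincide. Below is a minimal threshold-sweep sketch, assuming that higher scores mean "same speaker". It also includes the relative-improvement arithmetic behind figures such as the 27.4% above. The function names are illustrative.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float], impostor_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping the decision threshold over all observed scores.

    A trial is accepted when score >= threshold. FAR is the fraction of impostor trials
    accepted and FRR the fraction of target trials rejected. The EER is taken at the
    threshold where the two are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of an error metric, e.g. 0.274 means 27.4% lower."""
    return (baseline - proposed) / baseline
```

For instance, a hypothetical drop in EER from 10.0% to 7.26% gives relative_improvement(10.0, 7.26) ≈ 0.274, i.e. a 27.4% relative improvement. These numbers are not taken from the paper.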

    Building and incorporating language models for Persian continuous speech recognition systems

    , Article 5th International Conference on Language Resources and Evaluation, LREC 2006, 22 May 2006 through 28 May 2006 ; 2006 , Pages 2590-2593 Bahrani, M ; Sameti, H ; Hafezi, N ; Movasagh, H ; Sharif University of Technology
    European Language Resources Association (ELRA)  2006
    Abstract
    In this paper, building statistical language models for the Persian language from a corpus, and incorporating them into a Persian continuous speech recognition (CSR) system, are described. We used the Persian Text Corpus to build the language models. First, we preprocessed the texts of the corpus by correcting the differing orthography of words. Also, the number of POS tags was reduced by clustering POS tags manually. Then we extracted a word-based unigram model and POS-based bigram and trigram language models from the corpus. We also present the procedure for incorporating language models into a Persian CSR system. Using the language models, a 27.4% reduction in word error rate was achieved in the best case.  
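Word error rate is the word-level edit distance between the recognizer output and the reference transcript, counting substitutions, deletions, and insertions. It is divided by the number of reference words. The following is a minimal dynamic-programming sketch. The function name is illustrative, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, word_error_rate("the cat sat on the mat", "the cat sit on mat") counts one substitution and one deletion over six reference words, giving 2/6.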

    Non-speaker information reduction from Cosine Similarity Scoring in i-vector based speaker verification

    , Article Computers and Electrical Engineering ; Volume 48 , November , 2015 , Pages 226-238 ; 00457906 (ISSN) Zeinali, H ; Mirian, A ; Sameti, H ; BabaAli, B ; Sharif University of Technology
    Elsevier Ltd  2015
    Abstract
    Cosine similarity and Probabilistic Linear Discriminant Analysis (PLDA) in i-vector space are two state-of-the-art scoring methods in the speaker verification field. While PLDA usually gives better accuracy, Cosine Similarity Scoring (CSS) remains widely used because of its simplicity and acceptable performance. In this domain, several channel compensation and score normalization methods have been proposed to improve performance. We investigate non-speaker information in the cosine similarity metric and propose a new approach to remove it from the decision-making process. I-vectors hold a large amount of non-speaker information, such as channel effects, language, and phonetic content. This type... 
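Below is a minimal sketch of the baseline cosine similarity score between an enrollment i-vector and a test i-vector. It does not implement the paper's removal of non-speaker information. The decision threshold is a hypothetical parameter that would normally be tuned on development data.

```python
import math
from typing import Sequence


def cosine_score(w_enroll: Sequence[float], w_test: Sequence[float]) -> float:
    """Cosine of the angle between two i-vectors (1 = same direction, -1 = opposite)."""
    if len(w_enroll) != len(w_test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm = math.sqrt(sum(a * a for a in w_enroll)) * math.sqrt(sum(b * b for b in w_test))
    if norm == 0.0:
        raise ValueError("cannot score a zero-length i-vector")
    return dot / norm


def accept(w_enroll: Sequence[float], w_test: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept the claimed identity if the score reaches the threshold."""
    return cosine_score(w_enroll, w_test) >= threshold
```

Because the score depends only on direction, any component shared by both vectors for non-speaker reasons, such as channel or language, inflates it. That is the kind of information the abstract proposes to remove.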

    Evaluation of a novel fuzzy sequential pattern recognition tool (fuzzy elastic matching machine) and its applications in speech and handwriting recognition

    , Article Applied Soft Computing Journal ; Volume 62 , January , 2018 , Pages 315-327 ; 15684946 (ISSN) Shahmoradi, S ; Bagheri Shouraki, S ; Sharif University of Technology
    Elsevier Ltd  2018
    Abstract
    Sequential pattern recognition has long been an important topic of soft computing research, with a wide range of applications including speech and handwriting recognition. In this paper, the performance of a novel fuzzy sequential pattern recognition tool named “Fuzzy Elastic Matching Machine” (FEMM) is investigated. This tool overcomes shortcomings of the HMM, including its inflexible mathematical structure and mathematical assumptions that are inconsistent with imprecise input data. To do so, the “Fuzzy Elastic Pattern” was introduced as the basic element of FEMM. It models the elasticity property of input data using fuzzy vectors. A sequential pattern such as a word in speech or a piece of writing is... 

    Continuous emotion recognition during music listening using EEG signals: A fuzzy parallel cascades model

    , Article Applied Soft Computing ; Volume 101 , 2021 ; 15684946 (ISSN) Hasanzadeh, F ; Annabestani, M ; Moghimi, S ; Sharif University of Technology
    Elsevier Ltd  2021
    Abstract
    A controversial issue in artificial intelligence is human emotion recognition. This paper presents a fuzzy parallel cascades (FPC) model for predicting the continuous subjective emotional appraisal of music from the time-varying spectral content of electroencephalogram (EEG) signals. The EEG signals and emotional appraisals of 15 subjects were recorded while they listened to seven musical excerpts. The emotional appraisal was recorded along the valence and arousal emotional axes as a continuous signal. The FPC model consisted of parallel cascades, each containing a fuzzy logic-based system. The FPC model performance was evaluated using linear regression (LR), support vector... 

    Significant pathological voice discrimination by computing posterior distribution of balanced accuracy

    , Article Biomedical Signal Processing and Control ; Volume 73 , 2022 ; 17468094 (ISSN) Pakravan, M ; Jahed, M ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    The ability to speak lucidly plays a key role in social relations. Consequently, the role of the larynx is quite important, and timely diagnosis of laryngeal diseases has proved crucial. In this study, a simple computational model of the inverse of the speech production model is employed to extract the glottal waveform from the speech signal. This waveform carries useful information about vocal fold performance, providing evidence for distinguishing pathological disorders. Furthermore, establishing the statistical significance of classification results is important because it leads to reliable inferences. This study uses the sustained vowel sound /a/ and a well-referenced database, namely MEEI. In... 
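Balanced accuracy is the mean of sensitivity and specificity. One common way to obtain its posterior, assumed here because the paper's exact formulation may differ, places independent uniform Beta(1, 1) priors on the two per-class accuracies. Each posterior is then Beta(correct + 1, incorrect + 1), and their average is sampled by Monte Carlo. The function names and confusion counts are hypothetical.

```python
import random
from typing import Tuple


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Point estimate: mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def balanced_accuracy_posterior(tp: int, fn: int, tn: int, fp: int,
                                n_samples: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo posterior of balanced accuracy under uniform Beta priors (an assumption).

    Returns (posterior mean, posterior probability that balanced accuracy > 0.5).
    """
    rng = random.Random(seed)
    total = 0.0
    above_chance = 0
    for _ in range(n_samples):
        sensitivity = rng.betavariate(tp + 1, fn + 1)  # posterior of accuracy on the positive class
        specificity = rng.betavariate(tn + 1, fp + 1)  # posterior of accuracy on the negative class
        ba = 0.5 * (sensitivity + specificity)
        total += ba
        if ba > 0.5:
            above_chance += 1
    return total / n_samples, above_chance / n_samples
```

A posterior probability of exceeding 0.5 that is close to 1 indicates a classifier that is reliably better than chance. This is one way to express the significance of a classification result, and it remains meaningful when the two classes are imbalanced.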

    Semi-supervised parallel shared encoders for speech emotion recognition

    , Article Digital Signal Processing: A Review Journal ; Volume 118 , 2021 ; 10512004 (ISSN) Pourebrahim, Y ; Razzazi, F ; Sameti, H ; Sharif University of Technology
    Elsevier Inc  2021
    Abstract
    Supervised speech emotion recognition requires a large number of labeled samples, which limits its use in practice. Because unlabeled samples are easy to obtain, this paper proposes a new semi-supervised method based on auto-encoders for speech emotion recognition. The proposed method performs classification by extracting the information contained in unlabeled samples and combining it with the information in labeled samples. In addition, it employs a maximum mean discrepancy (MMD) cost function to reduce the distribution difference when the labeled and unlabeled samples are gathered from different datasets. Experimental results obtained on different emotional speech datasets... 
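Maximum mean discrepancy measures how far apart two sample sets are by comparing their mean embeddings in a kernel feature space. Below is a minimal sketch of the biased squared-MMD estimate with a Gaussian kernel. The bandwidth sigma is an assumed hyperparameter, and the way the paper weights this term inside its training loss is not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

Used as a cost term, a smaller value pushes the feature distributions of the two sample sets, for example labeled and unlabeled data from different corpora, closer together.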

    Deep learning in analytical chemistry

    , Article TrAC - Trends in Analytical Chemistry ; Volume 145 , 2021 ; 01659936 (ISSN) Debus, B ; Parastar, H ; Harrington, P ; Kirsanov, D ; Sharif University of Technology
    Elsevier B.V  2021
    Abstract
    In recent years, extensive research in the field of Deep Learning (DL) has led to the development of a wide array of machine learning algorithms dedicated to solving complex tasks such as image classification or speech recognition. Due to their unprecedented ability to explore large volumes of data and extract meaningful hidden structures, DL models have naturally drawn attention from various fields in science. Analytical chemistry, in particular, has successfully benefited from the application of DL tools for extracting qualitative and quantitative information from high-dimensional and complex chemical measurements. This report provides introductory reading for understanding DL machinery... 

    Multi-antenna assisted spectrum sensing in spatially correlated noise environments

    , Article Signal Processing ; Volume 108 , December , 2015 , Pages 69-76 ; 01651684 (ISSN) Koochakzadeh, A ; Malek Mohammadi, M ; Babaie Zadeh, M ; Skoglund, M ; Sharif University of Technology
    Elsevier  2015
    Abstract
    A significant challenge in spectrum sensing is to lower the signal-to-noise ratio required to detect the presence of primary users when the noise level may also be unknown. To meet this challenge, multi-antenna techniques are more efficient than other algorithms. In a typical compact multi-antenna system, the small inter-element spacing makes the mutual coupling between the thermal noises of adjacent receivers significant. In this paper, unlike most spectrum sensing algorithms, which assume spatially uncorrelated noise, the noise on adjacent antennas may have arbitrary correlation. Also, in contrast to some other algorithms, no prior assumption is made on the... 

    Replay spoofing countermeasure using autoencoder and siamese networks on ASVspoof 2019 challenge

    , Article Computer Speech and Language ; Volume 64 , 2020 Adiban, M ; Sameti, H ; Shehnepoor, S ; Sharif University of Technology
    Academic Press  2020
    Abstract
    Automatic Speaker Verification (ASV) is the authentication of individuals by analyzing their speech signals. Various synthetic approaches allow spoofing attacks to deceive ASV systems (ASVs), either by imitating a voice or by reconstructing the features. Attackers defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The last technique is considered a common, high-potential spoofing tool, since replay attacks are more accessible and require no technical knowledge from adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we use the Constant Q Cepstral Coefficient (CQCC)... 

    SR-NBS: A fast sparse representation based N-best class selector for robust phoneme classification

    , Article Engineering Applications of Artificial Intelligence ; Volume 28 , 2014 , Pages 155-164 Saeb, A ; Razzazi, F ; Babaie-Zadeh, M ; Sharif University of Technology
    Abstract
    Although exemplar-based approaches have shown good accuracy in classification problems, some limitations are observed in exemplar-based automatic speech recognition (ASR) applications. The main limitation of these algorithms is their high computational complexity, which makes them difficult to extend to ASR applications. In this paper, an N-best class selector is introduced based on sparse representation (SR) and a tree search strategy. In this approach, classification is performed in three steps. First, the set of training samples similar to a given test sample is selected by a k-dimensional (KD) tree search algorithm. Then, an SR-based N-best class selector is... 
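The first step, which retrieves training samples similar to a test sample, can be sketched with a small k-d tree and a k-nearest-neighbour search under Euclidean distance. The data layout and function names are illustrative assumptions, and the subsequent SR-based class selection is not shown.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    label: int
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(items: Sequence[Tuple[Point, int]], depth: int = 0) -> Optional[KDNode]:
    """Recursively split (point, label) pairs at the median along a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    ordered = sorted(items, key=lambda item: item[0][axis])
    mid = len(ordered) // 2
    point, label = ordered[mid]
    return KDNode(point, label, axis,
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def knn(root: Optional[KDNode], query: Point, k: int) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, label) pairs, closest first."""
    heap: List[Tuple[float, int, int]] = []  # (-distance, tie-breaker, label): max-heap on distance
    counter = 0

    def visit(node: Optional[KDNode]) -> None:
        nonlocal counter
        if node is None:
            return
        d = math.dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, node.label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, counter, node.label))
        counter += 1
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the current k-th best.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_d, label) for neg_d, _, label in heap)
```

The labels of the returned neighbours give a reduced set of candidate classes for the later, more expensive stages. The speed-up comes from pruning subtrees that cannot contain a closer sample.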

    Audio-visual speech recognition techniques in augmented reality environments

    , Article Visual Computer ; Volume 30, Issue 3 , March , 2014 , Pages 245-257 ; 01782789 (ISSN) Mirzaei, M. R ; Ghorshi, S ; Mortazavi, M ; Sharif University of Technology
    Abstract
    Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can be used to help people with disabilities. Many of these studies have been performed only within their own specialized field. Audio-Visual Speech Recognition (AVSR) is one of the advances in ASR technology that combines audio, video, and facial expressions to capture a narrator's voice. In this paper, we combine AR and AVSR technologies to make a new system to help deaf and hard-of-hearing people. Our proposed system can capture a narrator's speech instantly, convert it into readable text, and show the text directly on an AR display. Therefore, in this system, deaf people can read the... 

    An evolutionary decoding method for HMM-based continuous speech recognition systems using particle swarm optimization

    , Article Pattern Analysis and Applications ; Volume 17, Issue 2 , 2014 , Pages 327-339 Najkar, N ; Razzazi, F ; Sameti, H ; Sharif University of Technology
    Abstract
    The main recognition procedure in modern HMM-based continuous speech recognition systems is the Viterbi algorithm. The Viterbi algorithm uses dynamic programming to find the best acoustic sequence for the input speech in the search space. In this paper, dynamic programming is replaced by a search method based on particle swarm optimization. The main idea is to generate the initial population of particles as speech segmentation vectors. The particles try to reach the best segmentation through an update method applied over the iterations. In this paper, a new particle representation and recognition process is introduced that is consistent with the nature of continuous... 
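For reference, the dynamic-programming decoder that the paper replaces can be sketched as a log-domain Viterbi algorithm over a discrete HMM. The function names are illustrative. The toy two-state model in the note below is a standard textbook example and does not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def _log(p: float) -> float:
    """Log-probability that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Most likely state sequence for observation indices `obs`, plus its log-probability.

    start[s], trans[s][t] and emit[s][o] are probabilities of a discrete HMM.
    """
    n_states = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    backptr: List[List[int]] = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for t in range(n_states):
            # Best predecessor of state t: keep only the highest-scoring partial path.
            best_prev = max(range(n_states), key=lambda s: score[s] + _log(trans[s][t]))
            new_score.append(score[best_prev] + _log(trans[best_prev][t]) + _log(emit[t][o]))
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):  # trace the back-pointers from the final frame
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]
```

With start = [0.6, 0.4], trans = [[0.7, 0.3], [0.4, 0.6]] and emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], decoding obs = [0, 1, 2] gives the path [0, 0, 1] with probability 0.01512. The cost grows with the number of frames times the square of the number of states, which is the search burden a PSO-based decoder aims to reduce.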

    A fast phoneme recognition system based on sparse representation of test utterances

    , Article 2014 4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays, HSCMA 2014 ; 2014 , Pages 32-36 Saeb, A ; Razzazi, F ; Babaei-Zadeh, M ; Sharif University of Technology
    Abstract
    In this paper, a fast phoneme recognition system based on sparse representation is introduced. In this approach, phoneme recognition is performed by Viterbi decoding on support vector machine (SVM) output probability estimates. The candidate classes for classification are adaptively pruned by a k-dimensional (KD) tree search followed by a sparse representation (SR) based class selector with an adaptive number of classes. We applied the proposed approach to build a phoneme recognition system and compared it with some well-known phoneme recognition systems in terms of accuracy and complexity. With this approach, we obtain a competitive phoneme error rate with promising computational... 

    SFAVD: Sharif Farsi audio visual database

    , Article IKT 2013 - 2013 5th Conference on Information and Knowledge Technology, Shiraz, Iran ; 2013 , Pages 417-421 ; 9781467364904 (ISBN) Naraghi, Z ; Jamzad, M ; Sharif University of Technology
    2013
    Abstract
    With the increasing use of computers in everyday life, improved communication between machines and humans is needed. To enable proper communication and to understand a human face rendered in a graphical environment, audio-visual projects such as lip reading, audio-visual speech recognition, and lip making are needed. The lack of a complete audio-visual database for this application in the Farsi language led us to create a new, complete Farsi database for this project, called SFAVD. It is a unique audio-visual database that, in addition to considering Farsi conceptual and speech structure, considers the influence of speech on lip changes. This database was created for...