Search for: speech-recognition
Total 131 records

    Significant pathological voice discrimination by computing posterior distribution of balanced accuracy

    , Article Biomedical Signal Processing and Control ; Volume 73 , 2022 ; 17468094 (ISSN) Pakravan, M ; Jahed, M ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    The ability to speak lucidly plays a key role in social relations. Consequently, the larynx is quite important, and timely diagnosis of laryngeal diseases has proved crucial. In this study, a simple computational model that inverts the speech production model is employed to extract the glottal waveform from the speech signal. This waveform carries useful information about vocal-fold performance and provides evidence for distinguishing pathological disorders. Furthermore, assessing the statistical significance of classification results is important because it enables reliable inferences. This study uses the sustained vowel /a/ and a well-referenced database, MEEI. In... 
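The abstract is truncated before it describes how the posterior of balanced accuracy is computed. One common formulation models sensitivity and specificity with independent Beta posteriors and averages samples from them; the sketch below assumes uniform Beta(1, 1) priors and a hypothetical confusion matrix, and is not necessarily the paper's exact procedure.

```python
import random


def balanced_accuracy_posterior(tp, fn, tn, fp, n_samples=100_000, seed=0):
    """Monte Carlo samples from the posterior of balanced accuracy.

    Assumes independent uniform Beta(1, 1) priors on sensitivity and
    specificity, so each posterior is itself a Beta distribution.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        sens = rng.betavariate(tp + 1, fn + 1)  # posterior of TP / (TP + FN)
        spec = rng.betavariate(tn + 1, fp + 1)  # posterior of TN / (TN + FP)
        samples.append(0.5 * (sens + spec))
    return samples


def prob_above_chance(samples, chance=0.5):
    """Posterior probability that balanced accuracy exceeds the chance level."""
    return sum(s > chance for s in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical confusion matrix, not taken from the paper.
    post = sorted(balanced_accuracy_posterior(tp=45, fn=5, tn=40, fp=10))
    mean = sum(post) / len(post)
    lo, hi = post[int(0.025 * len(post))], post[int(0.975 * len(post))]
    print(f"posterior mean BA = {mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
    print(f"P(BA > 0.5) = {prob_above_chance(post):.4f}")
```

A posterior probability close to 1 that balanced accuracy exceeds 0.5 is one way to express that a classifier performs significantly above chance, even on imbalanced classes.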

    Light-SERNet: a lightweight fully convolutional neural network for speech emotion recognition

    , Article 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 May 2022 through 27 May 2022 ; Volume 2022-May , 2022 , Pages 6912-6916 ; 15206149 (ISSN); 9781665405409 (ISBN) Aftab, A ; Morsali, A ; Ghaemmaghami, S ; Champagne, B ; Chinese and Oriental Languages Information Processing Society (COLPIS); Singapore Exhibition and Convention Bureau; The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); The Institute of Electrical and Electronics Engineers Signal Processing Society ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2022
    Abstract
    Detecting emotions directly from a speech signal plays an important role in effective human-computer interactions. Existing speech emotion recognition models require massive computational and storage resources, making them hard to implement concurrently with other machine-interactive tasks in embedded systems. In this paper, we propose an efficient and lightweight fully convolutional neural network for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps deep convolutional blocks to extract high-level features, while ensuring sufficient separability.... 
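The multi-path idea can be illustrated in one dimension: the same input passes through parallel convolutions with different filter sizes, and the resulting feature maps are stacked as channels. This is a minimal pure-Python sketch with hypothetical averaging kernels of sizes 3, 5 and 7; the paper's actual filter shapes, input representation and learned weights are not given in this excerpt.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding so output length equals input length."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (k - 1 - pad_left)
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(signal))]


def relu(xs):
    return [max(0.0, x) for x in xs]


def parallel_paths(signal, kernels):
    """One convolution + ReLU path per kernel; outputs stacked as channels."""
    return [relu(conv1d_same(signal, k)) for k in kernels]


if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
    # Hypothetical kernels: small, medium and large receptive fields.
    kernels = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]
    for channel in parallel_paths(x, kernels):
        print([round(v, 3) for v in channel])
```

Small filters respond to fine local detail while larger ones summarize wider context, which is the motivation for running them side by side rather than choosing a single size.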

    Deep learning in analytical chemistry

    , Article TrAC - Trends in Analytical Chemistry ; Volume 145 , 2021 ; 01659936 (ISSN) Debus, B ; Parastar, H ; Harrington, P ; Kirsanov, D ; Sharif University of Technology
    Elsevier B.V  2021
    Abstract
    In recent years, extensive research in the field of Deep Learning (DL) has led to the development of a wide array of machine learning algorithms dedicated to solving complex tasks such as image classification or speech recognition. Due to their unprecedented ability to explore large volumes of data and extract meaningful hidden structures, DL models have naturally drawn attention from various fields in science. Analytical chemistry, in particular, has successfully benefited from the application of DL tools for extracting qualitative and quantitative information from high-dimensional and complex chemical measurements. This report provides introductory reading for understanding DL machinery... 

    Semi-supervised parallel shared encoders for speech emotion recognition

    , Article Digital Signal Processing: A Review Journal ; Volume 118 , 2021 ; 10512004 (ISSN) Pourebrahim, Y ; Razzazi, F ; Sameti, H ; Sharif University of Technology
    Elsevier Inc  2021
    Abstract
    Supervised speech emotion recognition requires a large number of labeled samples, which limits its practical use. Because unlabeled samples are easy to obtain, this paper proposes a new semi-supervised method based on auto-encoders for speech emotion recognition. The proposed method performs classification by extracting the information contained in unlabeled samples and combining it with the information in labeled samples. In addition, it employs a maximum mean discrepancy cost function to reduce the distribution difference when the labeled and unlabeled samples are gathered from different datasets. Experimental results obtained on different emotional speech datasets... 
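The maximum mean discrepancy compares two sample sets through a kernel: MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]. A minimal biased estimator with an RBF kernel is sketched below, assuming a hypothetical bandwidth `gamma`; in the paper the term is used as a cost on learned features, which this sketch does not model.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys."""
    def mean_k(p, q):
        return sum(rbf(a, b, gamma) for a in p for b in q) / (len(p) * len(q))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


if __name__ == "__main__":
    labeled = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]        # hypothetical features
    shifted = [[x + 2.0, y + 2.0] for x, y in labeled]    # simulated dataset shift
    print(mmd2(labeled, labeled), mmd2(labeled, shifted))
```

Minimizing this quantity during training pushes the feature distributions of the two sets toward each other, which is why it suits labeled and unlabeled data drawn from different corpora.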

    Continuous emotion recognition during music listening using EEG signals: A fuzzy parallel cascades model

    , Article Applied Soft Computing ; Volume 101 , 2021 ; 15684946 (ISSN) Hasanzadeh, F ; Annabestani, M ; Moghimi, S ; Sharif University of Technology
    Elsevier Ltd  2021
    Abstract
    A controversial issue in artificial intelligence is human emotion recognition. This paper presents a fuzzy parallel cascades (FPC) model for predicting the continuous subjective emotional appraisal of music from the time-varying spectral content of electroencephalogram (EEG) signals. The EEG signals and emotional appraisals of 15 subjects were recorded while they listened to seven musical excerpts. The emotional appraisal was recorded along the valence and arousal axes as a continuous signal. The FPC model was composed of parallel cascades, each containing a fuzzy logic-based system. The FPC model performance was evaluated using linear regression (LR), support vector... 
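"Time-varying spectral content" typically means band power computed over a sliding window. A minimal sketch follows, assuming a hypothetical 128 Hz sampling rate, 1 s windows with 50% overlap, and conventional alpha (8 to 13 Hz) and beta (13 to 30 Hz) bands; the paper's actual features and bands are not given in this excerpt.

```python
import cmath
import math


def band_power(window, fs, f_lo, f_hi):
    """Power of `window` in [f_lo, f_hi) Hz via a direct DFT (O(N^2), kept simple)."""
    n = len(window)
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(coeff) ** 2 / n
    return total


def sliding_band_power(signal, fs, f_lo, f_hi, win, hop):
    """Time-varying band power: one value per window position."""
    return [band_power(signal[s:s + win], fs, f_lo, f_hi)
            for s in range(0, len(signal) - win + 1, hop)]


if __name__ == "__main__":
    fs = 128  # hypothetical sampling rate in Hz
    eeg = [math.sin(2 * math.pi * 10 * t / fs) for t in range(4 * fs)]  # synthetic 10 Hz rhythm
    alpha = sliding_band_power(eeg, fs, 8, 13, win=fs, hop=fs // 2)
    beta = sliding_band_power(eeg, fs, 13, 30, win=fs, hop=fs // 2)
    print([round(a, 2) for a in alpha])
    print([round(b, 2) for b in beta])
```

Each band yields a feature sequence aligned in time with the continuous valence and arousal ratings, which is the kind of input a regression model such as FPC can map to the appraisal signal.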

    Replay spoofing countermeasure using autoencoder and siamese networks on ASVspoof 2019 challenge

    , Article Computer Speech and Language ; Volume 64 , 2020 Adiban, M ; Sameti, H ; Shehnepoor, S ; Sharif University of Technology
    Academic Press  2020
    Abstract
    Automatic Speaker Verification (ASV) is the authentication of individuals by analyzing their speech signals. Various synthetic approaches enable spoofing attacks that deceive ASV systems (ASVs), either by imitating a voice or by reconstructing its features. Attackers defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The last is considered a common, high-potential spoofing tool, since replay attacks are more accessible and require no technical knowledge from the adversary. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we use the Constant Q Cepstral Coefficient (CQCC)... 
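The abstract is truncated before it describes the training objective. Siamese networks are commonly trained with a contrastive loss on pairs of embeddings produced by two weight-sharing branches; the sketch below assumes that loss and a hypothetical margin, only to show how genuine/genuine and genuine/replayed pairs are treated differently. It is not necessarily the loss used in the paper.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of siamese-branch embeddings.

    same=True  -> pair from the same class: penalize squared distance.
    same=False -> pair from different classes: penalize only if closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    genuine_ref = [0.1, 0.2]   # hypothetical embedding of a genuine utterance
    replayed = [0.3, 0.5]      # hypothetical embedding of a replayed utterance
    print(contrastive_loss(genuine_ref, replayed, same=False))
```

After training, the distance between a test embedding and genuine references can serve as a spoofing score, with a threshold chosen on development data.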

    Learning of tree-structured Gaussian graphical models on distributed data under communication constraints

    , Article IEEE Transactions on Signal Processing ; Volume 67, Issue 1 , 2019 , Pages 17-28 ; 1053587X (ISSN) Tavassolipour, M ; Motahari, S. A ; Manzuri Shalmani, M. T ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    In this paper, learning of tree-structured Gaussian graphical models from distributed data is addressed. In our model, samples are stored in a set of distributed machines where each machine has access to only a subset of features. A central machine is then responsible for learning the structure based on received messages from the other nodes. We present a set of communication-efficient strategies, which are theoretically proved to convey sufficient information for reliable learning of the structure. In particular, our analyses show that even if each machine sends only the signs of its local data samples to the central node, the tree structure can still be recovered with high accuracy. Our... 
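Why signs alone can suffice: for zero-mean jointly Gaussian variables, E[sign(X) sign(Y)] = (2/pi) arcsin(rho), so a central node can invert the average sign agreement to estimate each correlation, then run the Chow-Liu algorithm (a maximum-weight spanning tree over pairwise mutual information). This is the classical sign-correlation identity used as an illustration, not necessarily the paper's estimator; the distributed setting is collapsed into one process and the chain data is synthetic.

```python
import math
import random


def sign_correlation(signs_i, signs_j):
    """Invert E[s_i * s_j] = (2/pi) * asin(rho) to estimate rho from signs only."""
    m = sum(a * b for a, b in zip(signs_i, signs_j)) / len(signs_i)
    return math.sin(math.pi / 2 * m)


def chow_liu_from_signs(signs):
    """Maximum-weight spanning tree on |rho| via Kruskal with union-find.

    For Gaussians, mutual information is -0.5 * log(1 - rho^2), which increases
    with |rho|, so ranking edges by |rho| yields the same tree.
    """
    p = len(signs)
    edges = sorted(((abs(sign_correlation(signs[i], signs[j])), i, j)
                    for i in range(p) for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return sorted(tree)


def simulate_chain_signs(n=5000, p=4, a=0.8, seed=1):
    """Hypothetical data: unit-variance chain X_0 -> X_1 -> ..., keeping only signs."""
    rng = random.Random(seed)
    signs = [[0] * n for _ in range(p)]
    for t in range(n):
        x = rng.gauss(0.0, 1.0)
        for k in range(p):
            if k > 0:
                x = a * x + math.sqrt(1 - a * a) * rng.gauss(0.0, 1.0)
            signs[k][t] = 1 if x >= 0 else -1
    return signs


if __name__ == "__main__":
    print(chow_liu_from_signs(simulate_chain_signs()))
```

Sending one bit per sample instead of a full-precision value is what makes the communication cheap; the price is a noisier correlation estimate, which larger sample sizes offset.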

    Speaker recognition with random digit strings using uncertainty normalized HMM-Based i-Vectors

    , Article IEEE/ACM Transactions on Audio Speech and Language Processing ; Volume 27, Issue 11 , 2019 , Pages 1815-1825 ; 23299290 (ISSN) Maghsoodi, N ; Sameti, H ; Zeinali, H ; Stafylakis, T ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and we extract well-localized i-vectors, each modelling merely the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the... 
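Zeroth- and first-order Baum-Welch statistics are the inputs to an i-vector extractor: N_c = sum_t gamma_t(c) and F_c = sum_t gamma_t(c) x_t. A minimal sketch, assuming per-frame state posteriors are already available; here they come from a hypothetical hard (Viterbi-style) alignment, and the paper's exact accumulation (for example soft posteriors or centering by component means) may differ.

```python
def baum_welch_stats(frames, posteriors, num_components):
    """Zeroth- and first-order statistics for i-vector extraction.

    frames:     list of T feature vectors, each of length D
    posteriors: list of T lists; posteriors[t][c] is the occupation of state c at frame t
    """
    dim = len(frames[0])
    n = [0.0] * num_components
    f = [[0.0] * dim for _ in range(num_components)]
    for x, gamma in zip(frames, posteriors):
        for c in range(num_components):
            g = gamma[c]
            if g == 0.0:
                continue  # frame not aligned to this state
            n[c] += g
            for d in range(dim):
                f[c][d] += g * x[d]
    return n, f


def hard_alignment_posteriors(state_ids, num_components):
    """One-hot posteriors from a frame-to-state alignment."""
    return [[1.0 if c == s else 0.0 for c in range(num_components)] for s in state_ids]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical 2-D features
    post = hard_alignment_posteriors([0, 0, 1], num_components=2)
    print(baum_welch_stats(frames, post, num_components=2))
```

With digit-specific HMMs, only the frames aligned to one digit's states contribute to that digit's statistics, which is what lets each extractor produce a well-localized i-vector.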

    Using ASR methods for OCR

    , Article 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 20 September 2019 through 25 September 2019 ; 2019 , Pages 663-668 ; 15205363 (ISSN); 9781728128610 (ISBN) Arora, A ; Garcia, P ; Watanabe, S ; Manohar, V ; Shao, Y ; Khudanpur, S ; Chang, C. C ; Rekabdar, B ; Babaali, B ; Povey, D ; Etter, D ; Raj, D ; Hadian, H ; Trmal, J ; Sharif University of Technology
    IEEE Computer Society  2019
    Abstract
    Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, recent DNN-HMM approaches have not been explored much for text recognition. Inspired by current work in automatic speech recognition (ASR) and machine translation, we present an open vocabulary sub-word text recognition system. The sub-word lexicon and sub-word language model (LM) help overcome the challenge of recognizing out-of-vocabulary (OOV) words, and a time delay neural network (TDNN) and convolutional neural network (CNN) based DNN-HMM optical model (OM) efficiently models the sequence dependency in the... 
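To see why a sub-word lexicon helps with OOV words, consider greedy longest-match segmentation over a small unit inventory: any word whose characters are known can be spelled out of units. The inventory below is hypothetical, and the paper's units may be learned differently (for example by byte pair encoding), so this only illustrates the idea.

```python
def segment(word, subwords):
    """Greedy longest-match segmentation of a word into sub-word units.

    Falls back to single characters, so even OOV words get a segmentation.
    """
    pieces = []
    i = 0
    max_len = max(map(len, subwords), default=1)
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            cand = word[i:i + length]
            if cand in subwords or length == 1:
                pieces.append(cand)
                i += length
                break
    return pieces


if __name__ == "__main__":
    units = {"re", "cogn", "ition", "iz", "er"}  # hypothetical sub-word inventory
    print(segment("recognizer", units))  # a word that is not itself in the inventory
```

Because the LM is trained over such units, a word never seen in training can still receive a probability as a sequence of known pieces.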

    An efficient real-time voice activity detection algorithm using Teager energy to energy ratio

    , Article 27th Iranian Conference on Electrical Engineering, ICEE 2019, 30 April 2019 through 2 May 2019 ; 2019 , Pages 1420-1424 ; 9781728115085 (ISBN) Hadi, M ; Pakravan, M. R ; Razavi, M. M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    We define a new feature called Teager Energy to Energy and mathematically show that it takes distinct values for pure tone and white noise signals. We then employ the Teager Energy to Energy feature to propose an efficient procedure for voice activity detection and use simulation results to evaluate its performance in different noisy environments. Furthermore, we experimentally demonstrate the performance of the proposed voice activity detection technique in a real-time voice processing embedded system. Experimental and simulation results show that the introduced procedure provides more reliable results with a reasonable amount of computational complexity in comparison with its... 
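The discrete Teager-Kaiser energy operator is psi[n] = x[n]^2 - x[n-1] x[n+1]. For a pure tone A cos(Omega n + phi) it equals A^2 sin^2(Omega) exactly, while the mean energy is A^2 / 2, so their ratio is 2 sin^2(Omega); for zero-mean white noise, E[psi] equals the variance, so the ratio is close to 1. The sketch assumes the feature is the ratio of frame averages; the paper's exact normalization may differ.

```python
import math
import random


def teager_energy(x):
    """Discrete Teager-Kaiser energy for the interior samples of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def teager_to_energy_ratio(frame):
    """Assumed definition: mean Teager energy divided by mean energy (same samples)."""
    psi = teager_energy(frame)
    energy = sum(v * v for v in frame[1:-1]) / len(psi)
    return (sum(psi) / len(psi)) / energy if energy > 0 else 0.0


if __name__ == "__main__":
    omega = 2 * math.pi * 0.05  # hypothetical normalized tone frequency
    tone = [math.cos(omega * n) for n in range(400)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    print(teager_to_energy_ratio(tone), 2 * math.sin(omega) ** 2)
    print(teager_to_energy_ratio(noise))
```

The operator needs only three neighboring samples per output, which is consistent with the low computational cost the abstract emphasizes for embedded, real-time use.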

    Statistical association mapping of population-structured genetic data

    , Article IEEE/ACM Transactions on Computational Biology and Bioinformatics ; Volume 16, Issue 2 , 2019 , Pages 636-649 ; 15455963 (ISSN) Najafi, A ; Janghorbani, S ; Motahari, A ; Fatemizadeh, E ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    Association mapping of genetic diseases has attracted extensive research interest in recent years. However, most methodologies introduced so far suffer from spurious inference of the associated sites due to population inhomogeneities. In this paper, we introduce a statistical framework that compensates for this shortcoming by equipping current methodologies with a state-of-the-art clustering algorithm widely used in population genetics applications. The proposed framework jointly infers the disease-associated factors and the hidden population structures. To this end, a Markov chain Monte Carlo (MCMC) procedure has been employed to assess the posterior probability... 

    Improving LF-MMI using unconstrained supervisions for ASR

    , Article 2018 IEEE Spoken Language Technology Workshop, SLT 2018, 18 December 2018 through 21 December 2018 ; 2019 , Pages 43-47 ; 9781538643341 (ISBN) Hadian, H ; Povey, D ; Sameti, H ; Trmal, J ; Khudanpur, S ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    We present our work on improving the numerator graph for discriminative training using the lattice-free maximum mutual information (MMI) criterion. Specifically, we propose a scheme for creating unconstrained numerator graphs by removing time constraints from the baseline numerator graphs. This leads to much smaller graphs and therefore faster preparation of training supervisions. By testing the proposed unconstrained supervisions using factorized time-delay neural network (TDNN) models, we observe 0.5% to 2.6% relative improvement over the state-of-the-art word error rates on various large-vocabulary speech recognition databases. © 2018 IEEE  
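"Relative" improvement here presumably follows the usual convention: the WER reduction divided by the baseline WER, not an absolute difference in percentage points. The numbers in the example are hypothetical.

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{base}}},
\qquad \text{e.g.}\quad \frac{10.0\% - 9.8\%}{10.0\%} = 0.02 = 2\%\ \text{relative}\ (0.2\ \text{points absolute}).
```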

    Flat-start single-stage discriminatively trained HMM-based models for ASR

    , Article IEEE/ACM Transactions on Audio Speech and Language Processing ; Volume 26, Issue 11 , 2018 , Pages 1949-1961 ; 23299290 (ISSN) Hadian, H ; Sameti, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2018
    Abstract
    In recent years, end-to-end approaches to automatic speech recognition have received considerable attention as they are much faster in terms of preparing resources. However, conventional multistage approaches, which rely on a pipeline of hidden Markov model (HMM)-GMM training and tree-building steps, still give state-of-the-art results on most databases. In this study, we investigate flat-start one-stage training of neural networks using the lattice-free maximum mutual information (LF-MMI) objective function with HMMs for large vocabulary continuous speech recognition. We thoroughly examine different issues that arise in such a setup and propose a standalone system, which achieves... 

    End-to-end speech recognition using lattice-free MMI

    , Article 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018, 2 September 2018 through 6 September 2018 ; Volume 2018-September , 2018 , Pages 12-16 ; 2308457X (ISSN) Hadian, H ; Sameti, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology
    International Speech Communication Association  2018
    Abstract
    We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees. We use full biphones to enable context-dependent modeling without trees, and show that our end-to-end LF-MMI approach can achieve comparable results to regular LF-MMI on well-known large vocabulary tasks. We also compare with other end-to-end methods such as CTC in character-based and lexicon-free settings and show 5 to 25 percent... 

    Evaluation of a novel fuzzy sequential pattern recognition tool (fuzzy elastic matching machine) and its applications in speech and handwriting recognition

    , Article Applied Soft Computing Journal ; Volume 62 , January , 2018 , Pages 315-327 ; 15684946 (ISSN) Shahmoradi, S ; Bagheri Shouraki, S ; Sharif University of Technology
    Elsevier Ltd  2018
    Abstract
    Sequential pattern recognition has long been an important topic of soft computing research, with a wide range of applications including speech and handwriting recognition. In this paper, the performance of a novel fuzzy sequential pattern recognition tool named “Fuzzy Elastic Matching Machine” (FEMM) has been investigated. This tool overcomes shortcomings of the HMM, including its inflexible mathematical structure and mathematical assumptions that are inconsistent with imprecise input data. To do so, the “Fuzzy Elastic Pattern” was introduced as the basic element of FEMM. It models the elasticity property of input data using fuzzy vectors. A sequential pattern such as a word in speech or a piece of writing is... 

    Acoustic modeling from frequency-domain representations of speech

    , Article Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2 September 2018 through 6 September 2018 ; Volume 2018-September , 2018 , Pages 1596-1600 ; 2308457X (ISSN) Ghahremani, P ; Hadian, H ; Lv, H ; Povey, D ; Khudanpur, S ; Sharif University of Technology
    International Speech Communication Association  2018
    Abstract
    In recent years, several studies have proposed new methods for DNN-based feature extraction and for joint acoustic model training and feature learning from the raw waveform for large vocabulary speech recognition. However, conventional preprocessing methods such as MFCC and PLP are still preferred in state-of-the-art speech recognition systems because they are perceived to be more robust. Moreover, raw-waveform methods, most of which are based on the time-domain signal, do not significantly outperform the conventional methods. In this paper, we propose a frequency-domain feature-learning layer that allows acoustic model training directly from the waveform. The main distinctions from... 

    Frame-based face emotion recognition using linear discriminant analysis

    , Article 3rd Iranian Conference on Signal Processing and Intelligent Systems, ICSPIS 2017, 20 December 2017 through 21 December 2017 ; Volume 2017-December , December , 2018 , Pages 141-146 ; 9781538649725 (ISBN) Otroshi Shahreza, H ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2018
    Abstract
    In this paper, a frame-based method with a reference frame was proposed to recognize the six basic facial emotions (anger, disgust, fear, happiness, sadness, and surprise) as well as the neutral face. Using face landmarks, a fast algorithm was used to calculate an appropriate descriptor for each frame. Furthermore, Linear Discriminant Analysis (LDA) was used to reduce the dimension of the defined descriptors and to classify them. The LDA problem was solved using the least squares solution and the Ledoit-Wolf lemma. The proposed method was also compared with other studies on the CK+ dataset and achieved the best accuracy among them. To generalize the proposed method beyond the CK+ dataset, a landmark detector was needed.... 
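scikit-learn exposes this combination directly: `LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")` solves LDA by least squares and selects the covariance shrinkage with the Ledoit-Wolf lemma. The sketch below uses that API on hypothetical synthetic descriptors (seven classes standing in for six emotions plus neutral); it is not the paper's code and uses no CK+ data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def make_classifier():
    """LDA solved by least squares with Ledoit-Wolf covariance shrinkage."""
    # solver="lsqr": least-squares solution of the LDA problem.
    # shrinkage="auto": shrinkage intensity chosen with the Ledoit-Wolf lemma.
    return LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")


def make_synthetic_descriptors(seed=0, n_classes=7, dim=40, per_class=30):
    """Hypothetical stand-in for per-frame landmark descriptors (not real CK+ data)."""
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=3.0, size=(n_classes, dim))
    X = np.vstack([m + rng.normal(size=(per_class, dim)) for m in means])
    y = np.repeat(np.arange(n_classes), per_class)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_descriptors()
    clf = make_classifier().fit(X, y)
    print("training accuracy:", clf.score(X, y))
```

Note that the `"lsqr"` solver supports classification only; for the dimensionality-reduction step (`transform`), scikit-learn's `"eigen"` solver also accepts `shrinkage="auto"`.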

    HMM-based phrase-independent i-vector extractor for text-dependent speaker verification

    , Article IEEE/ACM Transactions on Audio Speech and Language Processing ; Volume 25, Issue 7 , 2017 , Pages 1421-1435 ; 23299290 (ISSN) Zeinali, H ; Sameti, H ; Burget, L ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2017
    Abstract
    The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) for training the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To...