Search for: speech-recognition
Total 131 records

    Localized discriminative Gaussian process latent variable model for text-dependent speaker verification

    , Article 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2016, 27 April 2016 through 29 April 2016 ; 2016 , Pages 183-188 ; 9782875870278 (ISBN) Maghsoodi, N ; Sameti, H ; Zeinali, H ; Sharif University of Technology
    i6doc.com publication  2016
    Abstract
    The duration of utterances is one of the factors that most affect the performance of speaker verification systems. Text-dependent speaker verification suffers from both short duration and unmatched content between enrollment and test segments. In this paper, we use the Discriminative Gaussian Process Latent Variable Model (DGPLVM) to deal with the uncertainty caused by short duration. This is the first attempt to utilize Gaussian processes for speaker verification. Also, to manage the unmatched content between enrollment and test segments, we propose the localized DGPLVM, which trains a DGPLVM for each phrase in the dataset. Experiments show a relative improvement of 27.4% in EER on RSR2015

    Low-complexity stochastic Generalized Belief Propagation

    , Article 2016 IEEE International Symposium on Information Theory, ISIT 2016, 10 July 2016 through 15 July 2016 ; Volume 2016-August , 2016 , Pages 785-789 ; 21578095 (ISSN) ; 9781509018062 (ISBN) Haddadpour, F ; Jafari Siavoshani, M ; Noshad, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc 
    Abstract
    The generalized belief propagation (GBP), introduced by Yedidia et al., is an extension of the belief propagation (BP) algorithm, which is widely used in different problems involved in calculating exact or approximate marginals of probability distributions. In many problems, it has been observed that the accuracy of GBP outperforms that of BP considerably. However, due to its generally higher complexity compared to BP, its application is limited in practice. In this paper, we introduce a stochastic version of GBP called stochastic generalized belief propagation (SGBP) that can be considered as an extension to the stochastic BP (SBP) algorithm introduced by Noorshams et al. They have shown... 

    Statistical association mapping of population-structured genetic data

    , Article IEEE/ACM Transactions on Computational Biology and Bioinformatics ; 2017 ; 15455963 (ISSN) Najafi, A ; Janghorbani, S ; Motahari, S. A ; Fatemizadeh, E ; Sharif University of Technology
    Abstract
    Association mapping of genetic diseases has attracted extensive research interest during the recent years. However, most of the methodologies introduced so far suffer from spurious inference of the associated sites due to population inhomogeneities. In this paper, we introduce a statistical framework to compensate for this shortcoming by equipping the current methodologies with a state-of-the-art clustering algorithm being widely used in population genetics applications. The proposed framework jointly infers the disease-associated factors and the hidden population structures. In this regard, a Markov Chain-Monte Carlo (MCMC) procedure has been employed to assess the posterior probability... 

    Spoken CAPTCHA: a CAPTCHA system for blind users

    , Article 2009 Second ISECS International Colloquium on Computing, Communication, Control, and Management, CCCM 2009, Sanya, 8 August 2009 through 9 August 2009 ; Volume 1 , 2009 , Pages 221-224 ; 9781424442461 (ISBN) Shirali Shahreza, S ; Abolhassani, H ; Sameti, H ; Shirali Shahreza, M. H ; Yangzhou University; Guangdong University of Business Studies; Wuhan Institute of Technology; IEEE SMC TC on Education Technology and Training; IEEE Technology Management Council ; Sharif University of Technology
    2009
    Abstract
    Today, the Internet is used to offer different services to users. Most of these services are designed for human users, but unfortunately some computer programs are designed to abuse these services. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems are designed to automatically distinguish between human users and computer programs and block such programs. Most current CAPTCHA methods use visual patterns, and hence blind users cannot use them. In this paper, we propose a new CAPTCHA method which is designed for blind people. In this method, a small sound clip is played for the user and he/she is asked to say a word. Then the user... 

    Speaker recognition with random digit strings using uncertainty normalized HMM-Based i-Vectors

    , Article IEEE/ACM Transactions on Audio Speech and Language Processing ; Volume 27, Issue 11 , 2019 , Pages 1815-1825 ; 23299290 (ISSN) Maghsoodi, N ; Sameti, H ; Zeinali, H ; Stafylakis, T ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and we extract well-localized i-vectors, each modelling merely the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the... 

    An efficient real-time voice activity detection algorithm using teager energy to energy ratio

    , Article 27th Iranian Conference on Electrical Engineering, ICEE 2019, 30 April 2019 through 2 May 2019 ; 2019 , Pages 1420-1424 ; 9781728115085 (ISBN) Hadi, M ; Pakravan, M. R ; Razavi, M. M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    We define a new feature called Teager Energy to Energy and mathematically show that it provides distinguished values for pure tone and white noise signals. We then employ the Teager Energy to Energy feature to propose an efficient procedure for voice activity detection and use simulation results to evaluate its performance in different noisy environments. Furthermore, we experimentally demonstrate the performance of the proposed voice activity detection technique in a real-time voice processing embedded system. Experimental and simulation results show that the introduced procedure provides more reliable results with a reasonable amount of computational complexity in comparison with its... 

    Replay spoofing countermeasure using autoencoder and siamese networks on ASVspoof 2019 challenge

    , Article Computer Speech and Language ; Volume 64 , 2020 Adiban, M ; Sameti, H ; Shehnepoor, S ; Sharif University of Technology
    Academic Press  2020
    Abstract
    Automatic Speaker Verification (ASV) is the authentication of individuals by analyzing their speech signals. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), whether by using techniques to imitate a voice or to reconstruct the features. Attackers defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The last technique is considered a common and high-potential tool for spoofing purposes, since replay attacks are more accessible and require no technical knowledge on the part of adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we use the Constant Q Cepstral Coefficient (CQCC)... 

    Recognition of human speech phonemes using a novel fuzzy approach

    , Article Applied Soft Computing Journal ; Volume 7, Issue 3 , 2007 , Pages 828-839 ; 15684946 (ISSN) Halavati, R ; Bagheri Shouraki, S ; Harati Zadeh, S ; Sharif University of Technology
    2007
    Abstract
    Recognition of human speech has long been a hot topic among artificial intelligence and signal processing researchers. Most current approaches to this subject are based on extracting precise features of the voice signal and trying to make the most of them through heavy computation. But this focus on signal details has resulted in too much sensitivity to noise and, as a result, the need for complex noise detection and removal algorithms, which forces a trade-off between fast and noise-robust recognition. This paper presents a novel approach to speech recognition using fuzzy modeling and decision making that ignores noise instead of detecting and removing it. To do so, the speech spectrogram... 

    Fuzzy classification by multi-layer averaging: An application in speech recognition

    , Article 3rd International Conference on Informatics in Control, Automation and Robotics, ICINCO 2006, Setubal, 1 August 2006 through 5 August 2006 ; Volume SPSMC , 2006 , Pages 122-126 ; 9728865619 (ISBN); 9789728865610 (ISBN) Alemzadeh, M ; Shouraki, S. B ; Halavati, R ; Sharif University of Technology
    2006
    Abstract
    This paper introduces a simple, fast, space-efficient linear method for a general pattern recognition problem. The presented algorithm can find the closest match for a given sample among a number of samples which have already been introduced to the system. The use of averaging and fuzzy numbers in this method suggests that it may be a noise-resistant recognition process. As a test bed, a spoken-word recognition problem has been posed to this algorithm. Test data contain clean and noisy samples, and results have been compared to those of a widely used speech recognition method, HMM

    A novel approach to very fast and noise robust, isolated word speech recognition

    , Article 18th International Conference on Pattern Recognition, ICPR 2006, Hong Kong, 20 August 2006 through 24 August 2006 ; Volume 3 , 2006 , Pages 190-193 ; 10514651 (ISSN); 0769525210 (ISBN); 9780769525211 (ISBN) Halavati, R ; Bagheri Shouraki, S ; Tajik, H ; Cholakian, A ; Razaghpour, M ; Sharif University of Technology
    2006
    Abstract
    A novel, very lightweight approach to isolated word speech recognition is introduced. The approach uses a new simplistic feature set and a neural network recognition system. The algorithm's main processing requirements are FFT computation and a simple neural network comparison, making the method a suitable solution for low-cost embedded devices. The proposed method is tested on single-speaker and multiple-speaker test sets, and the results are compared with a widely used speech recognition approach, showing very fast recognition and a quite good recognition rate. © 2006 IEEE

    Semi-supervised parallel shared encoders for speech emotion recognition

    , Article Digital Signal Processing: A Review Journal ; Volume 118 , 2021 ; 10512004 (ISSN) Pourebrahim, Y ; Razzazi, F ; Sameti, H ; Sharif University of Technology
    Elsevier Inc  2021
    Abstract
    Supervised speech emotion recognition requires a large number of labeled samples, which limits its use in practice. Because unlabeled samples are easy to obtain, a new semi-supervised method based on auto-encoders is proposed in this paper for speech emotion recognition. The proposed method performs classification by extracting the information contained in unlabeled samples and combining it with the information in labeled samples. In addition, it employs a maximum mean discrepancy cost function to reduce the distribution difference when the labeled and unlabeled samples are gathered from different datasets. Experimental results obtained on different emotional speech datasets... 

    Continuous emotion recognition during music listening using EEG signals: A fuzzy parallel cascades model

    , Article Applied Soft Computing ; Volume 101 , 2021 ; 15684946 (ISSN) Hasanzadeh, F ; Annabestani, M ; Moghimi, S ; Sharif University of Technology
    Elsevier Ltd  2021
    Abstract
    A controversial issue in artificial intelligence is human emotion recognition. This paper presents a fuzzy parallel cascades (FPC) model for predicting the continuous subjective emotional appraisal of music by time-varying spectral content of electroencephalogram (EEG) signals. The EEG, along with an emotional appraisal of 15 subjects, was recorded during listening to seven musical excerpts. The emotional appraisement was recorded along the valence and arousal emotional axes as a continuous signal. The FPC model was composed of parallel cascades with each cascade containing a fuzzy logic-based system. The FPC model performance was evaluated using linear regression (LR), support vector... 

    A new word clustering method for building n-gram language models in continuous speech recognition systems

    , Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 18 June 2008 through 20 June 2008, Wroclaw ; Volume 5027 LNAI , 2008 , Pages 286-293 ; 03029743 (ISSN) ; 354069045X (ISBN); 9783540690450 (ISBN) Bahrani, M ; Sameti, H ; Hafezi, N ; Momtazi, S ; Sharif University of Technology
    2008
    Abstract
    In this paper, a new method for automatic word clustering is presented. We used this method for building n-gram language models for Persian continuous speech recognition (CSR) systems. In this method, each word is specified by a feature vector that represents the statistics of the parts of speech (POS) of that word. The feature vectors are clustered by the k-means algorithm. This method reduces time complexity, which is a drawback of other automatic clustering methods. It also alleviates the problem of high perplexity in manual clustering methods. The experimental results are based on the "Persian Text Corpus", which contains about 9 million words. The extracted language models are evaluated... 

    Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science (M.Sc.) in Computer Engineering, Artificial Intelligence

    , M.Sc. Thesis Sharif University of Technology Hosseini, Mohammad Saleh (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Punctuation marks constitute an important part of a text in every language. Omitting them makes the text ambiguous. The output of an automatic speech recognition (ASR) system is typically a raw sequence of words containing no punctuation marks. This makes the text difficult or even impossible for humans to make sense of, and also hinders any further text processing tasks. The goal of this thesis is to perform automatic punctuation insertion in Persian texts lacking punctuation marks. To the best of our knowledge, this is the first work done in this context for the Persian language. For this purpose, firstly, we assembled a state-of-the-art corpus to train and... 

    Robust Speech Recognition Based on Data Compensation and MDT Methods

    , M.Sc. Thesis Sharif University of Technology BabaAli, Bagher (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed for improving the quality of speech signal judged by human listeners. SS techniques usually improve the quality and intelligibility of speech signal while speech recognition systems need compensation techniques to reduce mismatch between noisy speech features and clean trained acoustic model. Nevertheless, correlation can be expected... 

    On the Use of Artificial Neural Networks in Automatic Speech Recognition

    , M.Sc. Thesis Sharif University of Technology Hassani, Adel (Author) ; Ghorshi, Mohammad Ali (Supervisor) ; Khayyat, Amir Ali Akbar (Supervisor)
    Abstract
    In this thesis, the Artificial Neural Networks (ANN) will be used in Automatic Speech Recognition (ASR) instead of Hidden Markov Models (HMM). Hidden Markov Model is one of the most dominant Bayesian network technologies and is the most successful model in current ASR systems. However, excessive training time is a major issue in speech recognition based on Hidden Markov Model (HMM). This thesis presents an Artificial Neural Network language model for human speech by mapping the spectral features of speech namely the formants, cepstrum (Mel-Frequency Cepstral Coefficients (MFCCs)) and Power Spectral Density (PSD) as features of samples of specific words into a discrete vector space. The... 

    Persian End-To-End Speech Recognition

    , M.Sc. Thesis Sharif University of Technology Hajipour Ghomi, Farzaneh (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    This thesis provides a Persian End-To-End Speech Recognition system. In this system, the input is low-level features of the speech signal. Deep recurrent neural networks (RNNs), with Long Short-Term Memory (LSTM) units as the RNN building blocks, are used as the acoustic model. Continuous speech data is labeled by the CTC, which is applied as the output layer of a recurrent neural network. By using the CTC objective function, the acoustic modeling problem is simplified to just an RNN learning problem over pairs of speech and context-independent (CI) label sequences. A distinctive feature of this system is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables... 

    Concept Extraction of Sequential Patterns for Imitative Learning

    , M.Sc. Thesis Sharif University of Technology Arjomand Aghaee, Ehsan (Author) ; Bagheri Shouraki, Saeed (Supervisor)
    Abstract
    The aim of this thesis is the concept extraction of sequential patterns for imitative learning in humanoid robots. In this setting, an entity that has physical and cognitive similarities to another entity extracts concepts and learns by observing the other's behavior. This project assumes a humanoid robot that can understand concepts such as hello and goodbye, among others, and performs the corresponding actions based on visual and auditory information. In this thesis, a new model is presented to eliminate improper and meaningless elasticity in pattern sequences, such as changes in accent or elasticity in movements. This model is called the fuzzy... 

    Pronunciation Scoring in Computer-Assisted Language Learning

    , M.Sc. Thesis Sharif University of Technology Mohammadi, Sajede (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Due to the increase in the number of people interested in learning new languages, multiple systems have been developed in recent years to teach new languages to those who are interested. These systems are called Computer Assisted Language Learning (CALL) systems. However, the most credible CALL systems, like Duolingo, do not support Persian. So the aim of this study is to design and implement one of the technical parts of CALL systems, Computer Assisted Pronunciation Training (CAPT), which is the part responsible for evaluating the pronunciation of the learner's input voice and generating an appropriate score and feedback. In this study, good pronunciation means correct expression of words, correct... 

    Speech enhancement using hidden Markov models in Mel-frequency domain

    , Article Speech Communication ; Volume 55, Issue 2 , 2013 , Pages 205-220 ; 01676393 (ISSN) Veisi, H ; Sameti, H ; Sharif University of Technology
    2013
    Abstract
    Hidden Markov model (HMM)-based minimum mean square error speech enhancement method in Mel-frequency domain is focused on and a parallel cepstral and spectral (PCS) modeling is proposed. Both Mel-frequency spectral (MFS) and Mel-frequency cepstral (MFC) features are studied and experimented for speech enhancement. To estimate clean speech waveform from a noisy signal, an inversion from the Mel-frequency domain to the spectral domain is required which introduces distortion artifacts in the spectrum estimation and the filtering. To reduce the corrupting effects of the inversion, the PCS modeling is proposed. This method performs concurrent modeling in both cepstral and magnitude spectral...