Search for: speech-recognition
Total 131 records

    Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science (M.Sc.) in Computer Engineering, Artificial Intelligence

    , M.Sc. Thesis Sharif University of Technology Hosseini, Mohammad Saleh (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Punctuation marks constitute an important part of a text in every language. Omitting them makes the text ambiguous. The output of an automatic speech recognition (ASR) system is typically a raw sequence of words containing no punctuation marks. This makes the text difficult or even impossible for humans to make sense of, and it hinders any further text processing tasks. The goal of this thesis is to perform automatic punctuation insertion in Persian texts lacking punctuation marks. To the best of our knowledge, this is the first work done in this context for the Persian language. For this purpose, we first assembled a state-of-the-art corpus to train and...

    Design and Improvement of Sequence-level Objective Functions for DNN-based Large Vocabulary Continuous Speech Recognition

    , Ph.D. Dissertation Sharif University of Technology Hadian, Hossein (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    This thesis focuses on the problem of large vocabulary continuous speech recognition (LVCSR). Numerous research results in recent years have demonstrated the effectiveness of deep neural networks (DNNs) for LVCSR. As a result, many methods have been proposed to incorporate DNNs into LVCSR. These methods can be viewed in terms of the objective functions used for training DNNs. A frame-level objective function is defined locally on individual frames, whereas a sequence-level objective function is defined on whole sequences. Since speech recognition is essentially a sequential problem, here we focus on designing and improving sequence-level objective functions for DNNs. The main proposed...

    Using Audio Speech Recognition Techniques in Augmented Reality Environment

    , M.Sc. Thesis Sharif University of Technology Mirzaei, Mohammad Reza (Author) ; Ghorshi, Alireza (Supervisor) ; Mortazavi, Mohammad (Supervisor)
    Abstract
    Recently, many studies have shown that Augmented Reality (AR) and Automatic Speech Recognition (ASR) can help people with disabilities. In this thesis we examine the feasibility of combining AR and ASR technologies to implement a new system for helping deaf people. This system can instantly take a narrator's speech, convert it into readable text, and show it directly on an AR display. With this system, people also do not need to learn sign language to communicate with deaf people. To improve the accuracy of the system, we use Audio-Visual Speech Recognition (AVSR) as a backup for the ASR engine in noisy environments. AVSR is one of the advances in ASR technology that combines audio, video and facial...

    A Soft Spectrographic Mask Estimation for Speech Recognition

    , M.Sc. Thesis Sharif University of Technology Esmaeelzadeh, Vahid (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Nowadays, robustness against various kinds of noise is a major challenge for Automatic Speech Recognition (ASR) systems. This thesis pursues missing-feature speech recognition approaches to achieve robust ASR systems. In these approaches, low-SNR regions of a spectrogram are considered to be "missing" or "unreliable" and are removed from the spectrogram. Noise compensation is carried out either by estimating the missing regions from the remaining regions in some manner prior to recognition, or by performing recognition directly on incomplete spectrograms. These techniques clearly require a "spectrographic mask" which accurately labels the reliable and unreliable regions...
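The mask idea in the abstract above can be illustrated with a minimal sketch. It is not the estimation method of the thesis, whose details are not given in this listing. It assumes additive noise, a known per-cell noise power estimate, and hypothetical parameter names (`threshold_db`, `slope`); the soft variant simply passes the local SNR through a sigmoid.

```python
import math


def local_snr_db(noisy_power, noise_power, floor=1e-10):
    """Per-cell SNR in dB, assuming additive noise (speech ~= noisy - noise)."""
    speech_power = max(noisy_power - noise_power, floor)
    return 10.0 * math.log10(speech_power / max(noise_power, floor))


def binary_mask(noisy, noise, threshold_db=0.0):
    """Label each time-frequency cell 1 (reliable) or 0 (unreliable)."""
    return [
        [1 if local_snr_db(y, n) > threshold_db else 0 for y, n in zip(y_row, n_row)]
        for y_row, n_row in zip(noisy, noise)
    ]


def soft_mask(noisy, noise, threshold_db=0.0, slope=0.5):
    """Map each cell's SNR to a reliability score in (0, 1) with a sigmoid."""
    return [
        [
            1.0 / (1.0 + math.exp(-slope * (local_snr_db(y, n) - threshold_db)))
            for y, n in zip(y_row, n_row)
        ]
        for y_row, n_row in zip(noisy, noise)
    ]


# Toy 2x2 power spectrogram (rows are frames, columns are frequency bins).
noisy = [[10.0, 1.1], [4.0, 2.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
hard = binary_mask(noisy, noise)
soft = soft_mask(noisy, noise)
```

A binary mask discards every cell below the threshold outright, while a soft mask lets the recognizer weight cells by how reliable they appear.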

    Robust Speech Recognition Based on Data Compensation and MDT Methods

    , M.Sc. Thesis Sharif University of Technology BabaAli, Bagher (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques that reduce the mismatch between noisy speech features and acoustic models trained on clean speech. Nevertheless, correlation can be expected...
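As a rough illustration of the spectral subtraction idea mentioned above, the following sketch subtracts a noise magnitude estimate from each frequency bin of a frame. It is a textbook-style simplification, not the compensation method of the thesis. The over-subtraction factor `alpha`, the spectral floor `beta`, and the assumption that some frames are known to be speech-free are all illustrative choices.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames assumed to contain no speech."""
    count = len(noise_frames)
    return [sum(bin_values) / count for bin_values in zip(*noise_frames)]


def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=0.01):
    """Subtract the noise estimate per bin, flooring the result at beta * noisy.

    The floor keeps magnitudes positive when the noise estimate exceeds
    the observed magnitude in a bin.
    """
    return [max(y - alpha * n, beta * y) for y, n in zip(noisy_mag, noise_mag)]


# Two speech-free frames with two frequency bins each.
noise_estimate = estimate_noise([[1.0, 2.0], [3.0, 4.0]])
cleaned = spectral_subtract([5.0, 1.0], noise_estimate)
```

In the second bin the noise estimate is larger than the observed magnitude, so the floored value is used instead of a negative number.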

    Automatic Speech Recognition System for Pilot-Air Traffic Service Units Communications

    , M.Sc. Thesis Sharif University of Technology Azadmanesh, Mahsa (Author) ; Bahrani, Mohammad (Supervisor) ; Baba Ali, Bagher (Co-Advisor) ; Pazooki, Farshad (Co-Advisor)
    Abstract
    Currently, in the Islamic Republic of Iran, after aviation accidents and incidents, conversations between pilots and air traffic controllers are re-examined by the State Air Transport Organization of the Islamic Republic of Iran and turned into text. The Automatic Recognition System for Pilot-Air Traffic Service Units' Communication helps in the implementation of speech recognition. Reducing the time and cost of converting conversations into text and creating an aviation database in the country are other uses of this system. In this research, after collecting and refining actual conversations between pilots and air traffic controllers and examining seven methods, we design a system that...

    On the Use of Artificial Neural Networks in Automatic Speech Recognition

    , M.Sc. Thesis Sharif University of Technology Hassani, Adel (Author) ; Ghorshi, Mohammad Ali (Supervisor) ; Khayyat, Amir Ali Akbar (Supervisor)
    Abstract
    In this thesis, Artificial Neural Networks (ANNs) are used in Automatic Speech Recognition (ASR) instead of Hidden Markov Models (HMMs). The Hidden Markov Model is one of the most dominant Bayesian network technologies and the most successful model in current ASR systems. However, excessive training time is a major issue in HMM-based speech recognition. This thesis presents an Artificial Neural Network language model for human speech by mapping spectral features of speech, namely the formants, the cepstrum (Mel-Frequency Cepstral Coefficients, MFCCs), and the Power Spectral Density (PSD), extracted from samples of specific words, into a discrete vector space. The...

    Language Modeling Using Recurrent Neural Networks

    , M.Sc. Thesis Sharif University of Technology Rahimi, Adel (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    This thesis examines the differences and similarities between two well-known RNN blocks, the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). It measures different aspects such as computational complexity, Word Error Rate (WER), and subjective human evaluation in the task of text generation. In the computational complexity experiment, the results show that the LSTM takes more time to compute than the GRU. In the next experiment, the GRU slightly outperforms the LSTM in terms of WER, but the perplexity of the language models tested was the same. This shows that slight differences in perplexity do not drastically change the WER. That said, the results suggest that the...
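For readers unfamiliar with the Word Error Rate metric used in the abstract above, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. The function name and whitespace tokenization are assumptions for illustration, not details from the thesis.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here the hypothesis has one substituted word and one deleted word against a six-word reference, so `wer` is 2/6.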

    Persian End-To-End Speech Recognition

    , M.Sc. Thesis Sharif University of Technology Hajipour Ghomi, Farzaneh (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    This thesis presents a Persian end-to-end speech recognition system. In this system, the input consists of low-level features of the speech signal. Deep recurrent neural networks (RNNs), with Long Short-Term Memory (LSTM) units as the RNN building blocks, are used as the acoustic model. Continuous speech data is labeled using connectionist temporal classification (CTC), which is applied as the output layer of a recurrent neural network. By using the CTC objective function, the acoustic modeling problem is simplified to just an RNN learning problem over pairs of speech and context-independent (CI) label sequences. A distinctive feature of this system is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables...
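To show how a CTC output layer relates frame-level outputs to a label sequence, the sketch below applies greedy (best-path) decoding: take the most likely symbol in each frame, merge consecutive repeats, then drop blanks. This is deliberately simpler than the WFST-based decoding described in the abstract and is not the thesis's method. The label inventory, the blank index, and the toy posteriors are hypothetical.

```python
def ctc_greedy_decode(frame_posteriors, labels, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_posteriors
    ]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a label only when it differs from the previous frame's symbol
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return decoded


def one_hot_like(index, size=4):
    """Toy posterior vector that puts most of its mass on one symbol."""
    return [0.7 if i == index else 0.1 for i in range(size)]


labels = ["<blank>", "s", "a", "l"]  # index 0 is the CTC blank (assumed)
frames = [one_hot_like(i) for i in [1, 1, 0, 2, 2, 0, 2, 3]]
decoded = ctc_greedy_decode(frames, labels)
```

The blank between the two runs of "a" is what allows the same label to appear twice in a row in the output.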

    Using Discriminative Training Approaches for Large Vocabulary Isolated Word Recognition

    , M.Sc. Thesis Sharif University of Technology Osati, Majid (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    In this study, the isolated word recognition problem is studied at large scale, and different acoustic models are employed to solve it. Our proposed approach, which builds acoustic models with discriminative training methods, is compared with other available training methods. Acoustic models are built and trained based on HMM-GMM, HMM-subspace GMM and HMM-DNN using different training criteria such as Maximum Mutual Information (MMI), boosted MMI, Minimum Phoneme Error (MPE), and state-level Minimum Bayes Risk (sMBR). Using these discriminative approaches improved the speech recognition systems. Boosted MMI with a boosting factor of 0.3 for HMM-DNN has resulted in Word Error Rate...

    Language Modeling for Persian using Recurrent Neural Networks

    , M.Sc. Thesis Sharif University of Technology Pourbagheri, Mohammad (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    In recent years, neural networks have been used for language modeling in tasks related to natural language processing. Various neural network structures have been used in these models, and recurrent neural networks (RNNs) have achieved good results in these tasks. Since RNNs are not limited to a fixed number of words when predicting the next word, they have achieved better results than feedforward networks. However, these networks have difficulty learning long sequences, and long short-term memory (LSTM) networks have been proposed to solve this problem. In this research, language models are built for the Persian language using RNNs and LSTMs and are compared with n-gram-based models. For...

    Discriminative Articulatory Models for Spoken Term Detection in Low-Resource Conditions

    , M.Sc. Thesis Sharif University of Technology Gomar, Zahra (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    This thesis focuses on a spoken term detection system based on speech recognition in low-resource conditions. A spoken term detection system is composed of two parts: speech recognition and search. In the search stage, the proxy-word method is used as a baseline approach to overcome the problem of out-of-vocabulary (OOV) words. The main challenges in the low-resource setting are poorly trained acoustic and language models and a small lexicon in speech recognition. A small lexicon increases the number of OOV words. In this thesis, two innovations are proposed to improve the baseline system. The first is training a bottleneck neural network for extracting the articulatory features of...

    Concept Extraction of Sequential Patterns for Imitative Learning

    , M.Sc. Thesis Sharif University of Technology Arjomand Aghaee, Ehsan (Author) ; Bagheri Shouraki, Saeed (Supervisor)
    Abstract
    The aim of this thesis is the concept extraction of sequential patterns for imitative learning in humanoid robots. In imitative learning, an agent that shares physical and cognitive similarities with another agent extracts concepts and learns by observing the other agent's behavior. In this project, we assume a humanoid robot that can understand concepts such as hello and goodbye, among others, and perform the corresponding actions based on visual and auditory information. In this thesis, a new model is presented to eliminate improper and meaningless elasticity in pattern sequences, such as changes in accent or elasticity in movements. This model is called the fuzzy...

    A Speech Driven Web Browser

    , M.Sc. Thesis Sharif University of Technology Rashidi Fard, Amin (Author) ; Vosoughi Vahdat, Bijan (Supervisor)
    Abstract
    Generally speaking, a web browser is a software application for surfing the World Wide Web. A user with a web browser can request web pages on the Internet. The request is sent to a web server and analyzed, and the result is shown to the end user through the web browser GUI. A web browser has different parts, such as the HTML parser, the renderer, the browser engine and the GUI. The GUI is one of the most important parts of a web browser, because end users interact with it. The classical GUI for surfing has been used on various platforms, such as PC and laptop operating systems. Because of technological advances and the introduction of tablets and other touch screen devices, i.e., smart...

    Computation of Confidence Measure for Detection of out of Vocabulary Words in a Continuous Speech Recognition System

    , M.Sc. Thesis Sharif University of Technology Sakhaee, Elham (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Automatic Speech Recognition (ASR) engines are highly sensitive to conditions such as noise and transmission line quality. Thus, in any real-world application, ASR systems should be able to automatically assess the reliability or probability of correctness of every decision they make. One technique for increasing the intelligence of an ASR system is to compute a score, called a "confidence measure", that indicates the reliability of any recognition decision made by the system. This score can be computed at any required level, such as phonemes, syllables, words or even the whole utterance. Thus a robust and accurate confidence measure results in better detection of recognition errors, Out of Vocabulary...

    Automatic Concept Extraction to Improve the Recognition Performance for Sequential Patterns

    , Ph.D. Dissertation Sharif University of Technology Halavati, Ramin (Author) ; Bagheri Shouraki, Saeid (Supervisor)
    Abstract
    In this dissertation, we introduce a fuzzy-based representation and comparison method for sequential patterns such as speech and online handwriting. The new model, called Fuzzy Elastic Matching Machine (FEMM), is simpler than traditional HMM-based approaches and does not rely on the common statistical assumptions of HMM systems. The model was tested on isolated word and phoneme recognition tasks in the speech recognition domain and on isolated letter recognition in Persian handwriting recognition. We showed that this method is faster than traditional HMM-based models and more robust to noise. To train the model, we introduced a Symbiogenesis-based evolutionary training algorithm. This algorithm...

    Using Structural Language Modeling in Continuous Speech Recognition Systems

    , M.Sc. Thesis Sharif University of Technology SheikhShab, Golnar (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    The language model is one of the most important parts of an automatic speech recognition system; it incorporates knowledge of natural language into the system to improve its accuracy. The language model conventionally used in recognition systems is the n-gram model, which is usually extracted from a large corpus using the relative frequency method. An n-gram model approximates the probability of a word sequence by multiplying its n-gram probabilities and thus does not take long-distance dependencies into account. Syntactic language models could therefore be of interest. In this research, after probing different syntactic language models, a method for re-estimating the n-gram model, introduced by Stolcke in 1994, was...
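The n-gram approximation described above can be made concrete with a minimal bigram sketch: probabilities are estimated by relative frequency, and a sentence's probability is the product of its bigram probabilities. This illustrates only the conventional baseline, not the syntactic re-estimation method studied in the thesis. The sentence boundary markers `<s>` and `</s>`, the toy corpus, and the absence of smoothing are illustrative assumptions.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def sentence_probability(sentence, context_counts, bigram_counts):
    """Multiply relative-frequency bigram probabilities (no smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        if context_counts[previous] == 0:
            return 0.0  # unseen context: the unsmoothed estimate is zero
        probability *= bigram_counts[(previous, word)] / context_counts[previous]
    return probability


contexts, bigrams = train_bigram(["a b", "a c"])
p_seen = sentence_probability("a b", contexts, bigrams)
p_unseen = sentence_probability("b a", contexts, bigrams)
```

Each factor conditions only on the immediately preceding word, which is why dependencies spanning more than n-1 words are invisible to the model.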

    Design and Performance Improvement of a Spoken Term Detection System

    , M.Sc. Thesis Sharif University of Technology Ghadirinia, Marzieh (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Recently, the widespread use of video and radio data has made efficient speech information retrieval systems crucial. In the present work, our focus is on spoken term detection, which is one of the most important approaches to information retrieval. The present system includes two main steps, the first of which is speech processing by means of automatic speech recognition. In the recognition step, we apply a large vocabulary. In all recent approaches, the main concern is to retrieve words that are out of vocabulary (OOV). The state-of-the-art way to tackle this problem is to exploit proxy keywords, which are in-vocabulary words that can be recognized instead of OOV words. Such proxies have...

    Pronunciation Scoring in Computer-Assisted Language Learning

    , M.Sc. Thesis Sharif University of Technology Mohammadi, Sajede (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Due to the increase in the number of people interested in learning new languages, multiple systems have been developed in recent years to teach new languages. These systems are called Computer Assisted Language Learning (CALL) systems. However, the most credible CALL systems, like Duolingo, do not support Persian. So the goal of this study is to design and implement one of the technical parts of CALL systems, Computer Assisted Pronunciation Training (CAPT), which is the part responsible for evaluating the pronunciation in the learner's input speech and generating an appropriate score and feedback. In this study, good pronunciation means correct expression of words, correct...

    Deep Learning for Speech Recognition

    , M.Sc. Thesis Sharif University of Technology Azadi Yazdi, Saman (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    Speech recognition is one of the primary goals of speech processing. Our goal in this thesis is to use deep learning for speech recognition. In recent years, little improvement in speech recognition accuracy has been reported. Deep learning is a new learning approach that has led to improvements in many machine learning tasks. Following the improvements reported for English speech recognition with deep learning, in this thesis we tried to improve accuracy over common and new recognition methods for the Persian language.
    First the overall structure of a typical speech recognition system is introduced. For this purpose, the modules of a speech recognition system are introduced. Deep multilayer...