Speaker Verification using Limited Enrollment Data

M.Sc. Thesis, Sharif University of Technology. Kalantari, Elaheh (Author); Sameti, Hossein (Supervisor)

Abstract

In this thesis, we investigate speaker verification as a biometric technology for verifying a person's claimed identity. Text-dependent speaker verification systems are preferred in commercial and security applications, and they perform better under limited-data conditions because they rely on prior knowledge about speakers, who are assumed to be cooperative. The limited amount of enrollment data is a major concern in this thesis. Speaker-dependent model construction and channel variability issues in telephone-based text-dependent speaker verification applications are surveyed. Due to the lack of an appropriate database for the task, we collected a database which is referred to as text-prompt...

Speech Activity Detection Using Deep Networks

M.Sc. Thesis, Sharif University of Technology. Shahsavari, Sajad (Author); Sameti, Hossein (Supervisor)

Abstract

In this paper, we introduce a new dataset for speech activity detection (SAD) and evaluate several common methods on it, namely GMM, ANN, and RNN. We collected the dataset with a semi-supervised approach, using subtitled movies, with a labeling accuracy of 95%. This semi-automatic method can help us collect huge amounts of labeled audio data with very high diversity in language, speaker, and channel. We model SAD as a classification task with two classes: speech and non-speech. When using GMM for this problem, we use two separate mixtures to model speech and non-speech. In the case of neural networks, we use a softmax layer at the end of the network, with two neurons which represent speech and...
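
The two-neuron softmax output described above can be illustrated with a minimal sketch. It assumes the network has already produced two raw scores (logits) per audio frame; the function names and the example scores are hypothetical and are not taken from the thesis.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_frames(frame_logits):
    """Label each frame from its two output scores.

    Assumption: index 0 is the speech neuron, index 1 the non-speech neuron.
    """
    labels = []
    for logits in frame_logits:
        p_speech, p_nonspeech = softmax(logits)
        labels.append("speech" if p_speech >= p_nonspeech else "non-speech")
    return labels

# Hypothetical logits for two frames.
print(label_frames([[2.0, -1.0], [0.1, 3.0]]))
```

With a GMM, the analogous decision would compare the likelihood of a frame under the speech mixture with its likelihood under the non-speech mixture.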

Automatic Recognition of Quranic Maqams Using Machine Learning

M.Sc. Thesis, Sharif University of Technology. Khodabandeh, Mohammad Javad (Author); Sameti, Hossein (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

Automatic recognition of musical Maqams has been one of the challenging problems in Music Information Retrieval. Despite the increasing amount of related research in recent years, we are still far away from building related real-life applications. Moreover, only a very small portion of this research is dedicated to automatic recognition of Maqams in recitation of the Holy Quran. In this thesis, as a first attempt, we have used machine learning methods to classify six Maqam families which are commonly used in Quran recitation. Also, due to the lack of pre-existing datasets, we have annotated approximately 1325 minutes of Tadwir recitation from two prominent Egyptian reciters, i.e., Muhammad...

Music Emotion Recognition

M.Sc. Thesis, Sharif University of Technology. Pouyanfar, Samira (Author); Sameti, Hossein (Supervisor)

Abstract

Measuring the emotions of music is one of the methods for determining music content. Music emotion detection is applicable in music retrieval, music genre recognition, and music data management software. Music emotion is studied in different sciences such as physiology, psychology, musicology and engineering. First, we collected a database of different types of music with various emotions. These data have been labeled according to their emotions. In this project, four emotions (angry, happy, relaxed and sad) have been used as labels based on Thayer’s two-dimensional emotion model. There are two basic steps for music emotion recognition similar to other recognition systems: Feature extraction...

A Soft Spectrographic Mask Estimation for Speech Recognition

M.Sc. Thesis, Sharif University of Technology. Esmaeelzadeh, Vahid (Author); Sameti, Hossein (Supervisor)

Abstract

Nowadays, robustness against various noises is a major challenge for Automatic Speech Recognition (ASR) systems. In this thesis, we pursue missing-feature speech recognition approaches to achieve robust ASR systems. In these approaches, low-SNR regions of a spectrogram are considered to be “missing” or “unreliable” and are removed from the spectrogram. Noise compensation is carried out by either estimating the missing regions from the remaining regions in some manner prior to recognition, or by performing recognition directly on incomplete spectrograms. These techniques clearly require a “spectrographic mask” which accurately labels the reliable and unreliable regions...
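
As a rough illustration of what a spectrographic mask is, the sketch below labels each time-frequency cell from its local SNR. It assumes the per-cell SNR (in dB) is already known, which is not the case in practice, where the mask has to be estimated; the threshold, the sigmoid slope, and the function names are illustrative assumptions and do not reproduce the estimator developed in the thesis.

```python
import math

def binary_mask(snr_db, threshold_db=0.0):
    """Hard mask: 1 marks a reliable cell, 0 an unreliable (missing) one."""
    return [[1 if v > threshold_db else 0 for v in row] for row in snr_db]

def soft_mask(snr_db, threshold_db=0.0, slope=0.5):
    """Soft mask: a reliability score in (0, 1) from a sigmoid of the SNR.

    The sigmoid form and its slope are assumptions made for this sketch.
    """
    return [
        [1.0 / (1.0 + math.exp(-slope * (v - threshold_db))) for v in row]
        for row in snr_db
    ]

# Hypothetical local SNR values (dB) for a 2x2 patch of a spectrogram.
snr = [[10.0, -5.0], [0.0, 3.0]]
print(binary_mask(snr))
print(soft_mask(snr))
```

A soft mask keeps a graded confidence for each cell instead of a hard keep-or-discard decision, which is the distinction the thesis title refers to.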

Automatic Difficulty Estimation of Thematic Similarity Multiple-Choice Questions

M.Sc. Thesis, Sharif University of Technology. Akef, Soroosh (Author); Sameti, Hossein (Supervisor); Bokaei, Mohammad Hadi (Supervisor)

Abstract

This project has been conducted in two related phases: In the first phase, we have attempted to write a program capable of answering thematic similarity multiple-choice questions without utilizing any training data. The best performance in this phase was attained by the 25-topic LDA model using the Hellinger distance between the probability distributions of the poetic verses. This model managed to attain an accuracy of 42%, which is very close to the average human performance of 43%. In the second phase, two tasks of seven-class classification and binary classification were defined based on the p-value of the questions. To this end, the questions were initially ranked according to the...
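
The Hellinger distance mentioned above can be sketched in a few lines. The example assumes each verse is already represented as a topic probability distribution (the thesis uses a 25-topic LDA model; the three-topic vectors here are made up), and the rule of choosing the option closest to the question verse is an assumption for illustration, since the abstract does not spell out the selection step.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)

def pick_most_similar(stem, options):
    """Return the index of the option whose distribution is closest to the stem."""
    return min(range(len(options)), key=lambda i: hellinger(stem, options[i]))

# Hypothetical topic distributions for a question verse and three options.
stem = [0.7, 0.2, 0.1]
options = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1], [0.3, 0.3, 0.4]]
print(pick_most_similar(stem, options))
```

The distance is 0 for identical distributions and 1 for distributions with no overlapping support, so smaller values indicate more similar topic profiles.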

Sound Symbolism Analysis: A Motor Theory Approach

M.Sc. Thesis, Sharif University of Technology. Sepehrifar, Makan (Author); Sameti, Hossein (Supervisor)

Abstract

The “Motor Theory” of speech perception discusses the role of motor competence in speech perception. Motor competence is accessed by simulating the gestures that produce the heard speech signal. The simulation is carried out by mirror neurons, which share similar feelings between sender and receiver. Empathy, arising from the use of mirror neurons, probably created some sort of link between inward motor gestures and outward semantic categories at the origin of language. Some degree of linkage between sound and meaning is what we call “Sound Symbolism”. Along with describing the link between Motor Theory and Sound Symbolism, this thesis took two approaches to analyze...

Semantic Analysis and Event Detection Using Deep Learning for Stock Prediction

M.Sc. Thesis, Sharif University of Technology. Basirian Jahromi, Ali (Author); Sameti, Hossein (Supervisor); Bokaei, Mohammad Hadi (Supervisor)

Abstract

News plays a very important role in stock market trading. Nowadays, news from different parts of the world and about different fields can be accessed easily, and for a successful trade it is necessary to analyze this large volume of data accurately and to use it as soon as possible. For this reason, this thesis presents and studies models based on deep learning networks and Natural Language Processing for analyzing financial news and predicting the movement of stock indices. This research takes advantage of a language model for learning and representing news text, and beside this language model it uses deep learning networks at multiple levels to extract proper features from each news item in a day...

Context-based Persian Grapheme-to-Phoneme Conversion using Sequence-to-Sequence Models

M.Sc. Thesis, Sharif University of Technology. Rahmati, Elnaz (Author); Sameti, Hossein (Supervisor)

Abstract

Many Text-to-Speech (TTS) systems, particularly in low-resource environments, struggle to produce natural and intelligible speech from grapheme sequences. One solution to this problem is to use Grapheme-to-Phoneme (G2P) conversion to increase the information in the input sequence and improve the TTS output. However, current G2P systems are not accurate or efficient enough for Persian texts due to the language’s complexity and the lack of short vowels in Persian grapheme sequences. In our study, we aimed to improve resources for the Persian language. To achieve this, we introduced two new G2P training datasets, one manually-labeled and the other machine-generated, containing over five million...

Grapheme to Phoneme Conversion using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Safari, Arash (Author); Sameti, Hossein (Supervisor)

Abstract

The goal of this research is to convert letters to phonemes using deep neural networks. Deep neural networks are among the best methods for speech and text processing (the highest accuracy in converting text to letter in the English language has also been obtained with deep neural networks), so multilayer deep neural networks are used in this research to increase accuracy. It should be noted that deep neural networks have not been used before for converting text to phoneme in the Persian language. In this research, a rule-based alignment method based on our proposed rules is presented, which achieves an accuracy of more than 98%. Several approaches for converting word to grapheme with emphasis on the...

Conversion of Persian Colloquial Texts into Official Texts using Unsupervised Learning Methods

M.Sc. Thesis, Sharif University of Technology. Akhavan Azari, Karim (Author); Sameti, Hossein (Supervisor)

Abstract

Today, the production of colloquial texts in messengers, search engines, and question answering systems has increased significantly, while text documents in other fields have a formal tone and style. Thus, there is a need for a system to convert these texts from colloquial form to formal style. This need has recently received serious attention for languages other than Persian as well, but at the time of writing no efficient system has been offered, and the issue requires more work in Persian than in languages such as English. In general, transferring texts from one form to another falls into the category of natural language processing applications and is called "style...

Formality Style Transfer Using Deep Neural Network

M.Sc. Thesis, Sharif University of Technology. Ebrahimi, Fatemeh (Author); Sameti, Hossein (Supervisor)

Abstract

Formality style transfer, that is, automatically transferring the style of informal text to formal and vice versa, means changing the style and form of a sentence without changing its content. With the increasing progress of deep neural networks, formality style transfer in other languages has attracted the attention of researchers and has made significant progress among natural language processing tasks. Due to the availability of parallel data in the English language, the task of style transfer has been approached and designed basically in the framework of the "encoder-decoder" architecture of neural networks. However, due to the lack of parallel datasets in the Persian language, this...

Sequence-to-Sequence Voice Conversion Using Deep Learning

M.Sc. Thesis, Sharif University of Technology. Shadbash, Hamed (Author); Sameti, Hossein (Supervisor)

Abstract

Apart from the linguistic content that expresses the speaker's purpose and intent, human speech also carries other information, such as the identity of the speaker, his or her gender and approximate age, the intonation and mode of expression, the feeling of the speaker, the parts emphasized in the speech, and so on. "Voice conversion" seeks to change the speaker-dependent content in an audio signal so that speaker-independent content (especially language content) remains unchanged. In other words, the purpose in voice conversion is to change the audio signal of speech created by one person in order to create the notion that the same speech was spoken by someone...

Using Structural Language Modeling in Continuous Speech Recognition Systems

M.Sc. Thesis, Sharif University of Technology. SheikhShab, Golnar (Author); Sameti, Hossein (Supervisor)

Abstract

The language model is one of the most important parts of an automatic speech recognition system; it incorporates knowledge of natural language into the system to improve its accuracy. The language model conventionally used in recognition systems is the n-gram, which is usually extracted from a large corpus using the relative frequency method. The n-gram model approximates the probability of a word sequence by multiplying its n-gram probabilities and thus does not take long-distance dependencies into account. So, syntactic language models could be of interest. In this research, after probing different syntactic language models, a method for re-estimating the n-gram model, introduced by Stolcke in 1994, was...
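
The n-gram approximation described above can be made concrete with a minimal bigram sketch: probabilities are relative frequencies counted from a corpus, and the probability of a sentence is the product of its bigram probabilities. The toy corpus, the boundary markers `<s>` and `</s>`, and the function names are assumptions for illustration; no smoothing is applied, and the re-estimation method studied in the thesis is not shown.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, with sentence boundary markers."""
    histories = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])  # every token except </s> acts as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams

def sentence_prob(sentence, histories, bigrams):
    """Multiply relative-frequency bigram probabilities over the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        if histories[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        prob *= bigrams[(prev, word)] / histories[prev]
    return prob

# Toy corpus (hypothetical).
histories, bigrams = train_bigram(["the cat sat", "the dog sat"])
print(sentence_prob("the cat sat", histories, bigrams))
```

Because each factor conditions only on the previous word, a dependency between words that are far apart in the sentence has no effect on the score, which is the limitation that motivates syntactic language models.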

Pre-trained Model Utilization Using Cross-lingual Methods

M.Sc. Thesis, Sharif University of Technology. Hosseini, Mohammad (Author); Sameti, Hossein (Supervisor); Motahari, Abolfazl (Supervisor)

Abstract

Following the dramatic changes brought about by deep learning methods in Natural Language Processing tasks, the Transformer architecture became popular. Building on it, the BERT language model was presented and achieved state-of-the-art results on many language processing tasks. It was a turning point in the Natural Language Processing field. In addition, the cross-lingual line of research, motivated by developing a common space for representing language units (e.g., words and sentences) in more than one language, achieved some remarkable improvements. However, for languages distant from English, such as Persian or Arabic, the performance of these methods was not clear. In this work, we performed some innovative...

Using Partially-Observable Markov Decision Process for Dialogue Management in Spoken Dialogue Systems

M.Sc. Thesis, Sharif University of Technology. Rahbar Noudehi, Siavash (Author); Sameti, Hossein (Supervisor)

Abstract

The use of Spoken Dialogue Systems is growing every day, and these systems will replace current Interactive Voice Response systems in the near future. A Spoken Dialogue System consists of Speech Recognition, Language Understanding, Dialogue Management, Speech Generation and Text-to-Speech modules. Among these modules, the only one that is specific to dialogue systems is Dialogue Management. Its responsibility is to determine system behavior so as to maximize specific variables such as the accuracy of finding the user's goal and the speed of finding it. Among the different approaches to dialogue management in recent years, the use of Partially-Observable Markov Decision Processes was very popular...

Using Discriminative Training Approaches for Large Vocabulary Isolated Word Recognition

M.Sc. Thesis, Sharif University of Technology. Osati, Majid (Author); Sameti, Hossein (Supervisor)

Abstract

In this study, the isolated word recognition problem is studied at large scale, and different acoustic models are applied to solve it. Our proposed approach, acoustic models based on discriminative training methods, is compared with other available training methods. Acoustic models are built and trained based on HMM-GMM, HMM-subspace GMM and HMM-DNN using different training criteria such as Maximum Mutual Information (MMI), boosted MMI, Minimum Phoneme Error (MPE), and state-level Minimum Bayes Risk (sMBR). Using these discriminative approaches led to improvement of speech recognition systems. Boosted MMI with boosting factor of 0.3 for HMM-DNN has resulted in Word Error Rate...

Speech Enhancement Based on Statistical Methods

Ph.D. Dissertation, Sharif University of Technology. Veisi, Hadi (Author); Sameti, Hossein (Supervisor)

Abstract

This work focuses on single-channel speech enhancement using a hidden Markov model (HMM) based on the minimum mean square error (MMSE) estimator, and proposes an HMM-based speech enhancement method in the Mel-frequency domain. The MMSE estimator results in a weighted-sum filtering of the noisy signal, in which accurate estimation of the filter values and filter weights comprises the main challenge. The cepstral domain modeling for speech enhancement is motivated by accurate filter selection in this domain. In the proposed framework, Mel-frequency spectral (MFS) and Mel-frequency cepstral (MFC) features are studied and experimented. In addition to the spectrum estimator, magnitude spectrum, log-magnitude spectrum...

Training-Based Speech Enhancement Using Non-Gaussian Distributions

M.Sc. Thesis, Sharif University of Technology. Golrasan, Elham (Author); Sameti, Hossein (Supervisor)

Abstract

Statistical approaches (purely statistical and model-based) are the most efficient methods in single-channel speech enhancement. Despite this efficiency, speech enhancement remains a challenging problem. Recent research proposes that univariate non-Gaussian distributions are more appropriate for the speech signal in different domains. Based on these univariate distributions, statistical approaches have been modified and consequently better results have been reported. The purpose of this thesis is speech enhancement based on the hidden Markov model using multivariate non-Gaussian distributions. The results of speech enhancement algorithm based on hidden Markov model in DCT and DFT domains...

Speech Enhancement Using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Mohammadian Kalkhoran, Parisa (Author); Sameti, Hossein (Supervisor)

Abstract

Quality and intelligibility are two aspects of speech that are affected by various factors, such as background noise and echo. The performance of many commercial and military speech-based systems depends on at least one of these aspects of speech. Therefore, this research aims to design an improvement model to remove background noise and reverberation from the speech signal. The model training framework is based on deep learning methods and has a supervised approach in the time domain. The input of this system is the raw waveform of the speech signal mixed with noise and reverberation, and the output is the enhanced waveform of this signal. An architecture is proposed in this thesis based on...
