Music Track Detection Using Audio Fingerprinting

M.Sc. Thesis, Sharif University of Technology. Yazdanian, Saeed (Author); Sameti, Hossein (Supervisor)

Abstract

Music information retrieval systems have many applications in music filtering and broadcast monitoring because of the huge amount of multimedia data available today. In these systems, the feature extraction method is called audio fingerprinting. The small size of fingerprints allows the systems to search efficiently among thousands or millions of songs. The input signal is usually just a couple of seconds long and degraded in several ways. The goal is to design a system that is robust to signal degradations and efficient to search. In this thesis, one of the basic systems is reviewed and improved in several ways. This system uses the spectrogram of signals to extract features and build an...

Discriminative Articulatory Models for Spoken Term Detection in Low-Resource Conditions

M.Sc. Thesis, Sharif University of Technology. Gomar, Zahra (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis focuses on a spoken term detection system based on speech recognition in low-resource conditions. A spoken term detection system is composed of two parts: speech recognition and search. In the search for words, the proxy-word method is used as a basic approach to overcome the problem of out-of-vocabulary (OOV) words. The main challenge in this thesis, in the low-resource context, is the poorly trained acoustic and language models and the small lexicon in speech recognition. A small lexicon increases the number of OOV words. In this thesis, two innovations are proposed to improve the basic system. The first is training a bottleneck neural network to extract the articulatory features of...

Training-Based Speech Enhancement Using Non-Gaussian Distributions

M.Sc. Thesis, Sharif University of Technology. Golrasan, Elham (Author); Sameti, Hossein (Supervisor)

Abstract

Statistical approaches (purely statistical and model-based) are the most efficient methods in single-channel speech enhancement. Despite this efficiency, speech enhancement remains a challenge. Recent research proposes that univariate non-Gaussian distributions are more appropriate for the speech signal in different domains. Based on these univariate distributions, statistical approaches have been modified, and consequently better results have been reported. The purpose of this thesis is speech enhancement based on the hidden Markov model using multivariate non-Gaussian distributions. The results of the speech enhancement algorithm based on the hidden Markov model in the DCT and DFT domains...

Speaker Verification using Limited Enrollment Data

M.Sc. Thesis, Sharif University of Technology. Kalantari, Elaheh (Author); Sameti, Hossein (Supervisor)

Abstract

In this thesis, we investigate speaker verification as a biometric technology to verify a person based on his/her claim. Text-dependent speaker verification systems are preferred in commercial and security applications, and these systems perform better in limited-data conditions because they rely on prior knowledge about speakers, who are assumed to be cooperative. The limited amount of enrollment data is a major concern in this thesis. Speaker-dependent model construction and channel variability issues in telephone-based text-dependent speaker verification applications are surveyed. Due to the lack of an appropriate database for the task, we collected a database which is referred to as text-prompt...

User Profiling in Social Networks

M.Sc. Thesis, Sharif University of Technology. Ketabchi, Mohammad Amin (Author); Sameti, Hossein (Supervisor)

Abstract

Due to the emergence of social networks in recent years and people’s use of them to express their thoughts and emotions, a large amount of user data is available in these networks. The development of social networks has created a good opportunity for organizations and people to extract user profiles from them. Hence, user profiling has become an interesting problem for researchers. Predicting users’ occupational class is one of the main problems in this field. Most existing related work uses only the textual features of users, whereas the users’ relation graph can give useful information about them. In this research, we propose a model based on Graph Neural Networks (GNNs) to predict...

Pattern Based Relation Extraction on Persian News Articles

M.Sc. Thesis, Sharif University of Technology. Cholmaghani Qaheh, Ali (Author); Bahrani, Mohammad (Supervisor); Sameti, Hossein (Co-Advisor)

Abstract

Relation extraction is a main task in information extraction. There are two main approaches in this field: rule-based and statistical. This thesis applies a rule-based relation extraction approach. In this research, we tried to recognize Persian syntactic and morphological patterns in order to extract relations between named entities. First, we annotated a news dataset of more than 100 thousand tokens with person, organization, and location named-entity tags. We then found 1037 relations and 2197 candidate relations. Candidate and labeled relations are extracted between two entities located in the same clause. These relations are "PERS_PERS-COMMENTING",...

Music Emotion Recognition

M.Sc. Thesis, Sharif University of Technology. Pouyanfar, Samira (Author); Sameti, Hossein (Supervisor)

Abstract

Measuring the emotions of music is one way to determine music content. Music emotion detection is applicable to music retrieval, music genre recognition, and music data management software. Music emotion is studied in different sciences such as physiology, psychology, musicology, and engineering. First, we collected a database of different types of music with various emotions. These data have been labeled according to their emotions. In this project, four emotions (angry, happy, relaxed, and sad) are used as labels, based on Thayer’s two-dimensional emotion model. As in other recognition systems, there are two basic steps in music emotion recognition: Feature extraction...

Language Modeling for Persian using Recurrent Neural Networks

M.Sc. Thesis, Sharif University of Technology. Pourbagheri, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

In recent years, neural networks have been used for language modeling in tasks related to natural language processing. Various neural network structures have been used in these models, and recurrent neural networks (RNNs) have achieved good results in these tasks. Since RNNs are not limited to a fixed number of words when predicting the next word, they have achieved better results than feedforward networks. However, these networks have difficulty learning long sequences, and long short-term memory (LSTM) networks have been introduced to solve this problem. In this research, language models are built for the Persian language using RNNs and LSTMs and are compared with n-gram-based models. For...
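
As a point of reference for the n-gram baselines mentioned in this abstract, the following is a minimal sketch of a bigram language model with add-one smoothing and a perplexity measure. The function names, the boundary markers, and the choice of smoothing are illustrative assumptions; the abstract does not say which n-gram order or smoothing method the thesis uses.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentences, unigrams, bigrams):
    """Perplexity of the model on tokenized sentences (lower is better)."""
    log_prob, n_predictions = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_prob += math.log(bigram_prob(prev, word, unigrams, bigrams))
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)
```

A bigram model conditions on exactly one previous word, which is the fixed-context limitation that the abstract contrasts with recurrent models.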

Speech Enhancement Based on Statistical Methods

Ph.D. Dissertation, Sharif University of Technology. Veisi, Hadi (Author); Sameti, Hossein (Supervisor)

Abstract

This work focuses on single-channel speech enhancement using a hidden Markov model (HMM) based on the minimum mean square error (MMSE) estimator, and proposes HMM-based speech enhancement in the Mel-frequency domain. The MMSE estimator results in a weighted-sum filtering of the noisy signal, in which accurate estimation of the filter values and filter weights is the main challenge. Cepstral-domain modeling for speech enhancement is motivated by accurate filter selection in this domain. In the proposed framework, Mel-frequency spectral (MFS) and Mel-frequency cepstral (MFC) features are studied and evaluated experimentally. In addition to the spectrum estimator, magnitude spectrum, log-magnitude spectrum...

Markov Logic Networks for Persian Spoken Language Understanding

M.Sc. Thesis, Sharif University of Technology. Hemmatan Attarbashi, Ensieh (Author); Bahrani, Mohammad (Supervisor); Khosravizadeh, Parvaneh (Co-Advisor); Sameti, Hossein (Co-Advisor)

Abstract

Spoken Language Understanding (SLU) is aimed at extracting meaning from natural spoken language. Meaning extraction ranges from "extracting specific phrases" to "extracting users' intentions from their speech" and goes as far as "extracting the entities and details of their intentions". Extracting the exact intended meaning of the user is a sophisticated process. In this research, considering the lack of standard data in Persian, an SLU system for this language has been implemented using Markov Logic Networks (MLNs), in order to reduce the need for extra datasets. MLNs combine the explanatory power and orderliness of First-Order Logic with the uncertainty of probabilities. Therefore, these...

Design and Improvement of Sequence-level Objective Functions for DNN-based Large Vocabulary Continuous Speech Recognition

Ph.D. Dissertation, Sharif University of Technology. Hadian, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis focuses on the problem of large vocabulary continuous speech recognition (LVCSR). Numerous research results in recent years have demonstrated the effectiveness of deep neural networks (DNNs) for LVCSR. As a result, many methods have been proposed to incorporate DNNs in LVCSR. These methods can be viewed in terms of the objective functions used for training DNNs. A frame-level objective function is one that is defined locally on frames, whereas a sequence-level objective function is defined on whole sequences. Since speech recognition is essentially a sequential problem, here we focus on designing and improving sequence-level objective functions for DNNs. The main proposed...

Improving the Training Process of Understanding Unit in Spoken Dialog Systems Using Active Learning Methods

M.Sc. Thesis, Sharif University of Technology. Hadian, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis aims to reduce the need for labeled data in the Spoken Language Understanding (SLU) domain by means of active learning methods. This need arises from the lack of labeled SLU datasets in the Persian language and from fairly high labeling costs. Active learning methods enable the learner to choose the most informative instances to be labeled and used for training, and avoid labeling uninformative or redundant instances. For modeling the SLU system, several statistical models, namely MLN (Markov Logic Networks), CRF (Conditional Random Fields), HMM (Hidden Markov Model), and HVS (Hidden Vector State), were reviewed, and finally CRF was chosen for its superior performance. The...
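
The idea of letting the learner pick the most informative instances can be sketched with least-confidence uncertainty sampling, one common active learning strategy. This is an illustration only: the abstract does not state which selection criterion the thesis uses, and the names `least_confidence`, `select_for_labeling`, and `predict_proba` are hypothetical.

```python
def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probabilities)

def select_for_labeling(pool, predict_proba, batch_size):
    """Return the batch_size unlabeled instances the model is least sure about.

    pool          -- iterable of unlabeled instances
    predict_proba -- callable mapping an instance to a list of label probabilities
                     (assumed to come from the current model)
    """
    ranked = sorted(
        pool,
        key=lambda instance: least_confidence(predict_proba(instance)),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]
```

In a full loop, the selected instances would be labeled by an annotator, added to the training set, and the model retrained before the next selection round.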

Telephony Text-Independent Speaker Verification in Total Variability Space

M.Sc. Thesis, Sharif University of Technology. Mirian, Alireza (Author); Sameti, Hossein (Supervisor)

Abstract

Given two speech segments, the task of speaker verification is to determine whether or not both were uttered by the same person. Most recent approaches to speaker verification are based on the Total Variability Space, which results from applying factor analysis to the GMM mean supervector space. The representation of speech of arbitrary duration in this space is called an i-vector.
In this thesis, the basics of speaker verification are first described, and i-vector approaches are explained in more detail. Then, a method for improving the accuracy of Cosine Similarity Scoring is proposed, which normalizes the raw score using the score of the test utterance against a model- and...
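
For orientation, raw cosine similarity scoring between two i-vectors can be sketched as follows. The vectors are plain lists of floats, and the names `cosine_score`, `verify`, and `threshold` are illustrative assumptions. The score normalization proposed in the thesis is cut off in this abstract and is not reproduced here.

```python
import math

def cosine_score(enroll_ivector, test_ivector):
    """Raw cosine similarity between an enrollment and a test i-vector."""
    dot = sum(a * b for a, b in zip(enroll_ivector, test_ivector))
    norm_enroll = math.sqrt(sum(a * a for a in enroll_ivector))
    norm_test = math.sqrt(sum(b * b for b in test_ivector))
    return dot / (norm_enroll * norm_test)

def verify(enroll_ivector, test_ivector, threshold):
    """Accept the trial as same-speaker when the score reaches the threshold.

    The threshold is a hypothetical tuning parameter, typically chosen on
    held-out trials.
    """
    return cosine_score(enroll_ivector, test_ivector) >= threshold
```

Because the score depends only on the angle between the vectors, scaling either i-vector leaves the decision unchanged.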

Uncertainty Reduction in Speaker Verification with Short Duration Utterances

Ph.D. Dissertation, Sharif University of Technology. Maghsoodi, Nooshin (Author); Sameti, Hossein (Supervisor)

Abstract

Voice biometrics are used in today’s telephone-based speaker verification because they are uniquely suited to remote access. However, there are significant challenges in implementing such systems. One of these challenges is the need for sufficient data in the enrollment phase. In fact, the speaker verification system needs a dataset that covers the phonetic variations of the language in order to discriminate between different speakers. In real applications, it is not easy to ask speakers to say long utterances. Therefore, an ideal speaker verification system should be able to detect impostors without any constraint on the input lexicon, whether the utterances are long or short. The results of...

Speech Enhancement Using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Mohammadian Kalkhoran, Parisa (Author); Sameti, Hossein (Supervisor)

Abstract

Quality and intelligibility are two aspects of speech that are affected by various factors, such as background noise and echo. The performance of many commercial and military speech-based systems depends on at least one of these aspects. Therefore, this research aims to design an enhancement model that removes background noise and reverberation from the speech signal. The model training framework is based on deep learning methods and takes a supervised approach in the time domain. The input of this system is the raw waveform of the speech signal mixed with noise and reverberation, and the output is the enhanced waveform of this signal. An architecture is proposed in this thesis based on...

Pronunciation Scoring in Computer-Assisted Language Learning

M.Sc. Thesis, Sharif University of Technology. Mohammadi, Sajede (Author); Sameti, Hossein (Supervisor)

Abstract

As the number of people interested in learning new languages has grown in recent years, multiple systems have been developed to teach languages to them. These systems are called Computer Assisted Language Learning (CALL) systems. However, the most credible CALL systems, like Duolingo, do not support Persian. The aim of this study is therefore to design and implement one of the technical parts of CALL systems, Computer Assisted Pronunciation Training (CAPT), which is the part responsible for evaluating the pronunciation of the learner's input speech and generating an appropriate score and feedback. In this study, good pronunciation means correct expression of words, correct...

Speaker Diarization in Adverse Conditions

M.Sc. Thesis, Sharif University of Technology. Mohammadi, Hamid Reza (Author); Sameti, Hossein (Supervisor)

Abstract

The goal of a speaker diarization system is to detect the number of speakers in a conversation and to assign each segment of the conversation to one of the speakers. In these systems, it is assumed that the identity of the speakers is completely unknown. Speaker diarization systems usually operate in an offline mode: the system assumes that it has the whole conversation at hand before it starts processing. This method is effective for applications like spoken document retrieval, but it is not applicable to speech/speaker recognition systems that require online operation. In this dissertation, an online speaker diarization system is implemented. This...

Robust Speaker Verification in Total Variability Space

M.Sc. Thesis, Sharif University of Technology. La’l Mohammadi, Mahnoosh (Author); Sameti, Hossein (Supervisor)

Abstract

Our study is mainly related to speaker verification systems. Given a speech segment and a claimed identity, these systems must decide whether the claimant is genuine or an impostor. Our main focus is on making the speaker verification system robust when training data is limited. With limited training data, the accuracy of speaker verification systems drops drastically. Our main purpose is to examine this problem in depth and to present methods to counter this challenge. Recently, methods such as PLDA and distance metric learning have been applied in text-independent speaker verification to address the limited-data problem. One of the important cases in which...

High-Performance Keyword Spotting System for Persian Language

M.Sc. Thesis, Sharif University of Technology. Ghorbani, Shahram (Author); Sameti, Hossein (Supervisor)

Abstract

Fast and accurate keyword spotting is an important subject within the speech processing domain, especially when dealing with various transmission channels. In this research, discriminative keyword spotting methods are compared with HMM-based approaches. We have employed the discriminative approaches as our baseline methods because of their higher accuracy. The drawback of conventional discriminative methods is their high computational cost and long execution time. The discriminative approach consists of two steps: feature extraction and classification. We have proposed four ideas to improve the performance of the baseline method. To improve the speed of the process, in feature...

A Persian Dialog System with Sequence to Sequence Learning

M.Sc. Thesis, Sharif University of Technology. Ghafourian, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

Conversation modeling is one of the most important goals in the fields of natural language understanding and machine intelligence. Recently, with the enormous growth of the Internet and social networks, the amount of data available on the Web has increased significantly. This makes it possible to use data-driven approaches to solve the conversation modeling problem. One of the most recent data-driven methods is sequence-to-sequence modeling. In this document, after providing the necessary prerequisites, we examine the various models that have used the sequence-to-sequence approach for conversation modeling. We further examine ways of improving the efficiency of this modeling...
