Speech Activity Detection Using Deep Networks

M.Sc. Thesis, Sharif University of Technology. Shahsavari, Sajad (Author); Sameti, Hossein (Supervisor)

Abstract

In this thesis, we introduce a new dataset for speech activity detection (SAD) and evaluate several common methods on it, including Gaussian mixture models (GMM), artificial neural networks (ANN), and recurrent neural networks (RNN). We collected the dataset with a semi-supervised approach using subtitled movies, with a labeling accuracy of 95%. This semi-automatic method makes it possible to collect large amounts of labeled audio with high diversity in language, speaker, and channel. We model SAD as a two-class classification task (speech vs. non-speech). With GMMs, we use two separate mixtures to model speech and non-speech. With neural networks, we use a final softmax layer with two neurons, which represent speech and...
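The GMM setup described above, with one mixture for speech and one for non-speech, can be sketched as a frame-level likelihood-ratio classifier. This is a minimal sketch under stated assumptions, not the thesis's implementation. The two feature dimensions, the toy mixture parameters, and the zero decision threshold are all hypothetical. In practice, the mixtures would be trained with EM on labeled frames.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    Each component has one mean and one variance per feature dimension.
    """
    comp_scores = []
    for w, mu, var in zip(weights, means, variances):
        s = math.log(w) + sum(log_gauss(xi, m, v) for xi, m, v in zip(x, mu, var))
        comp_scores.append(s)
    top = max(comp_scores)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in comp_scores))

def classify_frame(x, speech_gmm, nonspeech_gmm, threshold=0.0):
    """Label a frame 'speech' if the speech/non-speech log-likelihood ratio exceeds threshold."""
    llr = gmm_loglik(x, *speech_gmm) - gmm_loglik(x, *nonspeech_gmm)
    return "speech" if llr > threshold else "non-speech"

# Hypothetical 2-D features (e.g. log energy, spectral flatness) and toy parameters:
# (weights, per-component means, per-component variances)
speech_gmm = ([0.6, 0.4], [[5.0, 0.2], [7.0, 0.3]], [[1.0, 0.05], [1.5, 0.05]])
nonspeech_gmm = ([1.0], [[0.0, 0.8]], [[1.0, 0.05]])

frames = [[6.1, 0.25], [0.3, 0.78]]
print([classify_frame(f, speech_gmm, nonspeech_gmm) for f in frames])
# → ['speech', 'non-speech']
```

The neural-network variant replaces the two likelihoods with a two-way softmax output, but the per-frame decision has the same shape.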

Discriminative Articulatory Models for Spoken Term Detection in Low-Resource Conditions

M.Sc. Thesis, Sharif University of Technology. Gomar, Zahra (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis focuses on spoken term detection based on speech recognition under low-resource conditions. A spoken term detection system consists of two parts: speech recognition and search. For the search stage, the proxy-word method is used as the baseline approach to overcome the problem of out-of-vocabulary (OOV) words. In this low-resource setting, the main challenges are poorly trained acoustic and language models and a small lexicon in the speech recognizer. A small lexicon increases the number of OOV words. Two innovations are proposed to improve the baseline system. The first is training a bottleneck neural network to extract the articulatory features of...

Speech Enhancement Based on Statistical Methods

Ph.D. Dissertation, Sharif University of Technology. Veisi, Hadi (Author); Sameti, Hossein (Supervisor)

Abstract

This dissertation focuses on single-channel speech enhancement using a hidden Markov model (HMM) with a minimum mean square error (MMSE) estimator, and proposes an HMM-based speech enhancement method in the Mel-frequency domain. The MMSE estimator yields a weighted-sum filtering of the noisy signal. The main challenges are estimating the filter values and the filter weights accurately. Modeling in the cepstral domain is motivated by the more accurate filter selection it allows. In the proposed framework, Mel-frequency spectral (MFS) and Mel-frequency cepstral (MFC) features are studied and evaluated experimentally. In addition to the spectrum estimator, magnitude spectrum, log-magnitude spectrum...
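The weighted-sum filtering mentioned above can be illustrated with a small sketch. Assume speech and noise are zero-mean complex Gaussians in each frequency bin. Then the MMSE estimate is a sum of per-state Wiener filters Ps_j / (Ps_j + Pn), each weighted by the state posterior p(j | Y). The sketch simplifies heavily in three ways. It works on a linear-frequency power spectrum rather than the Mel-frequency domain studied in the dissertation. It computes posteriors for a single frame from fixed priors instead of running the HMM forward recursion. Its spectra are hypothetical toy values.

```python
import math

def state_log_likelihood(noisy_power, speech_psd, noise_psd):
    """Log-likelihood of one frame's noisy power spectrum under one state.

    Assumes independent zero-mean complex Gaussian speech and noise per bin,
    so |Y_k|^2 is exponential with mean speech_psd[k] + noise_psd[k].
    """
    total = 0.0
    for y2, ps, pn in zip(noisy_power, speech_psd, noise_psd):
        var = ps + pn
        total += -math.log(var) - y2 / var
    return total

def mmse_enhance_frame(noisy_spec, state_psds, noise_psd, priors):
    """Posterior-weighted sum of per-state Wiener filters for one frame.

    Returns the enhanced spectrum and the state weights (posteriors).
    """
    noisy_power = [abs(y) ** 2 for y in noisy_spec]
    log_post = [math.log(p) + state_log_likelihood(noisy_power, psd, noise_psd)
                for p, psd in zip(priors, state_psds)]
    top = max(log_post)  # normalize in the log domain for stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    weights = [u / z for u in unnorm]  # filter weights: p(state j | noisy frame)
    enhanced = []
    for k, y in enumerate(noisy_spec):
        # Filter value for state j in bin k is Ps_j / (Ps_j + Pn).
        gain = sum(w * psd[k] / (psd[k] + noise_psd[k])
                   for w, psd in zip(weights, state_psds))
        enhanced.append(gain * y)
    return enhanced, weights

# Toy example: a "low-frequency" and a "high-frequency" speech state, 3 bins.
state_psds = [[10.0, 1.0, 0.1], [0.1, 1.0, 10.0]]
noise_psd = [1.0, 1.0, 1.0]
noisy = [3 + 0j, 1j, 0.2 + 0j]
enhanced, weights = mmse_enhance_frame(noisy, state_psds, noise_psd, [0.5, 0.5])
print([round(w, 3) for w in weights])  # → [0.999, 0.001]
```

Because every per-state gain lies between 0 and 1, the combined filter only attenuates each bin. Most of the work lies in getting the weights right, which matches the abstract's point that estimating filter values and weights is the main challenge.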

Using Structural Language Modeling in Continuous Speech Recognition Systems

M.Sc. Thesis, Sharif University of Technology. SheikhShab, Golnar (Author); Sameti, Hossein (Supervisor)

Abstract

The language model is one of the most important parts of an automatic speech recognition system. It incorporates knowledge of natural language into the system to improve its accuracy. The language model conventionally used in recognition systems is the n-gram, which is usually estimated from a large corpus using the relative frequency method. An n-gram model approximates the probability of a word sequence by multiplying its n-gram probabilities and therefore does not take long-distance dependencies into account. Syntactic language models are therefore of interest. In this research, after surveying different syntactic language models, a method for re-estimating the n-gram model, introduced by Stolcke in 1994, was...
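The n-gram approximation and relative-frequency estimation described above can be made concrete with a bigram model. This is a hypothetical sketch, not the thesis's setup. It estimates an unsmoothed maximum-likelihood bigram model from a toy corpus. Real recognizers use smoothing, higher orders, and much larger corpora.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])              # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams

def sentence_prob(sentence, histories, bigrams):
    """P(w_1..w_n) ~= product of P(w_i | w_{i-1}), each a relative frequency."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for h, w in zip(tokens[:-1], tokens[1:]):
        if histories[h] == 0:
            return 0.0                             # unseen history: no estimate
        p *= bigrams[(h, w)] / histories[h]
    return p

corpus = ["the cat sat", "the dog sat", "a cat ran"]
hist, bi = train_bigram(corpus)
print(sentence_prob("the cat sat", hist, bi))  # 2/3 * 1/2 * 1/2 * 1 = 1/6
print(sentence_prob("the dog ran", hist, bi))  # → 0.0
```

"the dog ran" gets zero probability even though every word occurs in training, because the bigram "dog ran" was never seen. The model also conditions on only one previous word, so any dependency spanning a longer distance is invisible to it. This is the limitation that motivates syntactic language models.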

Design and Performance Improvement of a Spoken Term Detection System

M.Sc. Thesis, Sharif University of Technology. Ghadirinia, Marzieh (Author); Sameti, Hossein (Supervisor)

Abstract

Recently, the widespread use of video and radio data has made efficient speech information retrieval systems highly important. In the present work, we focus on spoken term detection, one of the most important approaches to information retrieval. The system consists of two main steps, the first being speech processing by means of automatic speech recognition. In the recognition step, we use a large vocabulary. In all recent approaches, the main concern is retrieving words that are out of vocabulary (OOV). The state-of-the-art way to tackle this problem is to use proxy keywords: in-vocabulary words that can be recognized in place of OOV words. Such proxies have...
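The proxy-keyword idea above is to substitute in-vocabulary words that sound like an OOV term. It can be sketched by ranking lexicon entries by phonetic edit distance to the OOV word's pronunciation. This is a simplified illustration under assumptions, not the method used in the thesis. The lexicon and phone sequences are hypothetical. Practical systems typically weight substitutions by phone confusability rather than using plain unit-cost edit distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g. phone lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y))) # substitute (free if equal)
        prev = cur
    return prev[-1]

def proxy_keywords(oov_phones, lexicon, k=2):
    """Return the k in-vocabulary words whose pronunciations are closest to the OOV term."""
    scored = sorted((edit_distance(oov_phones, phones), word)
                    for word, phones in lexicon.items())
    return [word for _, word in scored[:k]]

# Hypothetical pronunciation lexicon (word -> phone sequence).
lexicon = {
    "samet": ["s", "a", "m", "e", "t"],
    "sameh": ["s", "a", "m", "e", "h"],
    "hamid": ["h", "a", "m", "i", "d"],
    "kerman": ["k", "e", "r", "m", "a", "n"],
}
oov = ["s", "a", "m", "e", "t", "i"]  # pronunciation of an OOV query term
print(proxy_keywords(oov, lexicon))   # → ['samet', 'sameh']
```

The recognizer's output is then searched for the proxies, and their hits are returned as candidate hits for the OOV term.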

Uncertainty Reduction in Speaker Verification with Short Duration Utterances

Ph.D. Dissertation, Sharif University of Technology. Maghsoodi, Nooshin (Author); Sameti, Hossein (Supervisor)

Abstract

Voice biometrics are used in today's telephone-based speaker verification because they are uniquely suited to remote access. However, implementing such systems poses significant challenges. One of these is the need for sufficient data in the enrollment phase. The speaker verification system needs data that covers the phonetic variations of the language in order to discriminate between speakers. In real applications, it is not easy to ask speakers to say long utterances. Therefore, an ideal speaker verification system should be able to detect impostors without any constraint on the input lexicon, whether the utterances are long or short. The results of...

Personal Name Disambiguation in Persian Written News

M.Sc. Thesis, Sharif University of Technology. Saneei, Sara (Author); Sameti, Hossein (Supervisor)

Abstract

Many personal names are mentioned in everyday news, but news agencies do not distinguish between entities that share a name. As a result, irrelevant news may appear when searching for an ambiguous name. Personal name disambiguation in news seeks to partition a large collection of news into distinct classes, each of which belongs to a single real-world entity. To the researcher's knowledge, this thesis is the first of its kind, at least for Persian. The researcher had the opportunity to use the FarsiYar News Dataset, specifically 50,000 news items from the FarsNews dataset published in the year 1397. First, a database was built from these news items, and then the unstructured news items were...

User Profiling in Social Networks

M.Sc. Thesis, Sharif University of Technology. Ketabchi, Mohammad Amin (Author); Sameti, Hossein (Supervisor)

Abstract

Social networks have emerged in recent years, and people use them to express their thoughts and emotions, so these networks hold a large amount of user data. The development of social networks has created a good opportunity for organizations and individuals to extract user profiles from them. Hence, user profiling has become an interesting problem for researchers. Predicting users' occupational class is one of the main problems in this field. Most existing related work uses only users' textual features, whereas the graph of user relations can provide useful information about them. In this research, we propose a model based on Graph Neural Networks (GNNs) to predict...

Formality Style Transfer Using Deep Neural Network

M.Sc. Thesis, Sharif University of Technology. Ebrahimi, Fatemeh (Author); Sameti, Hossein (Supervisor)

Abstract

Formality style transfer means automatically changing the style and form of a sentence from informal to formal, or vice versa, without changing its content. With the progress of deep neural networks, formality style transfer has attracted researchers' attention in other languages and has made significant progress among natural language processing tasks. Because parallel data is available in English, style transfer there has mainly been approached within the "encoder-decoder" neural architecture. However, due to the lack of parallel datasets in Persian, this...

Improving Pre-trained Language Model for Sentiment Analysis

M.Sc. Thesis, Sharif University of Technology. Barikbin, Sadrodin (Author); Sameti, Hossein (Supervisor)

Abstract

Sentiment analysis is a useful problem that can serve a variety of fields, from business intelligence to social and even health studies. Moreover, solutions using pre-trained models have shown superiority over other approaches. Hence, we attempted to solve sentiment analysis with the help of pre-trained models. We used the SemEval 2022 Task 10 formulation of sentiment analysis and proposed a new method that improved on the task baselines. The baselines used dependency graph parsing and LSTMs, respectively. Our solution outperformed the best baseline across all datasets by at least 2 points of Sentiment Graph F1, the metric defined in the task description. In our...

Pronunciation Scoring in Computer-Assisted Language Learning

M.Sc. Thesis, Sharif University of Technology. Mohammadi, Sajede (Author); Sameti, Hossein (Supervisor)

Abstract

As the number of people interested in learning new languages has grown, multiple systems have been developed in recent years to teach them. These are called Computer-Assisted Language Learning (CALL) systems. However, the most reputable CALL systems, such as Duolingo, do not support Persian. The goal of this study is therefore to design and implement one of the technical components of CALL systems, Computer-Assisted Pronunciation Training (CAPT). This component evaluates the pronunciation in the learner's input speech and generates an appropriate score and feedback. In this study, good pronunciation means correct expression of words, correct...

Text Summarization Using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Sarkhani, Saeedeh (Author); Sameti, Hossein (Supervisor)

Abstract

In recent years, deep neural networks have achieved significant improvements in automatic text summarization by using neural sequence architectures. However, these improvements are more tangible for short summaries (a few words or a single sentence). For long (multi-sentence) summaries, the presented models suffer from several issues. They reproduce event details incorrectly and tend to repeat previously generated phrases. Their wording is also very close to the original text. In addition, the metrics used to evaluate the quality of generated summaries lack the ability...

Natural Language Generation from Meaning Representation Data

Ph.D. Dissertation, Sharif University of Technology. Seifossadat, Elham (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis focuses on generating text from data. A data-to-text system must have three capabilities. First, it should produce coherent, comprehensible, fluent text close to natural human language, such that it cannot be distinguished from text written by humans. Second, it should be able to produce a variety of sentences that express the same concept. Third, it should express the information in the input data without repetition, redundancy, or omission in the output sentences. The third capability is one of the main challenges of data-to-text systems, because unfaithfulness to the input data can lead to serious problems in real-world applications. Until...
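The third requirement is expressing the input without repetition or omission. It can be roughly approximated with a slot-matching heuristic. The sketch below is hypothetical and not the evaluation used in the dissertation. It treats the meaning representation as attribute-value pairs and counts exact, case-insensitive string matches of each value in the output. A paraphrased value (for example "by the river" for "riverside") would therefore be wrongly flagged as omitted.

```python
def check_faithfulness(mr, text):
    """Return (omitted_slots, repeated_slots) for a meaning representation.

    mr: dict mapping slot name -> value string (hypothetical format).
    A slot is 'omitted' if its value never appears in text and
    'repeated' if it appears more than once.
    """
    lowered = text.lower()
    omitted, repeated = [], []
    for slot, value in mr.items():
        n = lowered.count(value.lower())
        if n == 0:
            omitted.append(slot)
        elif n > 1:
            repeated.append(slot)
    return omitted, repeated

mr = {"name": "The Eagle", "food": "French", "area": "riverside"}
bad = "The Eagle serves French food. The Eagle is a French restaurant."
good = "The Eagle is a French restaurant by the riverside."
print(check_faithfulness(mr, bad))   # → (['area'], ['name', 'food'])
print(check_faithfulness(mr, good))  # → ([], [])
```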

Conversion of Persian Colloquial Texts into Official Texts using Unsupervised Learning Methods

M.Sc. Thesis, Sharif University of Technology. Akhavan Azari, Karim (Author); Sameti, Hossein (Supervisor)

Abstract

Today, the production of colloquial text in messengers, search engines, and question-answering systems has increased significantly, while text documents in other fields have a formal tone and style. Thus, there is a need for a system that converts such texts from colloquial to formal style. This need has recently been felt seriously in languages other than Persian as well. However, at the time of writing, no efficient system had been offered, and the problem requires more work in Persian than in languages such as English. In general, transferring text from one form to another falls within natural language processing applications and is called "style...

Speech Enhancement Using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Mohammadian Kalkhoran, Parisa (Author); Sameti, Hossein (Supervisor)

Abstract

Quality and intelligibility are two aspects of speech that are affected by various factors, such as background noise and echo. The performance of many commercial and military speech-based systems depends on at least one of these aspects. Therefore, this research aims to design an enhancement model that removes background noise and reverberation from the speech signal. The model is trained with a supervised deep-learning approach in the time domain. The input of the system is the raw waveform of speech mixed with noise and reverberation, and the output is the enhanced waveform. An architecture is proposed in this thesis based on...

Context-based Persian Grapheme-to-Phoneme Conversion using Sequence-to-Sequence Models

M.Sc. Thesis, Sharif University of Technology. Rahmati, Elnaz (Author); Sameti, Hossein (Supervisor)

Abstract

Many Text-to-Speech (TTS) systems, particularly in low-resource settings, struggle to produce natural and intelligible speech from grapheme sequences. One solution is Grapheme-to-Phoneme (G2P) conversion, which adds information to the input sequence and improves the TTS output. However, current G2P systems are not accurate or efficient enough for Persian text, because of the language's complexity and the absence of short vowels in Persian grapheme sequences. In our study, we aimed to improve resources for the Persian language. To this end, we introduced two new G2P training datasets, one manually labeled and the other machine-generated, containing over five million...

Deep Learning for Speech Recognition

M.Sc. Thesis, Sharif University of Technology. Azadi Yazdi, Saman (Author); Sameti, Hossein (Supervisor)

Abstract

Speech recognition is one of the earliest goals of speech processing. Our goal in this thesis is to use deep learning for speech recognition. In recent years, only small improvements in speech recognition accuracy had been reported. Deep learning is a new family of learning algorithms that has improved results in many machine learning tasks. Following the improvements reported for English speech recognition with deep learning, this thesis tries to improve accuracy over common and new recognition methods for Persian.
First, the overall structure of a typical speech recognition system and its modules are introduced. Deep multilayer...

Sequence-to-Sequence Voice Conversion Using Deep Learning

M.Sc. Thesis, Sharif University of Technology. Shadbash, Hamed (Author); Sameti, Hossein (Supervisor)

Abstract

Apart from the linguistic content that expresses the speaker's purpose and intent, human speech carries other information. This includes the speaker's identity, gender, and approximate age, the intonation and manner of expression, the speaker's emotion, the parts emphasized in the speech, and so on. "Voice conversion" seeks to change the speaker-dependent content of an audio signal while keeping the speaker-independent content (especially the linguistic content) unchanged. In other words, voice conversion modifies a speech signal produced by one person so that the same speech seems to have been spoken by someone...

Design of a Knowledge-Grounded Open Domain Dialogue System

M.Sc. Thesis, Sharif University of Technology. Samiei Paghale, Mohammad Mahdi (Author); Sameti, Hossein (Supervisor)

Abstract

Despite significant advances in dialogue systems, data-driven dialogue systems are often unable to hold content-rich conversations or present real-world knowledge in context. This is due to the lack of knowledge-grounded conversations in research datasets and the lack of external knowledge in their architectures. As a result, they remain far from real-world, open-domain use cases. The goal of this research is to introduce a deep-learning dialogue system grounded in external knowledge and facts. The external knowledge can be updated, and the model adapts to take it into account and hold a rich conversation. Note that external knowledge is assumed as a...