Sharif Digital Repository / Sharif University of Technology / Search result

Using Partially-Observable Markov Decision Process for Dialogue Management in Spoken Dialogue Systems

M.Sc. Thesis, Sharif University of Technology; Rahbar Noudehi, Siavash (Author); Sameti, Hossein (Supervisor)

Abstract

The use of Spoken Dialogue Systems is growing every day, and these systems will replace current Interactive Voice Response systems in the near future. A Spoken Dialogue System consists of Speech Recognition, Language Understanding, Dialogue Management, Speech Generation, and Text-to-Speech modules. Among these modules, the only one specific to Dialogue Systems is Dialogue Management. Its responsibility is to determine system behavior so as to maximize specific measures, such as the accuracy and speed with which the user's goal is found. There have been different approaches to dialogue management; in recent years the use of Partially-Observable Markov Decision Processes has been very popular...


Automatic Language Model Adaptation

M.Sc. Thesis, Sharif University of Technology; Sharif Razavian, Ali (Author); Sameti, Hossein (Supervisor)

Abstract

Language models play a vital role in speech recognition systems. By restricting the search space, language models combine the result of the word recognition phase with assumptions drawn from the rules and structure of word syntax (inflection, etc.), resulting in a more appropriate sequence of words as output. Since the language model for a specific language is not static and varies under the influence of different conditions, the problem of language model adaptation arises. In general, the aim of an adaptive language model is to offer a model capable of capturing all possible changes in syntactic, inflectional, and semantic structures. If the adaptation process...


Spoken Language Understanding in Dialogue System

M.Sc. Thesis, Sharif University of Technology; Bokaei, Mohammad Hadi (Author); Sameti, Hossein (Supervisor)

Abstract

In contrast to automatic speech recognition (ASR), which converts a speaker’s spoken utterance into a text string, spoken language understanding (SLU) aims to interpret users’ intentions from their speech utterances. Traditionally, this has been accomplished by writing context-free grammars (CFGs) or unification grammars (UGs) manually. The manual grammar authoring process is laborious and expensive, requiring much expertise. In addition, robustness is a vital requirement of these modules, because their input comes from a speech recognition unit and always contains errors. In recent years, many data-driven models have been proposed for spoken language understanding, but...


Learning Dialogue Management in Spoken Dialogue Systems

M.Sc. Thesis, Sharif University of Technology; Habibi, Maryam (Author); Sameti, Hossein (Supervisor)

Abstract

Spoken dialogue systems (SDSs) are being applied in real life ever more rapidly because of advances in the design and management of these systems. Traditional touch-tone computer telephony systems are being replaced by SDSs. In a typical SDS, the user speaks naturally to the system over a phone line, and the system provides the required information or performs the required action. Banking and ticket reservation are typical examples of prevalent SDSs. A spoken dialogue system has four units: automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), and spoken language generation (SLG). In this work, the first spoken dialogue...


Detecting Speakers in a Telephone Conversation

M.Sc. Thesis, Sharif University of Technology; Soltani Farani, Ali (Author); Sameti, Hossein (Supervisor)

Abstract

The human speech signal conveys many levels of information ranging from phonetic content to speaker identity and even emotional status. This thesis deals with the task of open-set speaker identification (SI) from an unconstrained telephone conversation between two speakers. The goal is to find at most two speakers among a known set of target speakers that best match the voice samples of the input speech; the input voice samples are not constrained to the target speaker set. The uni-speaker problem is investigated first. The classic GMM-UBM system for text-independent SI and its adapted form are explored. The use of score-space information is advocated as a complementary source to the...


Persian Statistical Natural Language Understanding Based on Partially Annotated Corpus

M.Sc. Thesis, Sharif University of Technology; Jabbari, Fattaneh (Author); Sameti, Hossein (Supervisor)

Abstract

The spoken language understanding unit is one of the most important parts of a spoken dialogue system. Its input is the output of the speech recognition unit, and its main function is to extract semantic information from the input utterances. There are two main types of approaches to this task: rule-based and data-driven. Today, data-driven approaches attract more interest because they are more flexible and robust than rule-based approaches. Their main drawback is that they need a large amount of fully annotated data or, in some cases, treebank data. Preparing such data is time-consuming and expensive. The goal of this thesis is...


Semantic Clustering of Persian Verbs

M.Sc. Thesis, Sharif University of Technology; Aminian, Maryam (Author); Sameti, Hossein (Supervisor)

Abstract

Semantic classification of words based on unsupervised learning methods is a challenging issue in computational lexical semantics. The goal of this field of study is to recognize the words that belong to the same semantic class; i.e., that can take the same set of arguments. Among all word categories, the verb is known as one of the most important and is regarded as the central part of the sentence in certain linguistic theories such as case grammar and dependency grammar. Based on Levin’s idea, diathesis alternations and the similarity between these alternations are the clues for the semantic classification of verbs. This idea has been verified in languages such as English and German with promising results....


Speaker Adaptation in HMM-Based Persian Speech Synthesis

M.Sc. Thesis, Sharif University of Technology; Bahmaninezhad, Fahimeh (Author); Sameti, Hossein (Supervisor)

Abstract

Text-to-speech synthesis, one of the key technologies in speech processing, is a technique for generating a speech signal from arbitrary given text with a target speaker’s voice characteristics and various speaking styles and emotional expressions. Statistical parametric speech synthesis has recently been shown to be very effective in generating acceptable synthesized speech. Therefore, in this study, the main focus is on one instance of these techniques, called hidden Markov model-based speech synthesis. In text-to-speech systems, it is desirable to synthesize high-quality speech using a small amount of speech data; this goal can be achieved by employing a speaker adaptation framework and...


High-Performance Keyword Spotting System for Persian Language

M.Sc. Thesis, Sharif University of Technology; Ghorbani, Shahram (Author); Sameti, Hossein (Supervisor)

Abstract

Keyword spotting with high speed and accuracy is an important subject within the speech processing domain, especially when dealing with various transmission channels. In this research, discriminative keyword spotting methods are compared with HMM-based approaches. We have employed the discriminative approaches as our baseline methods because of their higher accuracy. The drawback of conventional discriminative methods is their high computation cost and long execution time. The discriminative approach consists of two steps: feature extraction and classification. We have proposed four ideas to improve the performance of the baseline method. To improve the speed of the process, in feature...


Phonetics of Persian Intonation

M.Sc. Thesis, Sharif University of Technology; Hosseinnejad, Shadi (Author); Eslami, Moharram (Supervisor); Sameti, Hossein (Co-Advisor)

Abstract

This study investigates the Persian intonational system within the Autosegmental-Metrical framework. The intonational elements of Persian are represented by two distinctive levels (High and Low). The Persian intonation system has three main elements: pitch accents, phrase accents, and boundary tones. Every intonational element has its own meaning. The data for the study consist of about 200 utterances produced by two native Persian speakers, one male and one female. These utterances have been annotated at four levels in PToBI: phoneme level, word level, tone level, and break index level. In this study we aimed to formulate the acoustic representation of the intonational elements by three...


Markov Logic Networks for Persian Spoken Language Understanding

M.Sc. Thesis, Sharif University of Technology; Hemmatan Attarbashi, Ensieh (Author); Bahrani, Mohammad (Supervisor); Khosravizadeh, Parvaneh (Co-Advisor); Sameti, Hossein (Co-Advisor)

Abstract

Spoken Language Understanding (SLU) is aimed at extracting meaning from natural spoken language. Meaning extraction ranges from "extracting specific phrases" to "extracting users' intentions from their speech" and goes as far as "extracting the entities and details of their intentions". Extracting the exact intended meaning of the user is a sophisticated process. In this research, considering the lack of standard data in Persian, an SLU system for this language has been implemented using Markov Logic Networks (MLNs), in order to reduce the need for extra datasets. MLNs combine the explanatory power and orderliness of First-Order Logic with the uncertainty of probabilities. Therefore, these...


Music Track Detection Using Audio Fingerprinting

M.Sc. Thesis, Sharif University of Technology; Yazdanian, Saeed (Author); Sameti, Hossein (Supervisor)

Abstract

Music information retrieval systems have many applications in music filtering and broadcast monitoring because of the huge amount of multimedia data available these days. In these systems, the feature extraction method is called audio fingerprinting. The small size of fingerprints allows the systems to search efficiently among thousands or millions of songs. The input signal is usually just a couple of seconds long and degraded in several ways. The goal is to design a system that is robust to signal degradations and efficient to search. In this thesis, one of the basic systems is reviewed and improved in several ways. This system uses the spectrogram of signals to extract features and build an...


A Hybrid Approach for Normalization of Non-Standard Persian Texts

M.Sc. Thesis, Sharif University of Technology; Rostami, Ramtin (Author); Sameti, Hossein (Supervisor); Ghasem-Sani, Gholamreza (Co-Advisor)

Abstract

With the increase in internet usage and the volume of available data, the need for data mining and text processing has grown. One common obstacle to using these methods is the use of colloquial and non-standard language in writing. Because of this, and because NLP tasks in Persian have always faced data shortages, in this thesis we first collect and construct a parallel data set consisting of colloquial texts used in social media. Then, after examining various methods used in other languages for text normalization, we propose a combination of new hybrid methods, involving Statistical Machine Translation methodology with some modifications, to normalize...


Phonetic Representation of Pitch Accent in Persian Words

M.Sc. Thesis, Sharif University of Technology; Jafari Tazejani, Somaye (Author); Eslami, Moharram (Supervisor); Sameti, Hossein (Co-Advisor)

Abstract

The stress position of a word is determined by the type of its morphological elements. Persian words often have a fixed stress position. However, Persian word forms show different stress positions depending on their morpheme types or, in other words, their bound non-derivational affixes. Inflectional affixes accept stress, whereas clitics do not. In the present research, we studied both types of non-derivational affixes in terms of their phonetic features, namely fundamental frequency and duration. The phonetic representation of the pitch accent as a phonemic-intonational element was also given as a result of this study. The differences in the phonetic...


Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science (M.Sc.) in Computer Engineering, Artificial Intelligence

M.Sc. Thesis, Sharif University of Technology; Hosseini, Mohammad Saleh (Author); Sameti, Hossein (Supervisor)

Abstract

Punctuation marks constitute an important part of a text in every language. Omitting them makes the text ambiguous. The output of an automatic speech recognition (ASR) system is typically a raw sequence of words containing no punctuation marks. This makes the text difficult or even impossible for humans to make sense of, and hinders any further text processing tasks. The goal of this thesis is to perform automatic punctuation insertion in Persian texts lacking punctuation marks. To the best of our knowledge, this is the first work done in this context for the Persian language. For this purpose, we first assembled a state-of-the-art corpus to train and...


Pattern Based Relation Extraction on Persian News Articles

M.Sc. Thesis, Sharif University of Technology; Cholmaghani Qaheh, Ali (Author); Bahrani, Mohammad (Supervisor); Sameti, Hossein (Co-Advisor)

Abstract

Relation extraction is known as a main task in information extraction. There are two main approaches in this field: rule-based and statistical. This thesis applies a rule-based relation extraction approach. In this research, we tried to recognize Persian syntactic and morphological patterns in order to extract relations between named entities. First, we annotated a news dataset of more than 100 thousand tokens with person, organization, and location named entity tags. We then found 1037 relations and 2197 candidate relations. Candidate and labeled relations are extracted between two entities located in the same clause. These relations are "PERS_PERS-COMMENTING",...


Unsupervised Persian Keyword Extraction Using Exemplar Terms

M.Sc. Thesis, Sharif University of Technology; Alidoust, Ali (Author); Sameti, Hossein (Supervisor); Ghasem Sani, Gholam Reza (Co-Advisor)

Abstract

Keywords or keyphrases are important as the smallest units that represent the meaning of a text. Automated Keyword Extraction (AKE), one of the natural language processing tasks, is used in various applications such as searching, indexing, and information retrieval. Keywords of scientific articles are generally specified manually by their authors, whereas most of the information available on the internet lacks such keywords. In this research, we endeavor to automatically extract keywords from a set of Persian paper abstracts using an unsupervised machine learning method. The method used is to extract a set of candidate phrases from the text, and to cluster the document words to find a set...


Feature Extraction for Spoofing Detection in Automatic Speaker Verification Systems

M.Sc. Thesis, Sharif University of Technology; Adiban, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

Automatic speaker verification systems are becoming increasingly widespread with the development of technology. Their use as a user password in applications such as smartphones is becoming increasingly popular, and the security of these systems against spoofing attacks has always been a concern. The purpose of this research is to comprehensively investigate all types of possible spoofing attacks on speaker verification systems and then provide a countermeasure system against such attacks. The countermeasure system presented in this report extracts features from speech signals that can make the difference between the genuine and spoofed speech spectrum more prominent by...


Grapheme to Phoneme Conversion using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology; Safari, Arash (Author); Sameti, Hossein (Supervisor)

Abstract

The goal of this research is to convert letters to phonemes using deep neural networks. Since deep neural networks are among the best methods for speech and text processing (the highest accuracy in converting text to phonemes in English has also been obtained using deep neural networks), multilayer deep neural networks are used in this research to increase accuracy. It should be noted that deep neural networks have not previously been used for converting text to phonemes in Persian. In this research, a rule-based alignment method based on our proposed rule is presented, which achieves an accuracy of more than 98%. Several approaches for converting word to grapheme with emphasis on the...


Music Emotion Recognition

M.Sc. Thesis, Sharif University of Technology; Pouyanfar, Samira (Author); Sameti, Hossein (Supervisor)

Abstract

Measuring the emotions of music is one of the methods for determining music content. Music emotion detection is applicable in music retrieval, music genre recognition, and music data management software. Music emotion is studied in different sciences such as physiology, psychology, musicology, and engineering. First, we collected a database of different types of music with various emotions. These data have been labeled according to their emotions. In this project, four emotions (angry, happy, relaxed, and sad) have been used as labels, based on Thayer’s two-dimensional emotion model. As in other recognition systems, there are two basic steps in music emotion recognition: Feature extraction...
