Modeling Persian Language in the Framework of Complex Networks

M.Sc. Thesis, Sharif University of Technology. Sabooni Aghdam, Amir Mahdi (Author); Bahrani, Mohammad (Supervisor)

Abstract

Interest in analyzing human language with complex networks has been rising in recent years, and a considerable body of research in this area has already accumulated. Unfortunately, applications of complex networks are missing from Persian linguistics research. With the goal of introducing complex networks and their applications to this field, this research studies two such applications. First, we tried to build an inclusive network model of the Persian language, considering the two levels of syntax and word co-occurrence, and to provide linguistic interpretations for them. In addition, by comparing co-occurrence networks of different languages, garnered from...
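As a rough illustration of what a word co-occurrence network is, the sketch below links words that appear near each other in a token sequence and counts how many distinct neighbours each word has. The function names, the `window` parameter, and the choice of undirected weighted edges are assumptions made for this example; the truncated abstract does not say how the thesis builds its networks.

```python
from collections import Counter


def cooccurrence_edges(tokens, window=2):
    """Count undirected edges between distinct words at most `window - 1` positions apart.

    Hypothetical helper for illustration; window=2 links adjacent words only.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) count as the same edge.
                edges[tuple(sorted((left, right)))] += 1
    return edges


def degrees(edges):
    """Number of distinct neighbours of each word in the network."""
    degree = Counter()
    for first, second in edges:
        degree[first] += 1
        degree[second] += 1
    return degree
```

Network measures such as degree distributions can then be computed from the edge counts and compared across corpora.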

Persian Abstractive Summarization using Graph-based Abstract Meaning Representation

M.Sc. Thesis, Sharif University of Technology. Haddadan, Shohreh (Author); Bahrani, Mohammad (Supervisor)

Abstract

This study attempts to introduce a novel approach to abstractive summarization in Persian. In this methodology, the first step is to convert the input sentences into an abstract meaning representation structure. This representation is syntax-free; thus, it helps the summarization system represent sentences semantically, independent of their syntactic structure. To select suitable content for the output summary, semantic and structural features are extracted from the representation. The data used in this research consist of approximately 200 sentences, summarized in 30 sentences, from the famous storybook "The Little Prince". An SVM is trained on 80% of...

Question Processing for Open Domain Persian Question Answering Systems

M.Sc. Thesis, Sharif University of Technology. Hosseini, Hawre (Author); Bahrani, Mohammad (Supervisor)

Abstract

Question answering systems take a natural-language question as input and present an explicit, appropriate answer. One of the major components of automatic question answering systems is the question processing component, in which the input question is analyzed. The main goal of the question processing phase is to determine the answer type through question classification. Rule-based, machine learning-based, and hybrid approaches have been used to develop question classifiers, among which the machine learning-based ones have outperformed the others. This study’s main goal is to develop a question classifier for Persian open-domain question answering systems....

Pronoun Resolution with Data Driven Approaches

M.Sc. Thesis, Sharif University of Technology. Nourbakhsh, Aria (Author); Bahrani, Mohammad (Supervisor)

Abstract

Pronoun resolution is one of the challenges of natural language processing. The proposed solutions range from heuristic rule-based approaches to data-driven machine learning approaches. In this thesis, we followed a previous machine learning-based work on Persian pronoun anaphora resolution. The primary goal of this thesis was to improve results, mainly by extracting more balanced data and adding more features to the feature vectors used in classification. Using the PCAC2008 dataset, we considered noun phrase structure as a way to extract more suitable training data. The features added to the extracted data include syntactic and semantic features. Then, we trained and tested different machine learning...

Rule-Based Conversion of Colloquial Texts into Official Texts in Persian

M.Sc. Thesis, Sharif University of Technology. Rajabpur, Mohammad (Author); Bahrani, Mohammad (Supervisor)

Abstract

In this study, a set of colloquial Persian sentences was first collected. Each of these sentences was rendered into standard Persian by native speakers. As a result, a parallel corpus of 1698 sentence pairs was created. Then each colloquial sentence and its formal equivalent were converted into term-frequency vectors, and the cosine distance similarity between the two vectors was calculated. In addition, the mean and the standard deviation of all cosine distances were obtained. Afterwards, the whole set of data was divided into two halves through stratified randomization so that the two halves resembled each other in terms of cosine distance similarity....
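The comparison of a colloquial sentence with its formal equivalent can be sketched as follows. This is a minimal illustration that assumes whitespace-tokenized input and raw term counts; the function name is hypothetical, and the abstract does not specify the tokenization or whether similarity or its complement (distance) was reported.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both sentences contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty input
    return dot / (norm_a * norm_b)
```

The corresponding cosine distance is `1 - cosine_similarity(a, b)`, and the mean and standard deviation over all sentence pairs can be taken with `statistics.mean` and `statistics.stdev`.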

Automatic Author Age Identification Using Social Media Texts

M.Sc. Thesis, Sharif University of Technology. Askari, Maryam (Author); Bahrani, Mohammad (Supervisor)

Abstract

The most common form of communication on the internet and social network websites is text messages. Normally, communication on social media, or even on the web, takes place by posting some sort of text. Usually, these messages or posts are short, and the text used in them may not follow any language standards, which makes them very difficult to process. Different age groups use a given language differently, and this shows in the way each of them writes. Advances in natural language processing and computational linguistics make it possible to predict text authors' age groups by analyzing the way they write. This study focuses on ways to automatically recognize the age...

Computational Textual Criticism of Manuscripts' Texts

M.Sc. Thesis, Sharif University of Technology. Ranjbar Chaghakabudi, Vahid (Author); Bahrani, Mohammad (Supervisor)

Abstract

In this thesis, I try to use methods and algorithms of computational linguistics and natural language processing for the textual criticism of Persian manuscripts' texts, and to design and develop software based on them. The suggested method is, once a base manuscript has been defined, to compare all manuscripts' texts with the base manuscript's text using the Dynamic Time Warping (DTW) algorithm. Then the sentences that are the same across manuscripts are extracted and POS-tagged to form the body of a corpus of the author's style, from which a language model is learned with a Hidden Markov Model (HMM). At the next stage, following the rules of textual criticism and using author stylistics algorithms, the case most similar to the...
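For readers unfamiliar with DTW, the sketch below shows the textbook dynamic-programming recurrence applied to two token sequences. The 0/1 mismatch cost and the function name are assumptions for illustration only; the abstract does not state which units (characters, words, or sentences) or which cost function the thesis uses.

```python
def dtw_distance(seq_a, seq_b, cost=lambda x, y: 0 if x == y else 1):
    """Classic dynamic time warping distance between two sequences.

    `cost` is a hypothetical 0/1 mismatch penalty chosen for this example.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # table[i][j] = cheapest alignment cost of seq_a[:i] with seq_b[:j]
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            table[i][j] = step + min(
                table[i - 1][j],      # seq_b element matched again
                table[i][j - 1],      # seq_a element matched again
                table[i - 1][j - 1],  # both sequences advance
            )
    return table[n][m]
```

Because one element may be matched to several elements of the other sequence, a repeated word in one witness does not add to the distance, while a substituted word does.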

Media Bias Analysis for Persian Text News

M.Sc. Thesis, Sharif University of Technology. Abbaszadeh Hojedki, Mohaddese (Author); Bahrani, Mohammad (Supervisor)

Abstract

There are different types of media bias. The aim of this study is to analyze media bias by considering two of its types: selection (or coverage) bias and language bias. To build text datasets, we collected news stories and articles containing “Iran” as a keyword from the websites of four news broadcasters: Al Arabiya, Deutsche Welle (DW), Radio France Internationale (RFI) and SPUTNIK. For the purpose of comparing and analyzing media bias, the news had to be gathered during two time frames, before and after the day the P5+1, the European Union and Iran reached the Joint Comprehensive Plan of Action (JCPOA). Finally, the collected corpora amounted to 784 news...

Automatic Labeling of Prosody in Persian Unmarked Speech

M.Sc. Thesis, Sharif University of Technology. Jamshidlou, Paria (Author); Eslami, Moharram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

Prosodic annotations are used for locating and characterizing prominent parts in utterances as well as identifying and describing boundaries of coherent stretches of speech. Automatic detection and labeling of prosodic events in speech has received much attention in recent years since prosody is intricately bound to the semantics of the utterance. Recognition of prosodic events is important for spoken language applications such as automatic understanding and translation of speech. Moreover, corpora labeled with prosodic markers are essential for building speech synthesizers that use data-driven approaches to generate natural speech. Such databases are important to reach a better...

Utilizing Latent Topic Models for Persian Document Classification and Providing Appropriate Solutions to Improve It

M.Sc. Thesis, Sharif University of Technology. Khaki Ardekani, Basira (Author); Bahrani, Mohammad (Supervisor); Vazirnezhad, Bahram (Co-Advisor)

Abstract

Text classification with high precision has become a challenging issue in computational linguistics and natural language processing. Access to a proper data set, use of the best method, and prominent linguistic features have always been regarded as the basic concerns of this process. The present study, relying on the Bijan Khan Corpus, tries to represent the keyword vectors of different documents using tf-idf. These vectors serve as input to latent topic model algorithms, including probabilistic latent semantic analysis. The output of this algorithm is the documents' feature vectors, which are later used to train different classifiers like K...
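The tf-idf weighting mentioned above can be sketched in a few lines. This example uses one common unsmoothed variant (relative term frequency times the natural log of the inverse document frequency); the function name and that choice of variant are assumptions, since the abstract does not say which formulation the thesis uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count / document length) * ln(N / document frequency),
    an unsmoothed variant assumed for this illustration.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors
```

A term that occurs in every document receives weight zero, so only terms that discriminate between documents contribute to the vectors passed on to a topic model or classifier.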

Introducing an Approach to Build a Phrase Structure Treebank Using Persian Dependency Treebank

M.Sc. Thesis, Sharif University of Technology. Soltanzadeh, Fatemeh (Author); Bahrani, Mohammad (Supervisor); Eslami, Moharram (Co-Advisor)

Abstract

Treebanks are useful in many applications of natural language processing, such as machine translation, speech recognition, and information extraction. They are also used in theoretical linguistics to study languages. For instance, they are valuable for developing different syntactic theories, calculating the frequency of syntactic rules, and evaluating and comparing statistical models.
Most treebanks are based on phrase structure grammar or dependency grammar. In a phrase structure treebank, a sentence is divided into phrases, and each phrase is composed of several words. In a dependency treebank, by contrast, the connection between two words is based on the...

Natural Language Generation from Visual Input

M.Sc. Thesis, Sharif University of Technology. Rohanian, Mojtaba (Author); Vazirnezhad, Bahram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

Natural language generation (NLG) is one of the burgeoning areas of Natural Language Processing/Computational Linguistics, in which the primary concern is to automatically generate sentences in human languages. Based on the type of input they receive, NLG systems can be divided into two categories: text-to-text and data-to-text. Systems of the first type are usually part of machine translation systems. The latter type, which is the subject of the current research, deals with language generation based on inputs other than raw text, such as databases, images, videos, audio and so on. This project is an attempt to create an image-to-text system. This is the first research on data-to-text NLG...

Phonetic Representation of Intonation in Persian Vocative Structures

M.Sc. Thesis, Sharif University of Technology. Bahmanian, Nasimeh (Author); Eslami, Moharram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

This study investigates the intonational properties of the vocative structure in standard Persian in the framework of autosegmental-metrical phonology. Pitch accent, one of the elements within this framework, refers to prominence in the utterance. In Persian, all lexical units are stressed on their final syllable, and there is a language-specific feature according to which pitch accents are aligned with lexically stressed syllables. Contrary to this feature, previous studies claim that in the vocative structure it is the first syllable that is accented, owing to the difference between the intonation pattern of the vocative structure and that of its citation form. The data...

Pattern Based Relation Extraction on Persian News Articles

M.Sc. Thesis, Sharif University of Technology. Cholmaghani Qaheh, Ali (Author); Bahrani, Mohammad (Supervisor); Sameti, Hossein (Co-Advisor)

Abstract

Relation extraction is a main task in information extraction. There are two main approaches in this field: rule-based and statistical. This thesis applies a rule-based relation extraction approach. In this research, we tried to recognize Persian syntactic and morphological patterns in order to extract relations between named entities. First, we annotated a news dataset of more than 100 thousand tokens with person, organization and location named-entity tags. We then found 1037 relations and 2197 candidate relations. Candidate and labeled relations were extracted between two entities located in a clause. These relations are "PERS_PERS-COMMENTING",...

Ezafe Recognition Using Dependency Parsing

M.Sc. Thesis, Sharif University of Technology. Nassajian, Minoo (Author); Bahrani, Mohammad (Supervisor); Shojaei, Razieh (Co-Supervisor)

Abstract

Ezafe is regarded as one of the most controversial and challenging issues in different fields of Persian natural language processing (NLP). It is recognized and pronounced but usually not written, which results in a high degree of ambiguity in Persian texts. Dependency grammar plays a significant role in optimization problems, so this grammar is used in the current study to recognize the position of Ezafe in a sentence. This method helps speed up computation and uses little memory. Within this framework, we first take a close look at the distribution of Ezafe in Persian text. We use the Uppsala Persian Dependency Corpus (2015) to analyze parsed sentences. The Ezafe constructions under study include...

Recurrent Neural Network Language Modeling For Persian

M.Sc. Thesis, Sharif University of Technology. Hosseini Saravani, Habib (Author); Bahrani, Mohammad (Supervisor); Veisi, Hadi (Supervisor)

Abstract

Neural networks have been applied to language modeling to solve a major problem that N-gram language models could not overcome: the discreteness of words. Generally, neural networks were successful in solving this problem and improved language modeling by reducing the perplexity of the models. Neural networks can find grammatical and semantic connections among words using word embeddings, which map each word to a low-dimensional feature vector of real numbers. In this research, different kinds of neural networks applied to language modeling are reviewed. An attempt is also made to reduce the perplexity of Persian language models on a 100-million scale data set using a single-layer...
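Perplexity, the measure referred to above, is the exponential of the average negative log-probability a model assigns to each token of a test sequence. The sketch below assumes the per-token probabilities have already been produced by some language model; the function name and input format are hypothetical and not taken from the thesis.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the model probability of each token.

    PP = exp(-(1/N) * sum(ln p_i)); lower values mean a better model.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0 for p in token_probs):
        raise ValueError("probabilities must be positive")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)
```

A model that spreads its probability uniformly over four candidate words at every step has perplexity 4, which is why the measure is often read as the effective number of choices per word.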

Generating Text from Abstract Meaning Representation in Persian

M.Sc. Thesis, Sharif University of Technology. Kakaei, Farokh (Author); Rahimi, Saeed (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

This research mainly aims to propose, for the first time, a way of generating text from Abstract Meaning Representation (AMR) in Persian. AMR is a rather new way of representing the meaning of natural language sentences that captures the various semantic components in a rooted, directed, acyclic graph. Generating text from AMR is a challenging task in natural language processing because some syntactic constructs are abstracted away from the representation, so that a single AMR can have multiple translations. Considering the many applications of generating text from meaning representations in natural language processing, it seems inevitable to design some methods for converting such...

Automatic Evaluation of Machine Translation Using Abstract Meaning Representation

M.Sc. Thesis, Sharif University of Technology. Sadeghieh, Hamid (Author); Rezae, Saeed (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

Machine Translation Quality Evaluation, compared to the other issues dealt with in the field of Natural Language Processing, is faced with the challenge that the repetition of the translation process from the same linguistic form in the source language will not necessarily lead to a unique linguistic form in the target language. Therefore, considering the fact that the Abstract Meaning Representation (AMR) graph is the same for all the sentences of similar meaning, this thesis has been an attempt to extend the efficiency of AMR graphs to the area of Machine Translation Quality Evaluation. The main research question dealt with in the present thesis was whether the similarity of the AMR graphs...

An Automatic Semantic Tagger Based on USAS for the Persian Language

M.Sc. Thesis, Sharif University of Technology. Nayeri, Negar (Author); Rahimi, Saeed (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

The emergence of lexical knowledge bases such as WordNet and FarsNet foregrounded the importance of semantic annotation of words in the areas of natural language processing and corpus linguistics. The methodology of these knowledge bases is based on semantic relations and dictionary definitions of the words they cover. Another efficient way to perform semantic annotation is to classify the lexicon of a language semantically in a taxonomy. In this research, we build a semantic annotation system for the semantic tagging of Persian texts. This system can be used for building natural language processing tools and software in applications such as text summarization, plagiarism detection...

Automatic Recognition of Quranic Maqams Using Machine Learning

M.Sc. Thesis, Sharif University of Technology. Khodabandeh, Mohammad Javad (Author); Sameti, Hossein (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

Automatic recognition of musical Maqams has been one of the challenging problems in Music Information Retrieval. Despite the increasing amount of related research in recent years, we are still far from building related real-life applications. Moreover, only a very small portion of this research is dedicated to the automatic recognition of Maqams in recitation of the Holy Quran. In this thesis, as a first attempt, we have used machine learning methods to classify six Maqam families which are commonly used in Quran recitation. Also, due to the lack of pre-existing datasets, we have annotated approximately 1325 minutes of Tadwir recitation from two prominent Egyptian reciters, i.e., Muhammad...
