
Persian Abstractive Summarization using Graph-based Abstract Meaning Representation

M.Sc. Thesis, Sharif University of Technology. Haddadan, Shohreh (Author); Bahrani, Mohammad (Supervisor)

Abstract

This study introduces a novel approach to abstractive summarization in Persian. The first step of the method is to map the sentences of the input text into an abstract meaning representation structure. Because this representation is syntax-free, it lets the summarization system represent sentences by their semantics, independently of their syntactic structure. To select suitable content for the summary output, semantic and structural features are extracted from the representation. The data used in this research consist of approximately 200 sentences, summarized in 30 sentences, from the well-known storybook “The Little Prince”. An SVM is trained on 80% of...

Pronoun Resolution with Data Driven Approaches

M.Sc. Thesis, Sharif University of Technology. Nourbakhsh, Aria (Author); Bahrani, Mohammad (Supervisor)

Abstract

Pronoun resolution is one of the challenges of natural language processing. The proposed solutions range from heuristic rule-based approaches to data-driven machine learning approaches. In this thesis, we build on a previous machine learning-based work on Persian pronoun anaphora resolution. The primary goal of this thesis was to improve the results, mainly by extracting more balanced data and by adding more features to the feature vectors used in classification. Using the PCAC2008 dataset, we considered noun phrase structure as a way to extract more suitable training data. The features added to the extracted data include syntactic and semantic features. Then, we trained and tested different machine learning...

Question Processing for Open Domain Persian Question Answering Systems

M.Sc. Thesis, Sharif University of Technology. Hosseini, Hawre (Author); Bahrani, Mohammad (Supervisor)

Abstract

Question answering systems take a question in natural language as input and present an explicit, appropriate answer to it. One of the major components of an automatic question answering system is the question processing component, in which the input question is analyzed. The main goal of the question processing phase is to determine the answer type through question classification. Rule-based, machine learning-based, and hybrid approaches have been used to develop question classifiers, among which the machine learning-based ones have outperformed the others. The main goal of this study is to develop a question classifier for Persian open-domain question answering systems....

Rule-Based Conversion of Colloquial Texts into Official Texts in Persian

M.Sc. Thesis, Sharif University of Technology. Rajabpur, Mohammad (Author); Bahrani, Mohammad (Supervisor)

Abstract

In this study, a set of colloquial Persian sentences was first collected. Each of these sentences was rendered into standard Persian by native speakers. As a result, a parallel corpus of 1698 sentence pairs was created. Each colloquial sentence and its formal equivalent were then converted into term-frequency vectors, and the cosine distance between the two vectors was calculated as a similarity measure. In addition, the mean and the standard deviation of all cosine distances were obtained. Afterwards, the whole data set was divided into two halves through stratified randomization so that the two halves resembled each other in terms of cosine distance....
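
The vector comparison described in this abstract can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: whitespace tokenization, the function names `tf_vector` and `cosine_similarity`, and the two romanized sentence pairs are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def tf_vector(sentence: str) -> Counter:
    """Term-frequency vector: whitespace tokens mapped to their counts."""
    return Counter(sentence.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence shares nothing with any other
    return dot / (norm_a * norm_b)


# Hypothetical colloquial/formal pairs (romanized, for illustration only).
pairs = [
    ("man miram khune", "man be khane miravam"),
    ("ketab ro bede", "ketab ra bedeh"),
]
similarities = [cosine_similarity(tf_vector(c), tf_vector(f)) for c, f in pairs]

# Corpus-level statistics of the kind the abstract mentions.
print(similarities, mean(similarities), pstdev(similarities))
```

The corresponding cosine distance is one minus the similarity; a stratified split could then bin the pairs by this value and draw each half evenly from every bin.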

Automatic Author Age Identification Using Social Media Texts

M.Sc. Thesis, Sharif University of Technology. Askari, Maryam (Author); Bahrani, Mohammad (Supervisor)

Abstract

The most common form of communication on the internet and on social network websites is the text message. Communication on social media, or on the web in general, normally takes place by posting some sort of text. These messages or posts are usually short, and the text used in them may not follow any language standards, which makes them very difficult to process. Different age groups use the same language differently, and this shows in the way each of them writes. Advances in natural language processing and computational linguistics make it possible to predict the age groups of text authors by analyzing the way they write. This study focuses on ways to automatically recognize the age...

Media Bias Analysis for Persian Text News

M.Sc. Thesis, Sharif University of Technology. Abbaszadeh Hojedki, Mohaddese (Author); Bahrani, Mohammad (Supervisor)

Abstract

There are different types of media bias. The aim of this study is to analyze media bias by considering two of them: selection (or coverage) bias and language bias. To build text datasets, we collected news stories and articles containing “Iran” as a keyword from the websites of four news broadcasters: Al Arabiya, Deutsche Welle (DW), Radio France Internationale (RFI), and SPUTNIK. For the purpose of comparing and analyzing media bias, the news had to be gathered during two time frames, before and after the day on which the P5+1, the European Union, and Iran reached the Joint Comprehensive Plan of Action (JCPOA). Finally, the collected corpora have amounted to 784 news...

Computational Textual Criticism of Manuscripts' Texts

M.Sc. Thesis, Sharif University of Technology. Ranjbar Chaghakabudi, Vahid (Author); Bahrani, Mohammad (Supervisor)

Abstract

In this thesis, I use methods and algorithms from computational linguistics and natural language processing for the textual criticism of Persian manuscript texts, and I design and develop software based on them. In the suggested method, once a base manuscript has been defined, the texts of all manuscripts are compared with the text of the base manuscript using the Dynamic Time Warping (DTW) algorithm. The sentences that are the same across the different manuscripts are then extracted and POS-tagged to form a corpus of the author's style, from which a language model is learned with a Hidden Markov Model (HMM). In the next stage, following the rules of textual criticism and using author stylistics algorithms, the method chooses the case which is most similar to the...
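
As a rough illustration of the DTW comparison mentioned in this abstract, the sketch below aligns two token sequences. It is not the thesis's software: the abstract does not state the unit of comparison or the cost function, so word tokens and a 0/1 mismatch cost are assumptions, as is the function name `dtw_distance`.

```python
def dtw_distance(seq_a, seq_b, cost=lambda x, y: 0 if x == y else 1):
    """Classic DTW: minimal cumulative cost of a monotone alignment
    in which every element of each sequence is matched at least once."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # table[i][j] = best cost of aligning seq_a[:i] with seq_b[:j]
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            table[i][j] = step + min(
                table[i - 1][j],      # seq_b[j-1] is reused for another seq_a element
                table[i][j - 1],      # seq_a[i-1] is reused for another seq_b element
                table[i - 1][j - 1],  # both sequences advance together
            )
    return table[n][m]


# Hypothetical base and variant readings, as placeholder tokens.
base = "a b c d".split()
variant = "a b x d".split()
print(dtw_distance(base, variant))
```

Because DTW lets one element match several neighbours, a repeated word in one manuscript costs nothing under this 0/1 cost, while a substituted word costs one.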

Automatic Recognition of Quranic Maqams Using Machine Learning

M.Sc. Thesis, Sharif University of Technology. Khodabandeh, Mohammad Javad (Author); Sameti, Hossein (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

Automatic recognition of musical Maqams has been one of the challenging problems in Music Information Retrieval. Despite the increasing amount of related research in recent years, we are still far from building related real-life applications. Moreover, only a very small portion of this research is dedicated to the automatic recognition of Maqams in recitation of the Holy Quran. In this thesis, as a first attempt, we have used machine learning methods to classify six Maqam families which are commonly used in Quran recitation. Also, due to the lack of pre-existing datasets, we have annotated approximately 1325 minutes of Tadwir recitation from two prominent Egyptian reciters, i.e., Muhammad...

Modeling Persian Language in the Framework of Complex Networks

M.Sc. Thesis, Sharif University of Technology. Sabooni Aghdam, Amir Mahdi (Author); Bahrani, Mohammad (Supervisor)

Abstract

Interest in analyzing human language with complex networks has been on the rise in recent years, and a considerable body of research in this area has already accumulated. Unfortunately, however, applications of complex networks are missing from Persian linguistics research. With the goal of introducing complex networks and their applications in this field, two of these applications have been studied in this research. First, we tried to build an inclusive network model of the Persian language, considering the two levels of syntax and word co-occurrence, and to provide linguistic interpretations for them. In addition, by comparing co-occurrence networks of different languages, garnered from...
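
A word co-occurrence network of the kind referred to in this abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the thesis's model: the window size, whitespace tokenization, and the helper names `cooccurrence_network` and `degrees` are all hypothetical.

```python
from collections import defaultdict


def cooccurrence_network(sentences, window=2):
    """Undirected weighted graph: words are nodes, and an edge links two
    distinct words that fall inside the same run of `window` consecutive
    tokens. The edge weight counts how often that happens."""
    edges = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    edges[frozenset((word, other))] += 1
    return edges


def degrees(edges):
    """Number of distinct neighbours of each word in the network."""
    deg = defaultdict(int)
    for pair in edges:
        for word in pair:
            deg[word] += 1
    return dict(deg)


# Placeholder sentences; real input would be tokenized Persian text.
edges = cooccurrence_network(["a b c", "a b"])
print(dict(edges), degrees(edges))
```

Network measures such as the degree distribution can then be computed on the resulting graph and compared across languages.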

Automatic Labeling of Prosody in Persian Unmarked Speech

M.Sc. Thesis, Sharif University of Technology. Jamshidlou, Paria (Author); Eslami, Moharram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

Prosodic annotations are used for locating and characterizing prominent parts in utterances as well as identifying and describing boundaries of coherent stretches of speech. Automatic detection and labeling of prosodic events in speech have received much attention in recent years, since prosody is intricately bound to the semantics of the utterance. Recognition of prosodic events is important for spoken language applications such as automatic understanding and translation of speech. Moreover, corpora labeled with prosodic markers are essential for building speech synthesizers that use data-driven approaches to generate natural speech. Such databases are important to reach a better...

Phonetic Representation of Intonation in Persian Vocative Structures

M.Sc. Thesis, Sharif University of Technology. Bahmanian, Nasimeh (Author); Eslami, Moharram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

This study investigates the intonational properties of the vocative structure in standard Persian in the framework of autosegmental-metrical phonology. The pitch accent, one of the elements within this framework, refers to prominence in the utterance. In Persian, all lexical units are stressed on their final syllable, and there is a language-specific feature according to which pitch accents are aligned with lexically stressed syllables. Contrary to this feature, previous studies claim that in the vocative structure it is the first syllable that is accented, owing to the difference between the intonation pattern of the vocative structure and that of its citation form. The data...

Natural Language Generation from Visual Input

M.Sc. Thesis, Sharif University of Technology. Rohanian, Mojtaba (Author); Vazirnezhad, Bahram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

Natural language generation (NLG) is one of the burgeoning areas of Natural Language Processing/Computational Linguistics, in which the primary concern is to automatically generate sentences in human languages. Based on the type of input they receive, NLG systems can be divided into two categories: text-to-text and data-to-text. Systems of the first type are usually a part of machine translation systems. The latter type, which is the subject of the current research, deals with language generation based on inputs other than raw text, such as databases, images, videos, audio and so on. This project is an attempt to create an image-to-text system. This is the first research on data-to-text NLG...

Design and Preparation of a Persian Semantic Corpus Using Abstract Meaning Representation

M.Sc. Thesis, Sharif University of Technology. Takhshid, Reza (Author); Bahrani, Mohammad (Supervisor); Shojaie, Razieh (Supervisor)

Abstract

To keep in line with the day-to-day advancements in the fields of computational linguistics and natural language processing, and with the growing attention of researchers to semantic processing, this thesis presents the design and preparation of a Persian semantic corpus using Abstract Meaning Representation (AMR). This semantic representation pairs each sentence with a single rooted, directed, acyclic graph, which is both human- and computer-readable. Moreover, this representation paves the way for the creation of large semantic corpora. In order to bring such benefits to Persian, in this thesis we present solutions for representing Persian sentences in the framework of AMR. Moreover, a corpus of 150...
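
The graph properties named in this abstract (single root, directed, acyclic) can be illustrated with a small sketch. The example is the widely used English AMR for "The boy wants to go", not a sentence from the thesis corpus, and the triple-based encoding and the helper names `roots` and `is_acyclic` are assumptions made for illustration.

```python
# AMR for the English sentence "The boy wants to go":
# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
instances = {"w": "want-01", "b": "boy", "g": "go-01"}
relations = [("w", "ARG0", "b"), ("w", "ARG1", "g"), ("g", "ARG0", "b")]


def roots(instances, relations):
    """Variables with no incoming edge. In an acyclic graph, exactly one
    such variable means every node is reachable from it."""
    targets = {target for _, _, target in relations}
    return [var for var in instances if var not in targets]


def is_acyclic(instances, relations):
    """Depth-first search that fails when it re-enters a node on the current path."""
    children = {var: [] for var in instances}
    for source, _, target in relations:
        children[source].append(target)
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(var):
        if state.get(var) == 1:
            return False  # back edge: a cycle
        if state.get(var) == 2:
            return True
        state[var] = 1
        ok = all(visit(child) for child in children[var])
        state[var] = 2
        return ok

    return all(visit(var) for var in instances)


print(roots(instances, relations), is_acyclic(instances, relations))
```

Note that the variable `b` has two parents: re-entrancy like this is what makes an AMR a graph rather than a tree.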

Generating Text from Abstract Meaning Representation in Persian

M.Sc. Thesis, Sharif University of Technology. Kakaei, Farokh (Author); Rahimi, Saeed (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

This research mainly aims to propose, for the first time, a way of generating text from Abstract Meaning Representation (AMR) in Persian. AMR is a rather new way of representing the meaning of natural language sentences that captures the various semantic components in a rooted, directed, acyclic graph. Generating text from AMR is a challenging task in natural language processing because some syntactic constructs are abstracted away from the representation, so that a single AMR can have multiple translations. Considering the many applications of generating text from meaning representations in natural language processing, it seems inevitable to design some methods for converting such...

Automatic Evaluation of Machine Translation Using Abstract Meaning Representation

M.Sc. Thesis, Sharif University of Technology. Sadeghieh, Hamid (Author); Rezae, Saeed (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

Machine Translation Quality Evaluation, compared to the other issues dealt with in the field of Natural Language Processing, is faced with the challenge that the repetition of the translation process from the same linguistic form in the source language will not necessarily lead to a unique linguistic form in the target language. Therefore, considering the fact that the Abstract Meaning Representation (AMR) graph is the same for all the sentences of similar meaning, this thesis has been an attempt to extend the efficiency of AMR graphs to the area of Machine Translation Quality Evaluation. The main research question dealt with in the present thesis was whether the similarity of the AMR graphs...

Ezafe Recognition Using Dependency Parsing

M.Sc. Thesis, Sharif University of Technology. Nassajian, Minoo (Author); Bahrani, Mohammad (Supervisor); Shojaei, Razieh (Co-Supervisor)

Abstract

Ezafe is regarded as one of the most controversial and challenging issues in various fields of Persian natural language processing (NLP). It is recognized and pronounced but usually not written, which results in a high degree of ambiguity in Persian texts. Dependency grammar plays a significant role in optimization problems, so this grammar is used in the current study to recognize the position of Ezafe in a sentence. This method helps speed up computation and keep memory use low. Within this framework, we first take a close look at the distribution of Ezafe in Persian text. We use the Uppsala Persian Dependency Corpus (2015) to analyze parsed sentences. The Ezafe constructions under study include...

An Automatic Semantic Tagger Based on USAS for the Persian Language

M.Sc. Thesis, Sharif University of Technology. Nayeri, Negar (Author); Rahimi, Saeed (Supervisor); Bahrani, Mohammad (Supervisor)

Abstract

The emergence of lexical knowledge bases such as WordNet and FarsNet foregrounded the importance of the semantic annotation of words in natural language processing and corpus linguistics. The methodology of these knowledge bases is based on semantic relations and dictionary definitions of the words they cover. Another efficient way to perform semantic annotation is to classify the lexicon of a language semantically in a taxonomy. In this research, we build a semantic annotation system for the semantic tagging of Persian texts. This system can be used for building natural language processing tools and software in applications such as text summarization, plagiarism detection...

Introducing an Approach to Build a Phrase Structure Treebank Using Persian Dependency Treebank

M.Sc. Thesis, Sharif University of Technology. Soltanzadeh, Fatemeh (Author); Bahrani, Mohammad (Supervisor); Eslami, Moharram (Co-Advisor)

Abstract

Treebanks are useful in many applications of natural language processing, such as machine translation, speech recognition, and information extraction. They are also used in theoretical linguistics to study languages. For instance, they are valuable for developing different syntactic theories, calculating the frequency of syntactic rules, and evaluating and comparing statistical models.
Most treebanks are based on phrase structure grammar or dependency grammar. In a phrase structure treebank, a sentence is divided into phrases, and each phrase is composed of several words. In a dependency treebank, by contrast, the connection between two words is based on the...

Commonsense Knowledge Extraction for Persian Language: A Combinatory Approach

M.Sc. Thesis, Sharif University of Technology. Moradi, Mehdi (Author); Vazirnezhad, Bahram (Supervisor); Bahrani, Mohammad (Co-Advisor)

Abstract

Putting human commonsense knowledge into computers has been a long-standing dream of artificial intelligence (AI). Since the earliest days of the field, AI knowledge engineers have worked hard to get around this bottleneck. Tens of millions of dollars and a great deal of time have been spent so that computers could know that “objects fall, not rise”, that “running is faster than walking”, and that “death is the end of life”. A large database was built, automated and semi-automated methods were introduced, and volunteers' efforts were utilized to achieve this, but an automated, high-throughput and low-noise method for commonsense collection still remains the holy...

Automatic Blank Verse Poet Identification Using Linguistic Features

M.Sc. Thesis, Sharif University of Technology. Azin, Zahra (Author); Bahrani, Mohammad (Supervisor); Khosravi Zadeh, Parvaneh (Co-Advisor)

Abstract

Author identification using statistical methods is a branch of authorship attribution, which is one of the important problems in natural language processing. Using different statistical methods, an anonymous text is attributed to an author. One of the primary parts of the task is to choose appropriate stylistic features of the text in order to study the significance of style. These features must be studied quantitatively and can be extracted at the lexical, character, syntactic, or semantic level. The next step is text classification, in which different machine learning methods such as decision trees, artificial neural networks, Naïve Bayes, and other methods can be used....