Automatic Blank Verse Poet Identification Using Linguistic Features

Azin, Zahra; Bahrani, Mohammad; Khosravi Zadeh, Parvaneh

Azin, Zahra | 2014

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 45477 (31)
University: Sharif University of Technology
Department: Language and Linguistics
Advisor(s): Bahrani, Mohammad; Khosravi Zadeh, Parvaneh
Abstract:
Author identification using statistical methods is a branch of authorship attribution, which is one of the important problems in natural language processing. Using various statistical methods, an anonymous text is attributed to an author. A primary part of the task is choosing stylistic features of the text that capture the author's style. These features must be quantifiable and can be extracted at the lexical, character, syntactic, or semantic level. The next step is text classification, for which various machine learning methods such as decision trees, artificial neural networks, and Naïve Bayes can be used.
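The following is a minimal sketch of character-level and lexical feature extraction. It is an illustration under stated assumptions and does not reproduce the thesis's actual feature set. The function names `char_ngrams` and `lexical_features` and the chosen statistics (character bigram counts, average word length, type-token ratio) are hypothetical. Tokenization by whitespace is also a simplifying assumption, since Persian orthography (e.g. the zero-width non-joiner) usually calls for a dedicated tokenizer. Syntactic features would additionally require a part-of-speech tagger, which is not shown here.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams (spaces included).

    Hypothetical character-level feature; n=2 is an illustrative choice."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def lexical_features(text: str) -> dict:
    """A few simple word-level statistics.

    Assumption: tokens are obtained by splitting on whitespace only."""
    tokens = text.split()
    if not tokens:
        return {"num_tokens": 0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "num_tokens": len(tokens),
        # Mean token length in characters.
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens),
        # Vocabulary richness: distinct tokens divided by total tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }
```

For example, `char_ngrams("abab")` counts the bigram `"ab"` twice and `"ba"` once. In a real system these raw counts would typically be normalized by text length before classification.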
In this research, the main goal is to determine the most effective stylistic features of Persian texts for use in automatic authorship attribution systems. To this end, we studied the stylistic features of four Persian blank verse poets: Mehdi Akhavan, Nima Yushij, Ahmad Shamlou, and Sohrab Sepehri. The features were studied at three levels: lexical, character, and syntactic. Then three machine learning methods (Naïve Bayes, SVM, and KNN) were used to classify the documents.
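To make the classification step concrete, here is a minimal from-scratch sketch of two of the three named classifiers. It is not the thesis's implementation. Multinomial Naïve Bayes uses add-alpha smoothing, and KNN uses cosine similarity over sparse feature counts. The choices of alpha=1.0, k=3, and cosine similarity are assumptions. SVM is omitted because it needs an optimization solver. In practice, a library such as scikit-learn (`MultinomialNB`, `SVC`, `KNeighborsClassifier`) would usually be used for all three.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes. docs: list of Counter(feature -> count)."""
    vocab = set()
    counts = defaultdict(Counter)  # per-class feature counts
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
        vocab.update(doc)
    class_freq = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "priors": {y: math.log(c / len(labels)) for y, c in class_freq.items()},
        "counts": counts,
        "totals": {y: sum(c.values()) for y, c in counts.items()},
    }


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    v, a = len(model["vocab"]), model["alpha"]
    best, best_score = None, -math.inf
    for y, prior in model["priors"].items():
        score = prior
        for feat, n in doc.items():
            if feat not in model["vocab"]:
                continue  # ignore features never seen in training
            p = (model["counts"][y][feat] + a) / (model["totals"][y] + a * v)
            score += n * math.log(p)
        if score > best_score:
            best, best_score = y, score
    return best


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_knn(docs, labels, doc, k=3):
    """Majority vote among the k training documents most similar to doc."""
    ranked = sorted(zip(docs, labels), key=lambda p: cosine(doc, p[0]), reverse=True)
    votes = Counter(y for _, y in ranked[:k])
    return votes.most_common(1)[0][0]
```

Usage, with toy feature counts and placeholder labels `"A"` and `"B"` (not data from the thesis): train with `model = train_nb(docs, labels)`, then call `predict_nb(model, Counter({"w1": 1}))` or `predict_knn(docs, labels, Counter({"w3": 1}))`.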
Finally, we concluded that syntactic features are the most effective. Merging all the feature vectors into a single vector improved precision. In addition, SVM, with a maximum F-measure of 96%, was the most effective classifier for this task.
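The two evaluation ideas in the conclusion can be sketched as follows: merging per-level feature vectors into one, and the F-measure, F = 2PR / (P + R). The sketch below is illustrative only. Prefixing each feature with its level name is an assumed way to concatenate sparse vectors without key collisions. Macro-averaging over authors is also an assumption, since the abstract does not state how the reported F-measure was averaged. The function names are hypothetical.

```python
def merge_vectors(*named_vectors):
    """Concatenate sparse vectors given as (level_name, dict) pairs.

    Each key is prefixed with its level so features from different
    levels (e.g. a character bigram and a word) cannot collide."""
    merged = {}
    for level, vec in named_vectors:
        for feat, value in vec.items():
            merged[f"{level}:{feat}"] = value
    return merged


def precision_recall_f1(gold, pred, label):
    """Per-class precision, recall, and F-measure from parallel label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(gold, pred):
    """Unweighted mean of per-class F-measures (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(pred))
    return sum(precision_recall_f1(gold, pred, y)[2] for y in labels) / len(labels)
```

For instance, with gold labels `["A", "A", "B", "B"]` and predictions `["A", "B", "B", "B"]`, class A has P = 1 and R = 0.5 (F ≈ 0.667), class B has P ≈ 0.667 and R = 1 (F = 0.8), and the macro average is about 0.733.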
Keywords:
Natural Language Processing ; Feature Extraction ; Text Categorization ; Author Identification