Automatic Blank Verse Poet Identification Using Linguistic Features

Azin, Zahra | 2014

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 45477 (31)
  4. University: Sharif University of Technology
  5. Department: Language and Linguistics
  6. Advisor(s): Bahrani, Mohammad; Khosravi Zadeh, Parvaneh
  7. Abstract: Author identification using statistical methods is a branch of authorship attribution, one of the important problems in natural language processing. Its goal is to attribute an anonymous text to an author by statistical means. A primary part of the task is choosing appropriate stylistic features of the text, that is, features that capture what is distinctive about an author's style. These features must be quantifiable and can be extracted at the lexical, character, syntactic, or semantic level. The next step is text classification, for which machine learning methods such as decision trees, artificial neural networks, and Naïve Bayes can be used.
    The main purpose of this research is to determine the most effective stylistic features of Persian texts for use in automatic authorship attribution systems. To this end, we studied the stylistic features of four Persian blank verse poets: Mehdi Akhavan, Nima Yushij, Ahmad Shamlou, and Sohrab Sepehri. The features were studied at three levels: lexical, character, and syntactic. Three machine learning methods (Naïve Bayes, SVM, and KNN) were then used to classify the documents.
    We concluded that syntactic features are the most effective. Merging all the feature vectors into one improved precision. Among the classifiers, SVM was the most effective, with a maximum F-measure of 96%.
  8. Keywords: Natural Language Processing; Feature Extraction; Text Categorization; Author Identification
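
The pipeline outlined in the abstract (extract stylistic features at several levels, merge them into one vector, classify, and score with the F-measure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the feature choices (character bigrams and word frequencies), the cosine-similarity nearest-neighbour classifier, the function names, and the toy samples with labels `poet_a` and `poet_b` are all hypothetical. Syntactic features are omitted because they would need a part-of-speech tagger.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Character-level features: relative frequencies of character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {"c:" + g: c / total for g, c in grams.items()}


def word_frequencies(text):
    """Lexical-level features: relative frequencies of whitespace-split words."""
    words = Counter(text.split())
    total = sum(words.values()) or 1
    return {"w:" + w: c / total for w, c in words.items()}


def merged_features(text):
    """Merge the per-level vectors into one sparse vector (a dict)."""
    features = char_ngrams(text)
    features.update(word_frequencies(text))
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=1):
    """Majority label among the k training vectors most similar to `text`."""
    query = merged_features(text)
    nearest = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def f_measure(gold, predicted, label):
    """Harmonic mean of precision and recall for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical toy samples standing in for poems by two authors.
samples = [
    ("aaa aaa aab", "poet_a"),
    ("zzz zzy zzz", "poet_b"),
]
train = [(merged_features(text), label) for text, label in samples]

if __name__ == "__main__":
    print(knn_predict(train, "aab aaa"))
```

In practice, a library classifier (for example an SVM, which the abstract reports as the strongest method) would replace `knn_predict`, and the feature vectors would come from a real corpus rather than toy strings.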
