Designing and Implementation of a Morphological Analyzer for Central Kurdish Language

Naserzadeh, Morteza | 2018

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 50993 (31)
  4. University: Sharif University of Technology
  5. Department: Languages and Linguistics Center
  6. Advisor(s): Khosravizadeh, Parvaneh; Veisi, Hadi
  7. Abstract:
  8. Morphological analysis of words plays a fundamental role in natural language processing applications such as spell checking, POS tagging and machine translation. Among the dialects belonging to Kurdish, Northern Kurdish (Kurmanji), Central Kurdish (Sorani) and Southern Kurdish are the closest to each other. The central dialect of Kurdish is the second official language in Iraq, and a large volume of text content is produced daily in this dialect. This research introduces the design and implementation of a two-level morphological analyzer for the Central Kurdish dialect. The analyzer is built with HFST, an open-source toolkit for constructing finite-state transducer morphological analyzers. For the implementation, about 3,000 simple and prefixed verbs, 7,000 nouns, 2,000 adjectives, 2,800 past participles, and more than 1,000 other words have been collected. The verbs have been categorized into three groups: intransitive verbs, transitive verbs and onomatopoeic verbs. In the lexicon, in addition to "past stems" and "present stems" for transitive verbs, "passive-maker" stems have also been included. All the morphotactic and morphophonological rules of the Central Kurdish dialect have also been collected and implemented. The analyzer is able to break word forms down into all possible morphemes and to identify the role of each of these components. With the current lexicon size, the analyzer identifies the possible POS tags of a word form with a precision of 96.0%, a recall of 88.3% and an F-measure of 92.0%, and judges the correctness of spelling with a precision of 99.9%, a recall of 87.4% and an F-measure of 93.2%.
  9. Keywords:
  10. Morphology ; Morphological Analyzer ; Inflection ; Central Kurdish Language ; Clitic ; Finite State Transducer ; Two-level Morphology
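
The F-measure figures quoted in the abstract are consistent with the usual balanced F-measure, the harmonic mean of precision and recall. The abstract does not state which F-measure variant was used, so the formula below is an assumption, and the function name `f_measure` is purely illustrative. A minimal sketch that recomputes both figures from the reported precision and recall:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall).

    Assumption: the abstract's F-measure is the balanced F1 score;
    the record itself does not say which variant was used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values (in percent) as reported in the abstract.
pos_tagging = f_measure(96.0, 88.3)
spelling = f_measure(99.9, 87.4)

print(f"POS tagging F-measure: {pos_tagging:.1f}%")
print(f"Spelling F-measure: {spelling:.1f}%")
```

Under this assumption, both values round to the figures reported in the abstract (92.0% and 93.2%).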
