Rule-Based Conversion of Colloquial Texts into Official Texts in Persian
Rajabpur, Mohammad | 2018
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 50709 (31)
- University: Sharif University of Technology
- Department: Languages and Linguistics Center
- Advisor(s): Bahrani, Mohammad
- Abstract:
- In this study, a set of colloquial Persian sentences was first collected, and each sentence was rendered into standard Persian by native speakers. The result was a parallel corpus of 1698 sentence pairs. Each colloquial sentence and its formal equivalent were then converted into term-frequency vectors, and the cosine similarity between the two vectors was calculated. The mean and standard deviation of all similarity scores were also computed. The data set was then divided into two halves by stratified randomization, so that the two halves resembled each other in terms of cosine similarity. The first half was used to extract the rules, and the second half was used to evaluate the system. The most productive rules involved the conversion of verb forms based on their roots, inflectional suffixes, and clitics, and the conversion of non-verbal forms by consulting a lexicon of formal word forms. Exceptions, irregularities, and conversions through unproductive rules were included in a look-up table. An algorithm for rule-based conversion of colloquial sentences into formal sentences was then designed and implemented. Finally, the colloquial sentences in the test half of the data were fed into the system. For each colloquial sentence, the converted formal output was automatically compared with the human rendering of the same sentence. The results show that the mean cosine similarity increased from a baseline of 0.531 to 0.842, and the mean BLEU precision score increased from a baseline of 0.520 to 0.801.
- Keywords:
- Normalization ; Computational Linguistics ; Rule-Based Approach ; Colloquial Texts ; Texts Automatic Conversion ; Interlanguage Machine Translation ; Formal Persian
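The similarity measure mentioned in the abstract, cosine similarity over term-frequency vectors, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: it assumes whitespace tokenization, and the function names `term_frequency` and `cosine_similarity` as well as the placeholder sentences are hypothetical.

```python
import math
from collections import Counter


def term_frequency(sentence):
    """Count how often each whitespace-separated token occurs (assumed tokenization)."""
    return Counter(sentence.split())


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the term-frequency vectors of two sentences."""
    tf_a = term_frequency(sentence_a)
    tf_b = term_frequency(sentence_b)

    # Only tokens present in both sentences contribute to the dot product.
    shared = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[token] * tf_b[token] for token in shared)

    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty sentence has no direction; treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder tokens stand in for a colloquial sentence and its formal rendering.
    colloquial = "w1 w2 w3"
    formal = "w1 w2 w4"
    print(cosine_similarity(colloquial, formal))
```

Under this sketch, a score near 1 means the system output shares most of its word forms with the human rendering, which is how an increase in the mean score can be read as an improvement over the baseline.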