- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 52981 (31)
- University: Sharif University of Technology
- Department: Languages and Linguistics Center
- Advisor(s): Sameti, Hossein; Bokaei, Mohammad Hadi
- Abstract:
- In recent years, word embeddings have attracted considerable attention from natural language processing (NLP) researchers as a way of representing words. One of their main advantages is their ability to capture relationships between words, so using them in NLP applications results in better performance. Despite this widespread attention, Persian word embeddings have made little progress. One difficulty is that Persian is a low-resource language compared with widely used languages, so the quality of Persian word embeddings is lower than that of English ones. Consequently, NLP applications that use word embeddings as feature vectors are less accurate in Persian than the same applications in high-resource languages. The main objective of this thesis is to improve the accuracy of Persian word embeddings using cross-lingual approaches. In these methods, a small bilingual dictionary is used to learn a transformation matrix that aligns Persian word embeddings to English ones. Because English word embeddings are of higher quality, applying the alignment to the whole set of Persian word embeddings improves their quality. MUSE and VecMap are used to align Persian and English in a joint space. According to the evaluations, these methods improve results on analogy and concept categorization tasks. More precisely, Persian word embeddings aligned with VecMap improve by 2% and 1.8% on the analogy and categorization tasks, respectively. Furthermore, the aligned English and Persian word embeddings are used in a sentiment analysis model that is trained on English data and tested on Persian data. The model achieves an F1-score of 78.18%.
- Keywords:
- Natural Language Processing ; Word Embedding ; Neural Networks ; Cross Lingual Speaker Adaptation
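
The abstract describes learning a transformation matrix from a small bilingual dictionary and then applying it to all Persian embeddings. The sketch below illustrates one common way such a matrix can be learned, orthogonal Procrustes, on synthetic data. It is an assumption for illustration only, not the thesis's implementation and not the MUSE or VecMap API; the function name `learn_orthogonal_map` and the toy vectors are hypothetical.

```python
import numpy as np


def learn_orthogonal_map(src, tgt):
    """Return the orthogonal W minimizing ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim) holding the embeddings of the
    dictionary word pairs (row i of src translates to row i of tgt).
    """
    # Closed-form Procrustes solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Toy data (hypothetical): "Persian" vectors are a rotated copy of "English".
rng = np.random.default_rng(0)
dim, n_pairs = 5, 50
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
english = rng.normal(size=(n_pairs, dim))
persian = english @ true_rotation.T

# Learn the map from the dictionary pairs, then apply it to the Persian side.
W = learn_orthogonal_map(persian, english)
aligned = persian @ W
```

In practice the map would be learned on the dictionary pairs only and then applied to the full Persian vocabulary; constraining it to be orthogonal keeps distances and angles among the Persian vectors unchanged.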