Search for: word2vec

    Doc2vec Natural Language Model of Farsi

    M.Sc. Thesis, Sharif University of Technology; Fazeli, Mohammad (Author); Moghadasi, Reza (Supervisor)
    Abstract
    Due to the immense increase in the availability of text data, interest in using machine learning models to solve problems that were previously prohibitively costly has grown significantly. The first step is to represent natural language in a form that machine learning algorithms can easily work with. Recent advances in learned representations of text data using simple neural networks (e.g. word2vec and doc2vec) have helped improve performance on downstream natural language processing tasks. Here we show that methods like doc2vec, which have been examined mostly for English, can be used on Persian (Farsi) with little modification. To demonstrate this, we use text classification tasks, and train...

    Persian word embedding evaluation benchmarks

    Article, 26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018; 2018, Pages 1583-1588; 9781538649169 (ISBN); Zahedi, M. S; Bokaei, M. H; Shoeleh, F; Yadollahi, M. M; Doostmohammadi, E; Farhoodi, M; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc., 2018
    Abstract
    Recently, there has been renewed interest in semantic word representation, also called word embedding, in a wide variety of natural language processing tasks that require sophisticated semantic and syntactic information. The quality of word embedding methods is usually evaluated on English-language benchmarks, and only a few studies analyze word embeddings for low-resource languages such as Persian. In this paper, we perform an extensive word embedding evaluation for the Persian language based on a set of lexical semantics tasks: analogy, concept categorization, and word semantic relatedness. For these evaluation tasks, we provide three benchmark data sets to show the...