Search for: low-resource-languages

    Persian word embedding evaluation benchmarks

    Article, 26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018; 2018, Pages 1583-1588; 9781538649169 (ISBN); Zahedi, M. S.; Bokaei, M. H.; Shoeleh, F.; Yadollahi, M. M.; Doostmohammadi, E.; Farhoodi, M.; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2018
    Abstract
    Recently, there has been renewed interest in semantic word representation, also called word embedding, in a wide variety of natural language processing tasks requiring sophisticated semantic and syntactic information. The quality of word embedding methods is usually evaluated on English-language benchmarks. Nevertheless, only a few studies analyze word embedding for low-resource languages such as Persian. In this paper, we perform an extensive word embedding evaluation for the Persian language based on a set of lexical semantics tasks: analogy, concept categorization, and word semantic relatedness. For these evaluation tasks, we provide three benchmark data sets to show the...

    Exploring the impact of machine translation on fake news detection: A case study on Persian tweets about COVID-19

    Article, 29th Iranian Conference on Electrical Engineering, ICEE 2021, 18 May 2021 through 20 May 2021; 2021, Pages 540-544; 9781665433655 (ISBN); Saghayan, M. H.; Ebrahimi, S. F.; Bahrani, M.; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2021
    Abstract
    Fake news detection has become an emerging and critical topic of research in recent years. One of the major complications of fake news detection is that news on social networks is multilingual, so developing methods for every language in the world is impossible, especially for low-resource languages like Persian. To address this problem, researchers use machine translation to unify the data and then develop a method for the unified data. In this paper, we aim to explore the impact of machine translation on fake news detection. For this purpose, we extracted and labeled a dataset of Persian tweets from Twitter on the subject of COVID-19 and...