Search for: machin-translation

    PEN: Parallel English-Persian news corpus

    , Article Proceedings of the 2011 International Conference on Artificial Intelligence, ICAI 2011, 18 July 2011 through 21 July 2011 ; Volume 2 , July , 2011 , Pages 523-528 ; 9781601321855 (ISBN) Farajian, M. A ; ICAI 2011
    2011
    Abstract
    Parallel corpora are necessary resources in many multilingual natural language processing applications, including machine translation and cross-lingual information retrieval. Manual preparation of a large-scale parallel corpus is a very time-consuming and costly procedure. In this paper, the work towards building a sentence-level aligned English-Persian corpus in a semi-automated manner is presented. The design of the corpus and the collection and alignment of the sentences are described. Two statistical similarity measures were used to find the similarities of sentence pairs. To verify the alignment process automatically, Google Translator was used. The corpus is based on news... 

    Automatic Evaluation of Machine Translation Using Abstract Meaning Representation

    , M.Sc. Thesis Sharif University of Technology Sadeghieh, Hamid (Author) ; Rezae, Saeed (Supervisor) ; Bahrani, Mohammad (Supervisor)
    Abstract
    Compared with other problems in Natural Language Processing, Machine Translation Quality Evaluation faces a particular challenge: translating the same linguistic form in the source language repeatedly will not necessarily yield a unique linguistic form in the target language. Since the Abstract Meaning Representation (AMR) graph is the same for all sentences of similar meaning, this thesis attempts to extend the use of AMR graphs to Machine Translation Quality Evaluation. The main research question of the thesis was whether the similarity of the AMR graphs... 

    A study to find influential parameters on a Farsi-English statistical machine translation system

    , Article 2010 5th International Symposium on Telecommunications, IST 2010, 4 December 2010 through 6 December 2010 ; 2010 , Pages 985-991 ; 9781424481835 (ISBN) Bakhshaei, S ; Khadivi, S ; Riahi, N ; Sameti, H ; Sharif University of Technology
    Abstract
    The aim of this paper is to analyze Farsi-English statistical machine translation systems as a useful communication tool. Growing communication between nations increases the need for easier ways of translating between languages, as an alternative to expensive human translators. In this work, a statistical phrase-based system is run on the Farsi-English language pair, and the effect of its parameters on translation quality is studied in depth. Using BLEU as the metric of translation accuracy, the system achieves an improvement of 1.84 BLEU points over the baseline, from 16.97% to 18.81% in the best case.
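    The reported gain can be read as either an absolute or a relative improvement, so a quick check of the arithmetic may help. The sketch below uses only the two BLEU scores quoted in the abstract; the function names are illustrative and do not come from the paper.

```python
def absolute_gain(baseline: float, improved: float) -> float:
    """Difference between the two scores, in BLEU points."""
    return improved - baseline


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return (improved - baseline) / baseline * 100.0


# BLEU scores quoted in the abstract (baseline and best case).
baseline_bleu = 16.97
best_bleu = 18.81

print(f"absolute gain: {absolute_gain(baseline_bleu, best_bleu):.2f} BLEU points")
print(f"relative gain: {relative_gain(baseline_bleu, best_bleu):.2f}%")
```

    The 1.84 figure matches the absolute difference between the two scores; measured against the baseline score, the same change is a relative gain of roughly 10.8%.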

    Exploring the impact of machine translation on fake news detection: A case study on Persian tweets about COVID-19

    , Article 29th Iranian Conference on Electrical Engineering, ICEE 2021, 18 May 2021 through 20 May 2021 ; 2021 , Pages 540-544 ; 9781665433655 (ISBN) Saghayan, M. H ; Ebrahimi, S. F ; Bahrani, M ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2021
    Abstract
    Fake news detection has become an emerging and critical topic of research in recent years. One of the major complications of fake news detection lies in the fact that news in social networks is multilingual, and therefore developing methods for each and every language in the world is impossible, especially for low-resource languages like Persian. In an effort to solve this problem, researchers use machine translation to bring the data into a single language and develop a method for the unified data. In this paper, we aim to explore the impact of machine translation on fake news detection. For this purpose, we extracted and labeled a dataset of Persian tweets from Twitter on the subject of COVID-19 and... 

    Design and Development of a Persian to English Translator Prototype

    , M.Sc. Thesis Sharif University of Technology Niknejad, Ali (Author) ; Ghasem Sani, Gholamreza (Supervisor)
    Abstract
    Increasing relations between different cultures necessitate easier and more affordable methods of translation between different languages. Hence, using computers as translators has been very attractive to many governmental and commercial organizations, as well as the scientific community, since the very beginning of the computer era. So far, different approaches to MT have been proposed. Two of the main approaches are Statistical MT and Rule-Based MT. Unlike Statistical MT, which uses statistical information for translation, Rule-Based MT utilizes precise linguistic information to understand the source language and generate the target language. This linguistic information is usually encoded as a... 

    Implementation of a Statistical Persian-English Translator Prototype

    , M.Sc. Thesis Sharif University of Technology Alizadeh, Yousef (Author) ; Ghasem Sani, Gholamreza (Supervisor)
    Abstract
    Machine translation has been an important subject in the field of natural language processing (NLP). In recent years, as essential linguistic data resources have become available, statistical approaches have been deployed in machine translation. Although there have been several attempts to create an English-to-Persian automatic translator, there has not been sufficient effort in the reverse direction. In this project, we reviewed previous work on machine translation for Persian and implemented a statistical machine translator from Persian to English. We needed a bilingual corpus for building the translator. For this purpose, we used a corpus of PhD and MSc abstracts in Persian and their translation... 

    Designing a Hybrid Approach to Persian-English Machine Translation

    , M.Sc. Thesis Sharif University of Technology Mohammadifar, Davood (Author) ; Ghasem Sani, Gholamreza (Supervisor)
    Abstract
    Nowadays, because of the growing web and the consequent increase in data in different languages, machine translation has become a necessity. Machine translators are created to speed up the translation process. Machine translation methods are generally divided into three categories: rule-based, corpus-based, and hybrid. Rule-based machine translation uses grammar for translation, but it needs a complete grammar of the language to translate correctly. The corpus-based method has many variations. One of these is statistical machine translation, which uses probabilistic and statistical rules for translation and is now frequently used. Hybrid machine translation benefits from the... 

    Discriminative spoken language understanding using statistical machine translation alignment models

    , Article Communications in Computer and Information Science ; Vol. 427, issue , Sep , 2014 , pp. 194-202 ; ISSN: 18650929 ; ISBN: 9783319108490 Aliannejadi, M ; Khadivi, S ; Ghidary, S. S ; Bokaei, M. H ; Sharif University of Technology
    Abstract
    In this paper, we study the discriminative modeling of Spoken Language Understanding (SLU) using Conditional Random Fields (CRF) and Statistical Machine Translation (SMT) alignment models. Previous discriminative approaches to SLU have depended on n-gram features. Other previous works have used SMT alignment models to predict the output labels. We have used SMT alignment models to align the abstract labels and trained a CRF to predict the labels. We show that the state transition features improve the performance. Furthermore, we have compared the proposed method with two baseline approaches: Hidden Vector States (HVS) and baseline-CRF. The results show that for the F-measure the proposed... 

    A Hybrid Approach for Normalization of Non-Standard Persian Texts

    , M.Sc. Thesis Sharif University of Technology Rostami, Ramtin (Author) ; Sameti, Hossein (Supervisor) ; Ghasem-Sani, Gholamreza (Co-Advisor)
    Abstract
    With the increase in internet usage and the volume of available data, the need for data mining and text processing has grown. One of the common obstacles to using these methods is the use of colloquial and non-standard language in writing. Because of this, and because NLP tasks in Persian have always faced data shortages, in this thesis we first collect and construct a parallel data set consisting of colloquial texts used in social media. Then, after examining various methods used in other languages for text normalization, we propose a combination of new hybrid methods, involving Statistical Machine Translation methodology with some modification, to normalize... 

    Rule-Based Conversion of Colloquial Texts into Official Texts in Persian

    , M.Sc. Thesis Sharif University of Technology Rajabpur, Mohammad (Author) ; Bahrani, Mohammad (Supervisor)
    Abstract
    In this study, a set of colloquial sentences in Persian was first collected. Each of these sentences was rendered into standard Persian by native speakers. As a result, a parallel corpus of 1698 sentence pairs was created. Then each colloquial sentence and its formal equivalent were converted into term-frequency vectors, and the cosine similarity between the two vectors was calculated. In addition, the mean and the standard deviation of all cosine similarities were obtained. Afterwards, the whole data set was divided into two halves through stratified randomization so that the two halves resembled each other in terms of cosine similarity.... 
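    The vector comparison described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: whitespace tokenization, the helper names, and the toy English sentence pairs (standing in for Persian data) are all hypothetical.

```python
import math
import statistics
from collections import Counter


def term_frequency(sentence: str) -> Counter:
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(sentence.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence shares nothing with any other
    return dot / (norm_a * norm_b)


# Toy (colloquial, formal) pairs; purely illustrative placeholders.
pairs = [
    ("i wanna go home", "i want to go home"),
    ("gonna rain today", "it is going to rain today"),
]

scores = [
    cosine_similarity(term_frequency(colloquial), term_frequency(formal))
    for colloquial, formal in pairs
]
print("similarities:", [round(s, 3) for s in scores])
print("mean:", round(statistics.mean(scores), 3))
print("std dev:", round(statistics.pstdev(scores), 3))
```

    If a distance rather than a similarity is wanted, the usual convention is cosine distance = 1 - cosine similarity.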

    Towards robust visual transformer networks via k-sparse attention

    , Article 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 May 2022 through 27 May 2022 ; Volume 2022-May , 2022 , Pages 4053-4057 ; 15206149 (ISSN); 9781665405409 (ISBN) Amini, S ; Ghaemmaghami, S ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2022
    Abstract
    Transformer networks, originally developed in the machine translation community to eliminate the sequential nature of recurrent neural networks, have shown impressive results in other natural language processing and machine vision tasks. Self-attention, the core module behind visual transformers, globally mixes the image information. This module drastically reduces the intrinsic inductive bias imposed by CNNs, such as locality, but it is insufficiently robust against some adversarial attacks. In this paper we introduce k-sparse attention to preserve low inductive bias while making transformers more robust against adversarial attacks. We show that standard transformers attend... 
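    The truncated abstract does not define the mechanism, so the following is only a sketch of one common form of sparse attention: keep the k largest scores for a query and apply softmax over those alone. The function name and the top-k interpretation are assumptions and may differ from the paper's formulation.

```python
import math
from typing import List, Sequence


def k_sparse_attention_weights(scores: Sequence[float], k: int) -> List[float]:
    """Softmax over the k largest scores; every other position gets weight 0.

    Assumed top-k reading of "k-sparse attention", for illustration only.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of scores")
    # Indices of the k largest scores for this query.
    keep = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    # Subtract the maximum kept score for numerical stability.
    peak = max(scores[i] for i in keep)
    exps = [math.exp(s - peak) if i in keep else 0.0 for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


# One query attending over four keys, keeping only the two strongest links.
weights = k_sparse_attention_weights([2.0, 0.5, 1.0, -1.0], k=2)
print([round(w, 3) for w in weights])
```

    Setting k to the number of keys recovers ordinary dense softmax attention weights in this sketch.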

    Using ASR methods for OCR

    , Article 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 20 September 2019 through 25 September 2019 ; 2019 , Pages 663-668 ; 15205363 (ISSN); 9781728128610 (ISBN) Arora, A ; Garcia, P ; Watanabe, S ; Manohar, V ; Shao, Y ; Khudanpur, S ; Chang, C. C ; Rekabdar, B ; Babaali, B ; Povey, D ; Etter, D ; Raj, D ; Hadian, H ; Trmal, J ; Sharif University of Technology
    IEEE Computer Society  2019
    Abstract
    Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, recent DNN-HMM approaches have not been explored much for text recognition. Inspired by current work in automatic speech recognition (ASR) and machine translation, we present an open-vocabulary sub-word text recognition system. The sub-word lexicon and sub-word language model (LM) help in overcoming the challenge of recognizing out-of-vocabulary (OOV) words, and a time delay neural network (TDNN) and convolutional neural network (CNN) based DNN-HMM optical model (OM) efficiently models the sequence dependency in the... 

    Generating summaries for methods of event-driven programs: An Android case study

    , Article Journal of Systems and Software ; Volume 170 , 2020 Aghamohammadi, A ; Izadi, M ; Heydarnoori, A ; Sharif University of Technology
    Elsevier Inc  2020
    Abstract
    The lack of proper documentation makes program comprehension a cumbersome process for developers. Source code summarization is one of the existing solutions to this problem. Many approaches have been proposed to summarize source code in recent years. A prevalent weakness of these solutions is that they do not pay much attention to interactions among elements of software. An element is simply a callable code snippet such as a method or even a clickable button. As a result, these approaches cannot be applied to event-driven programs, such as Android applications, because they have specific features such as numerous interactions between their elements. To tackle this problem, we propose a novel...