
Stochastic data-to-text generation using syntactic dependency information

Article Computer Speech and Language; Volume 76, 2022; 08852308 (ISSN) Seifossadat, E; Sameti, H; Sharif University of Technology

Academic Press 2022

Abstract

Data-to-Text Generation (D2T) is one of the most important sub-fields of Natural Language Generation, where structured data is transcribed into natural language text. Several solutions have been proposed for D2T so far with relative success, including template-based, phrase structure grammar-based, and neural attention models. However, these methods also have problems such as grammatical flaws, limited naturalness, and semantic deficiencies. In this work, we propose a stochastic corpus-based model for data-to-text generation that produces a tree-form structure for sentences based on dependency information. This information includes the dependency relations between words and meaning labels...

Predicting the objective and priority of issue reports in software repositories

Article Empirical Software Engineering; Volume 27, Issue 2, 2022; 13823256 (ISSN) Izadi, M; Akbari, K; Heydarnoori, A; Sharif University of Technology

Springer 2022

Abstract

Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users exploit Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage the workload and processes, and finally, to document the highlight of their team’s effort. An issue report is a rich source of collaboratively-curated software knowledge, and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it...

Towards robust visual transformer networks via k-sparse attention

Article 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 May 2022 through 27 May 2022; Volume 2022-May, 2022, Pages 4053-4057; 15206149 (ISSN); 9781665405409 (ISBN) Amini, S; Ghaemmaghami, S; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2022

Abstract

Transformer networks, originally developed in the community of machine translation to eliminate the sequential nature of recurrent neural networks, have shown impressive results in other natural language processing and machine vision tasks. Self-attention is the core module behind visual transformers, which globally mixes the image information. This module drastically reduces the intrinsic inductive bias imposed by CNNs, such as locality, while encountering insufficient robustness against some adversarial attacks. In this paper, we introduce K-sparse attention to preserve low inductive bias, while robustifying transformers against adversarial attacks. We show that standard transformers attend...
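The abstract above is truncated, so the paper's exact formulation of K-sparse attention is not available here. As a rough illustration only, the sketch below assumes one common reading of the idea: for each query, keep the k largest scaled dot-product scores, apply softmax over those, and give every other key zero weight. The function names and toy inputs are hypothetical and not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def k_sparse_attention(queries, keys, values, k):
    """Top-k attention sketch (an assumption, not the paper's confirmed method).

    For each query, only the k keys with the largest scaled dot-product
    scores receive non-zero weight; the weights are a softmax over those k.
    Returns (outputs, weights), one row per query.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Indices of the k highest-scoring keys for this query.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        probs = softmax([scores[i] for i in top])
        weights = [0.0] * len(keys)
        for i, p in zip(top, probs):
            weights[i] = p
        # Weighted sum of value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    toy_queries = [[1.0, 0.0]]
    toy_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    toy_values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    outs, weights = k_sparse_attention(toy_queries, toy_keys, toy_values, k=2)
    print(weights[0])  # the lowest-scoring key gets weight 0.0
    print(outs[0])
```

Setting k equal to the number of keys recovers ordinary dense softmax attention in this sketch, which makes the sparsity level easy to vary in a toy comparison.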

Deep graph generators: A survey

Article IEEE Access; Volume 9, 2021, Pages 106675-106702; 21693536 (ISSN) Faez, F; Ommi, Y; Soleymani Baghshah, M; Rabiee, H. R; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2021

Abstract

Deep generative models have achieved great success in areas such as image, speech, and natural language processing in the past few years. Thanks to the advances in graph-based deep learning, and in particular graph representation learning, deep graph generation methods have recently emerged with new applications ranging from discovering novel molecular structures to modeling social networks. This paper conducts a comprehensive survey on deep learning-based graph generation approaches and classifies them into five broad categories, namely, autoregressive, autoencoder-based, reinforcement learning-based, adversarial, and flow-based graph generators, providing the readers a detailed description...

Event classification from the Urdu language text on social media

Article PeerJ Computer Science; Volume 7, 2021; 23765992 (ISSN) Awan, M. D. A; Kajla, N. I; Firdous, A; Husnain, M; Missen, M. M. S; Sharif University of Technology

PeerJ Inc 2021

Abstract

The real-time availability of the Internet has engaged millions of users around the world. Regional languages are preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events that are happening globally, e.g., sports, inflation, protest, explosion, and sexual assault, in regional (local) languages on social media. Extraction and classification of events from multilingual data have become bottlenecks because of a lack of resources. In this research paper, we present the event classification task for the Urdu language text existing on social media and the news channels by...

Variants of vector space reductions for predicting the compositionality of English noun compounds

Article 12th International Conference on Language Resources and Evaluation, LREC 2020, 11 May 2020 through 16 May 2020; 2020, Pages 4379-4387 Alipoor, P; Schulte im Walde, S; Sharif University of Technology

European Language Resources Association (ELRA) 2020

Abstract

Predicting the degree of compositionality of noun compounds such as snowball and butterfly is a crucial ingredient for lexicography and Natural Language Processing applications, to know whether the compound should be treated as a whole, or through its constituents, and what it means. Computational approaches for an automatic prediction typically represent and compare compounds and their constituents within a vector space and use distributional similarity as a proxy to predict the semantic relatedness between the compounds and their constituents as the compound's degree of compositionality. This paper provides a systematic evaluation of vector-space reduction variants across kinds, exploring...
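The vector-space idea described in the abstract above can be sketched in a few lines: represent the compound and its constituents as vectors and use cosine similarity as a proxy for compositionality. The toy vectors below are invented for illustration, and averaging the two constituent similarities is an assumption made here; the abstract does not say how the paper combines them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compositionality(compound, modifier, head):
    """Mean cosine similarity between a compound and its two constituents.

    Averaging the two similarities is an illustrative choice, not a detail
    confirmed by the abstract.
    """
    return 0.5 * (cosine(compound, modifier) + cosine(compound, head))


# Hypothetical co-occurrence vectors, invented for this example only.
toy_vectors = {
    "snowball": [4.0, 3.0, 0.0],
    "snow": [5.0, 2.0, 0.0],
    "ball": [3.0, 4.0, 1.0],
    "butterfly": [0.0, 1.0, 6.0],
    "butter": [6.0, 0.0, 1.0],
    "fly": [1.0, 2.0, 3.0],
}

snowball_score = compositionality(
    toy_vectors["snowball"], toy_vectors["snow"], toy_vectors["ball"]
)
butterfly_score = compositionality(
    toy_vectors["butterfly"], toy_vectors["butter"], toy_vectors["fly"]
)

if __name__ == "__main__":
    print("snowball:", round(snowball_score, 3))
    print("butterfly:", round(butterfly_score, 3))
```

With these made-up vectors, the compound that shares more context with its parts receives the higher score, which is the behaviour the distributional proxy relies on.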

Relevant question answering in community based networks using deep LSTM neural networks

Article 7th Iranian Joint Congress on Fuzzy and Intelligent Systems, CFIS 2019, 29 January 2019 through 31 January 2019; 2019; 9781728106731 (ISBN) Karimi, E; Majidi, B; Manzuri, M. T; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2019

Abstract

Community based Question Answering (CQA) websites enable users to post their questions, which are then answered by other users. This group of social networking websites is among the most popular on the Internet. The responses on these CQA websites can be for specific questions related to a specific field of interest to the users or to all kinds of questions. Creating automated CQA websites is of great interest for natural language processing research. One task in the development of automated CQA websites is finding similar questions to the question asked by the user. In this paper, a novel method for finding questions relevant to the question of a user...

Persian keyphrase generation using sequence-to-sequence models

Article 27th Iranian Conference on Electrical Engineering, ICEE 2019, 30 April 2019 through 2 May 2019; 2019, Pages 2010-2015; 9781728115085 (ISBN) Doostmohammadi, E; Bokaei, M. H; Sameti, H; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2019

Abstract

Keyphrases are a very short summary of an input text and provide the main subjects discussed in the text. Keyphrase extraction is a useful upstream task and can be used in various natural language processing problems, for example, text summarization and information retrieval, to name a few. However, not all the keyphrases are explicitly mentioned in the body of the text. In real-world examples there are always some topics that are discussed implicitly. Extracting such keyphrases requires a generative approach, which is adopted here. In this paper, we try to tackle the problem of keyphrase generation and extraction from news articles using deep sequence-to-sequence models. These models...

PerKey: a Persian news corpus for keyphrase extraction and generation

Article 9th International Symposium on Telecommunication, IST 2018, 17 December 2018 through 19 December 2018; 2019, Pages 460-465; 9781538682746 (ISBN) Doostmohammadi, E; Bokaei, M. H; Sameti, H; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2019

Abstract

Keyphrases provide an extremely dense summary of a text. Such information can be used in many Natural Language Processing tasks, such as information retrieval and text summarization. Since previous studies on Persian keyword or keyphrase extraction have not published their data, the field suffers from the lack of a human-extracted keyphrase dataset. In this paper, we introduce PerKey, a corpus of 553k news articles from six Persian news websites and agencies with relatively high-quality author-extracted keyphrases, which is then filtered and cleaned to achieve higher-quality keyphrases. The resulting data underwent human assessment to ensure the quality of the keyphrases. We also measured...

Persian word embedding evaluation benchmarks

Article 26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018; 2018, Pages 1583-1588; 9781538649169 (ISBN) Zahedi, M. S; Bokaei, M. H; Shoeleh, F; Yadollahi, M. M; Doostmohammadi, E; Farhoodi, M; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2018

Abstract

Recently, there has been renewed interest in semantic word representation, also called word embedding, in a wide variety of natural language processing tasks requiring sophisticated semantic and syntactic information. The quality of word embedding methods is usually evaluated based on English language benchmarks. Nevertheless, only a few studies analyze word embedding for low-resource languages such as Persian. In this paper, we perform such an extensive word embedding evaluation for the Persian language based on a set of lexical semantics tasks named analogy, concept categorization, and word semantic relatedness. For these evaluation tasks, we provide three benchmark data sets to show the...

Inbound e-marketing using neural network based visual and phonetic user experience analytics

Article 2018 4th International Conference on Web Research, ICWR 2018; 15 June, 2018, Pages 12-18; 9781538653647 (ISBN) Nedaei, D; Khanzadi, P; Majidi, B; Movaghar, A; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2018

Abstract

Inbound marketing is the process of attracting probable customers to a business before they have any intention to become customers. An effective method for inbound marketing is the creation of a positive psychological business environment to attract the customers. A significant portion of the traditional business environment is moving online and the new business environment is the company website. One of the major elements in online inbound marketing is the website address and the website logo, which are the first factors of brand personality that the visitor to the company website encounters when looking up the website in a search engine. In this paper, a framework for inbound e-marketing using...

Persian pronoun resolution using data driven approaches

Article 23rd International Conference on Information and Software Technologies, ICIST 2017, 12 October 2017 through 14 October 2017; Volume 756, 2017, Pages 574-585; 18650929 (ISSN); 9783319676418 (ISBN) Nourbakhsh, A; Bahrani, M; Sharif University of Technology

Springer Verlag 2017

Abstract

Pronoun resolution is one of the challenges of natural language processing (NLP). The proposed solutions range from heuristic rule-based to machine learning data-driven approaches. In this article, we follow a previous machine learning approach on Persian pronoun anaphora resolution. The primary goal of this paper is to improve the results, mainly by extracting more balanced data through using heuristic rules in instance sampling, and utilizing more relevant features in classification. Using the PCAC2008 dataset, we consider noun phrase structure as a way to extract more suitable training data. Incorporated features include syntactic and semantic information. Finally, we train and test different...

Election vote share prediction using a sentiment-based fusion of Twitter data with Google trends and online polls

Article 6th International Conference on Data Science, Technology and Applications, DATA 2017, 24 July 2017 through 26 July 2017; 2017, Pages 363-370; 9789897582554 (ISBN) Kassraie, P; Modirshanechi, A; Aghajan, H. K; Institute for Systems and Technologies of Information, Control and Communication (INSTICC); Sharif University of Technology

SciTePress 2017

Abstract

It is common to use online social content for analyzing political events. Twitter-based data by itself is not necessarily a representative sample of the society due to non-uniform participation. This fact should be noticed when predicting real-world events from social media trends. Moreover, each tweet may bear a positive or negative sentiment towards the subject, which needs to be taken into account. By gathering a large dataset of more than 370,000 tweets on the 2016 US elections and carefully validating the resulting key trends against Google Trends, a legitimate dataset is created. A Gaussian process regression model is used to predict the election outcome; we bring in the novel idea of...

History based unsupervised data oriented parsing

Article International Conference Recent Advances in Natural Language Processing, RANLP; September, 2013, Pages 453-459; 13138502 (ISSN) Mesgar, M; Ghasem Sani, G; Sharif University of Technology

2013

Abstract

Grammar induction is a basic step in natural language processing. Based on the volume of information that is used by different methods, we can distinguish three types of grammar induction methods: supervised, unsupervised, and semi-supervised. Supervised and semi-supervised methods require large tree banks, which may not currently exist for many languages. Accordingly, many researchers have focused on unsupervised methods. Unsupervised Data Oriented Parsing (UDOP) is currently the state of the art in unsupervised grammar induction. In this paper, we show that the performance of UDOP in free word order languages such as Persian is inferior to that of fixed order languages such as English. We...

Temporal relation classification in Persian and English contexts

Article International Conference Recent Advances in Natural Language Processing, RANLP, Hissar; September, 2013, Pages 261-269; 13138502 (ISSN) Torbati, M. E; Ghassem-Sani, G; Mirroshandel, S. A; Yaghoobzadeh, Y; Hosseini, N. K; Sharif University of Technology

2013

Abstract

This paper introduces the first pattern-based Persian Temporal Relation Classifier (PTRC) that finds the type of temporal relations between pairs of events in Persian texts. The proposed system uses support vector machines (SVMs) equipped with combinations of simple, convolution tree, and string subsequence kernels (SSK). In order to evaluate the algorithm, we have developed a Persian TimeBank (PTB) corpus. PTRC not only increases the performance of the classification by applying new features and SSK, but also alleviates the probable adverse effects of the Free Word Orderness (FWO) of Persian on temporal relation classification. We have also applied our proposed algorithm to two standard...

Unsupervised induction of Persian semantic verb classes based on syntactic information

Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Warsaw; Volume 7912 LNCS, June, 2013, Pages 112-124; 03029743 (ISSN); 9783642386336 (ISBN) Aminian, M; Rasooli, M. S; Sameti, H; Sharif University of Technology

2013

Abstract

Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs have motivated us to use unsupervised algorithms for Persian verb clustering. In this paper, we have done experiments on inducing the semantic classes of Persian verbs based on Levin's theory for verb classes. Syntactic information extracted from dependency trees is used as base features for clustering the verbs. Since there has been no manual classification of Persian verbs prior to this paper, we have prepared a...

Towards unsupervised learning of temporal relations between events

Article Journal of Artificial Intelligence Research; Volume 45, 2012, Pages 125-163; 10769757 (ISSN) Mirroshandel, S. A; Ghassem Sani, G; Sharif University of Technology

2012

Abstract

Automatic extraction of temporal relations between event pairs is an important task for several natural language processing applications such as Question Answering, Information Extraction, and Summarization. Since most existing methods are supervised and require large corpora, which for many languages do not exist, we have concentrated our efforts on reducing the need for annotated data as much as possible. This paper presents two different algorithms towards this goal. The first algorithm is a weakly supervised machine learning approach for classification of temporal relations between events. In the first stage, the algorithm learns a general classifier from an annotated corpus. Then,...

ISO-TimeML event extraction in Persian text

Article 24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers, 8 December 2012 through 15 December 2012; December, 2012, Pages 2931-2944 Yaghoobzadeh, Y; Ghassem-Sani, G; Mirroshandel, S. A; Eshaghzadeh, M; Sharif University of Technology

2012

Abstract

Recognizing TimeML events and identifying their attributes are important tasks in natural language processing (NLP). Several NLP applications like question answering, information retrieval, summarization, and temporal information extraction need to have some knowledge about events of the input documents. Existing methods developed for this task are restricted to a limited number of languages, and for many other languages, including Persian, there has not been any effort yet. In this paper, we introduce two different approaches for automatic event recognition and classification in Persian. For this purpose, a corpus of events has been built based on a specific version of ISO-TimeML for Persian....

Formal verification of temporal questions in the context of query-answering text summarization

Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 28 May 2012 through 30 May 2012; Volume 7310 LNAI, May, 2012, Pages 350-355; 03029743 (ISSN); 9783642303524 (ISBN) Mostafazadeh, N; Bakhshandeh Babarsad, O; Ghassem Sani, G; Sharif University of Technology

2012

Abstract

This paper presents a novel method for answering complex temporal ordering questions in the context of an event and query-based text summarization. This task is accomplished by precisely mapping the problem of "query-based summarization of temporal ordering questions" in the field of Natural Language Processing to "verifying a finite state model against a temporal formula" in the realm of Model Checking. This mapping requires specific definitions, structures, and procedures. The output of this new approach is, promisingly, a readable and informative summary satisfying the user's needs.

Unilateral semi-supervised learning of extended hidden vector state for Persian language understanding

Article NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering, 27 November 2011 through 29 November 2011, Tokushima; 2011, Pages 165-168; 9781612847283 (ISBN) Jabbari, F; Sameti, H; Bokaei, M. H; Chinese Association for Artificial Intelligence; IEEE Signal Processing Society; Sharif University of Technology

2011

Abstract

The key element of a spoken dialogue system is the Spoken Language Understanding (SLU) part. HVS and EHVS are the two most popular statistical methods employed to implement the SLU part, and both need lightly annotated data. Since annotation is time-consuming, we present a novel semi-supervised learning approach for EHVS to reduce the human labeling effort using two different statistical classifiers, SVM and KNN. Experiments are done on a Persian corpus, the University Information Kiosk corpus. The experimental results show improvements in performance of semi-supervised EHVS, trained by both labeled and unlabeled data, compared to EHVS trained by just initially labeled data. The performance of EHVS improves...