
Taxonomy construction using compound similarity measure

Article OTM Confederated International Conferences CoopIS, DOA, ODBASE, GADA, and IS 2007, Vilamoura, 25 November 2007 through 30 November 2007 ; Volume 4803 LNCS, Issue PART 1 , 2007 , Pages 915-932 ; 03029743 (ISSN); 9783540768463 (ISBN) Neshati, M ; Hassanabadi, L. S ; Sharif University of Technology

Springer Verlag 2007

Abstract

Taxonomy learning is one of the major steps in the ontology learning process. Manual construction of taxonomies is a time-consuming and cumbersome task. Recently, many researchers have focused on automatic taxonomy learning, but the quality of the generated taxonomies is still not satisfactory. In this paper, we propose a new compound similarity measure based on both knowledge-poor and knowledge-rich approaches to finding word similarity. We also use a neural network model to combine several similarity methods. We compared our method with a simple syntactic similarity measure; our measure considerably improves the precision and recall of automatically generated taxonomies.
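The abstract says several similarity methods are combined by a neural network. A minimal sketch of that idea, assuming the smallest possible network (a single logistic unit) and two hypothetical per-pair scores standing in for a knowledge-poor (syntactic) measure and a knowledge-rich measure; the feature names, training data, and hyperparameters are illustrative, not the paper's.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_combiner(features, labels, epochs=5000, lr=1.0):
    """Fit one logistic unit that maps a vector of per-pair similarity scores
    to P(pair is related), using batch gradient descent on the log loss."""
    n, m = len(features[0]), len(features)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def compound_similarity(scores, w, b):
    """Combined similarity in [0, 1] for one word pair."""
    return sigmoid(sum(wi * si for wi, si in zip(w, scores)) + b)

# Hypothetical data: [syntactic_score, knowledge_rich_score] per word pair,
# labelled 1 if the pair is related in a gold taxonomy, else 0.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_combiner(X, y)
```

A logistic unit learns one weight per similarity measure, so its weights indicate how much each source contributes; adding a hidden layer would let the network model interactions between measures.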

Collecting positive instances of "instance-of" relationship in the Persian language

Article ICECT 2010 - Proceedings of the 2010 2nd International Conference on Electronic Computer Technology, 7 May 2010 through 10 May 2010, Kuala Lumpur ; May , 2010 , Pages 46-49 ; 9781424474059 (ISBN) Rastegari, Y ; Abolhassani, H ; Zibanezhad, B ; Sayadiharikandeh, M ; Sharif University of Technology

2010

Abstract

Extracting lexico-syntactic patterns from text relies on pairs of words (positive instances) that represent the target relation, and on finding where they co-occur in a text corpus. Owing to the existence of the WordNet thesaurus (which contains the semantic relationships between words), collecting positive instances is easy in English. In non-English languages, it is hard to collect a large number of positive instances in various contexts. We investigated several new ideas for collecting them in Persian, and finally ran the best one and collected approximately 6,000 positive instances.
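The pattern-harvesting step described above can be sketched as follows, assuming seed pairs of the form (instance, class) and a plain-text corpus. The function name, the `max_gap` window, and the toy English sentences (used instead of Persian for readability) are hypothetical. Python's `\w` matches Unicode letters, so the same tokenizer runs on Persian script, although real Persian text needs proper tokenization (for example around zero-width non-joiners).

```python
import re
from collections import Counter

def extract_patterns(seed_pairs, sentences, max_gap=4):
    """Count candidate lexico-syntactic patterns from seed (instance, class) pairs.
    For each sentence containing both words of a pair, the words between them
    become a pattern; X marks the instance slot and Y the class slot.
    Only the first occurrence of each word in a sentence is considered."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for instance, cls in seed_pairs:
            if instance not in tokens or cls not in tokens:
                continue
            i, j = tokens.index(instance), tokens.index(cls)
            lo, hi = min(i, j), max(i, j)
            if hi - lo - 1 > max_gap:  # too far apart to be a short pattern
                continue
            left, right = ("X", "Y") if i < j else ("Y", "X")
            counts[" ".join([left, *tokens[lo + 1:hi], right])] += 1
    return counts

seeds = [("tehran", "city"), ("mashhad", "city"), ("karun", "river")]
corpus = [
    "Tehran is a city in Iran.",
    "Mashhad is a city in the east.",
    "The Karun is a river in Khuzestan.",
    "Tehran hosted the city council meeting.",
]
patterns = extract_patterns(seeds, corpus)
```

Frequent patterns such as "X is a Y" are good candidates, while one-off matches like "X hosted the Y" show why counts (or some other filter) are needed before new pairs are harvested with the patterns.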

One step toward a richer model of unsupervised grammar induction

Article International Conference on Recent Advances in Natural Language Processing, RANLP 2005, 21 September 2005 through 23 September 2005 ; Volume 2005-January , 2005 , Pages 197-203 ; 13138502 (ISSN) ; 9549174336 (ISBN) Feili, H ; Ghassem Sani, G. R ; Angelova G ; Bontcheva K ; Mitkov R ; Nicolov N ; Nikolov N ; Sharif University of Technology

Association for Computational Linguistics (ACL) 2005

Abstract

Probabilistic Context-Free Grammars (PCFGs) are useful tools for the syntactic analysis of natural languages. The availability of large treebanks has encouraged many researchers to use PCFGs in language modeling. Automatic learning of PCFGs falls into three categories, based on the data set needed for the training phase: supervised, semi-supervised, and unsupervised. Most current inductive methods are supervised and need a bracketed data set in the training phase. However, the lack of this kind of data set in many languages has encouraged us to pay more attention to unsupervised approaches. So far, unsupervised approaches have achieved little success. By considering a history-based...
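To make the PCFG idea concrete: a PCFG scores a parse tree as the product of the probabilities of the rules used to build it. A minimal sketch with a hypothetical toy grammar (the rule probabilities for each left-hand side sum to 1); learning those probabilities, whether from a treebank or without one, is not shown.

```python
from math import prod

# Hypothetical toy PCFG: (lhs, rhs) -> probability.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("saw",)): 1.0,
}

def tree_prob(tree):
    """Probability of a parse tree written as (label, child, ...), where each
    child is a word (str) or another subtree: the product of the probabilities
    of every rule used in the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[(label, rhs)]
    return p * prod(tree_prob(c) for c in children if not isinstance(c, str))

tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "saw"), ("NP", ("N", "cat"))))
```

Here `tree_prob(tree)` multiplies 1.0 · 0.6 · 1.0 · 0.5 · 0.7 · 1.0 · 0.4 · 0.5, giving 0.042.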

How parts of speech are learned? A lexical-driven or a structure-driven model

Article Procedia - Social and Behavioral Sciences ; Volume 32 , 2012 , Pages 275-282 ; 18770428 (ISSN) Khosravizadeh, P ; Pashmforoosh, R ; Sharif University of Technology

2012

Abstract

The paper investigates possible facilitative approaches to learning parts of speech and ways of interpreting them as functional categories rather than merely syntactic units. Participants in this study were 38 students of a General English course at Sharif University of Technology. They received a treatment in which the comparison group was given a text, a word-forms chart, and pattern-practice exercises, while the control group did not receive a text. The paper concludes by emphasizing an interactive model of contextualizing lexical categories and the intentional commitment of grammatical items to memory.

Syntactic tree kernels for event-time temporal relation learning

Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 6562 LNAI , 2011 , Pages 213-223 ; 03029743 (ISSN) ; 9783642200946 (ISBN) Mirroshandel, S. A ; Khayyamian, M ; Ghassem Sani, G ; Sharif University of Technology

Abstract

Temporal relation classification is one of the demanding contemporary tasks in natural language processing. This task can be used in various applications such as question answering, summarization, and language-specific information retrieval. In this paper, we propose an improved algorithm for classifying temporal relations between events and times using support vector machines (SVM). Along with gold-standard corpus features, the proposed method aims to exploit useful, automatically generated syntactic features to improve classification accuracy. Accordingly, a number of novel kernel functions are introduced and evaluated for temporal relation classification. The result...
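The novel kernels themselves are not described in this abstract. As a point of reference, the sketch below implements the classic convolution (subset-tree) kernel of Collins and Duffy, which counts the tree fragments two parse trees share, with a decay factor `lam` that down-weights large fragments. Trees are written as hypothetical `(label, child, ...)` tuples; this is a standard baseline, not the authors' kernel.

```python
def internal_nodes(tree):
    """All non-leaf nodes of a tree written as (label, child, ...) tuples."""
    found = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            found.extend(internal_nodes(child))
    return found

def production(node):
    """The grammar rule used at a node, e.g. ('NP', ('Det', 'N'))."""
    return node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def delta(n1, n2, lam):
    """Weighted number of common subset trees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = lam  # pre-terminals (word children only) contribute just lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str) and not isinstance(c2, str):
            result *= 1.0 + delta(c1, c2, lam)
    return result

def tree_kernel(t1, t2, lam=1.0):
    """K(T1, T2) = sum of delta over all pairs of internal nodes."""
    return sum(delta(a, b, lam) for a in internal_nodes(t1) for b in internal_nodes(t2))

t_dog = ("NP", ("Det", "the"), ("N", "dog"))
t_cat = ("NP", ("Det", "the"), ("N", "cat"))
```

With `lam=1.0`, `tree_kernel(t_dog, t_dog)` counts the six fragments of "the dog" and `tree_kernel(t_dog, t_cat)` counts the three fragments the two phrases share. A Gram matrix of such values can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.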

Stochastic data-to-text generation using syntactic dependency information

Article Computer Speech and Language ; Volume 76 , 2022 ; 08852308 (ISSN) Seifossadat, E ; Sameti, H ; Sharif University of Technology

Academic Press 2022

Abstract

Data-to-Text Generation (D2T) is one of the most important subfields of Natural Language Generation, in which structured data is transcribed into natural language text. Several solutions have been proposed for D2T with relative success, including template-based, phrase-structure-grammar-based, and neural attention models. However, these methods also suffer from problems such as grammatical flaws, limited naturalness, and semantic deficiencies. In this work, we propose a stochastic corpus-based model for data-to-text generation that produces a tree-form structure for sentences based on dependency information. This information includes the dependency relations between words and meaning labels...

Using Structural Language Modeling in Continuous Speech Recognition Systems

M.Sc. Thesis Sharif University of Technology SheikhShab, Golnar (Author) ; Sameti, Hossein (Supervisor)

Abstract

The language model is one of the most important parts of an automatic speech recognition system; it incorporates knowledge of natural language into the system to improve its accuracy. The language model conventionally used in recognition systems is the n-gram model, which is usually estimated from a large corpus using the relative frequency method. The n-gram model approximates the probability of a word sequence by multiplying its n-gram probabilities and thus does not take long-distance dependencies into account. Syntactic language models are therefore of interest. In this research, after probing different syntactic language models, a method for re-estimating the n-gram model, introduced by Stolcke in 1994, was...
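The n-gram approximation described above can be illustrated with a bigram model (n = 2) estimated by relative frequency. This is a minimal sketch on a hypothetical three-sentence corpus, without smoothing; each word is conditioned only on its predecessor, which is exactly why long-distance dependencies are lost.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and bigrams, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams

def sentence_prob(sentence, contexts, bigrams):
    """P(w1..wn) ~ product of P(w_i | w_{i-1}), with P = count(prev, w) / count(prev)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        p *= bigrams[(prev, word)] / contexts[prev]
    return p

corpus = ["the cat sat", "the dog sat", "the cat ran"]
contexts, bigrams = train_bigram(corpus)
```

For "the cat sat" the factors are 3/3, 2/3, 1/2 and 2/2, giving 1/3. Without smoothing, any unseen bigram (such as "dog ran" here) drives the sentence probability to zero; practical recognizers use smoothed or back-off estimates.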


Semantic Approach to the Scientific Theories and Its Realistic or Anti Realistic Consequences

M.Sc. Thesis Sharif University of Technology Hosseini Ghourtani, Ahmad (Author) ; Akbari Takhtameshlou, Javad (Supervisor)

Abstract

Different views and approaches have been adopted regarding the nature of scientific theories in the philosophy of science, and they can generally be divided into two main groups: (A) the syntactic (linguistic) approach, also called the traditional or received view, and (B) the semantic approach (model-based view). The main and dominant idea in the syntactic view, first proposed by the logical empiricists, is that a scientific theory is an axiomatic system in a formal (first-order) language that is subject to deduction and is partially interpreted by a set of correspondence rules. This view has been the subject of numerous attacks and criticisms since the 1960s, after which an alternative approach,...


Unsupervised induction of persian semantic verb classes based on syntactic information

Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Warsaw ; Volume 7912 LNCS , June , 2013 , Pages 112-124 ; 03029743 (ISSN) ; 9783642386336 (ISBN) Aminian, M ; Rasooli, M. S ; Sameti, H ; Sharif University of Technology

2013

Abstract

Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics, with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs have motivated us to use unsupervised algorithms for Persian verb clustering. In this paper, we report experiments on inducing the semantic classes of Persian verbs based on Levin's theory of verb classes. Syntactic information extracted from dependency trees is used as the base features for clustering the verbs. Since there had been no manual classification of Persian verbs prior to this paper, we have prepared a...
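One way to picture clustering verbs on syntactic features: describe each verb by the relative frequency of the dependency relations its arguments take, then group verbs with similar profiles. The sketch assumes hypothetical relation counts for four transliterated Persian verbs and uses plain k-means with deterministic initialization; the paper's actual features and clustering algorithm may differ.

```python
import math
from collections import Counter

RELATIONS = ["obj", "iobj", "ccomp"]  # hypothetical feature inventory

def frame_vector(observed_relations, relations=RELATIONS):
    """Relative frequency of each dependency relation seen on a verb's arguments."""
    counts = Counter(observed_relations)
    total = sum(counts.values())
    return [counts[r] / total for r in relations]

def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centroids so runs are deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical argument relations observed for four transliterated Persian verbs.
VERBS = {
    "dadan": ["obj"] * 5 + ["iobj"] * 4,          # give
    "goftan": ["ccomp"] * 5 + ["obj"],            # say
    "ferestadan": ["obj"] * 4 + ["iobj"] * 4,     # send
    "porsidan": ["ccomp"] * 4 + ["obj", "iobj"],  # ask
}
vectors = [frame_vector(rels) for rels in VERBS.values()]
clusters = dict(zip(VERBS, kmeans(vectors, k=2)))
```

With these toy counts the transfer verbs (dadan, ferestadan) and the communication verbs (goftan, porsidan) fall into separate clusters.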

Persian word embedding evaluation benchmarks

Article 26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018 ; 2018 , Pages 1583-1588 ; 9781538649169 (ISBN) Zahedi, M. S ; Bokaei, M. H ; Shoeleh, F ; Yadollahi, M. M ; Doostmohammadi, E ; Farhoodi, M ; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2018

Abstract

Recently, there has been renewed interest in semantic word representation, also called word embedding, in a wide variety of natural language processing tasks requiring sophisticated semantic and syntactic information. The quality of word embedding methods is usually evaluated on English-language benchmarks. Nevertheless, only a few studies analyze word embedding for low-resource languages such as Persian. In this paper, we perform an extensive word embedding evaluation in Persian based on a set of lexical semantics tasks, namely analogy, concept categorization, and word semantic relatedness. For these evaluation tasks, we provide three benchmark data sets to show the...
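The analogy task mentioned above is commonly scored with the vector-offset method: for a : b :: c : ?, pick the vocabulary word whose vector is most cosine-similar to b - a + c, excluding the three question words. A minimal sketch with hypothetical three-dimensional English vectors; a real benchmark would use trained Persian embeddings and the paper's own question sets.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(emb, a, b, c):
    """a : b :: c : ?  ->  vocabulary word closest (cosine) to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions whose predicted answer equals d."""
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)

# Hypothetical toy vectors, chosen so the offsets line up exactly.
EMB = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, 1.0],
}
QUESTIONS = [("man", "woman", "king", "queen"), ("man", "king", "woman", "queen")]
```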

Fault effects in FlexRay-based networks with hybrid topology

Article ARES 2008 - 3rd International Conference on Availability, Security, and Reliability, Proceedings, 4 March 2008 through 7 March 2008, Barcelona ; 2008 , Pages 491-496 ; 0769531024 (ISBN); 9780769531021 (ISBN) Dehbashi, M ; Lari, V ; Miremadi, S. G ; Shokrollah Shirazi, M ; Sharif University of Technology

2008

Abstract

This paper investigates fault effects and error propagation in a FlexRay-based network with a hybrid topology that includes a bus subnetwork and a star subnetwork. The investigation is based on about 43,500 bit-flip fault injections into different parts of the FlexRay communication controller. To do this, a FlexRay communication controller is modeled in Verilog HDL at the behavioral level. This controller is then used to set up a FlexRay-based network composed of eight nodes (four nodes in the bus subnetwork and four nodes in the star subnetwork). The faults are injected into a node of the bus subnetwork and a node of the star subnetwork of the hybrid network. Then, the faults resulting in...
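The bit-flip fault model behind such campaigns can be sketched in a few lines: a transient single-bit fault inverts one bit of a register, which is an XOR with a one-hot mask. The register names, the 32-bit width, and the rule that a fault counts as "activated" only if it hits a bit the logic reads are simplifying assumptions for illustration; the study itself injects faults into a Verilog model of the controller.

```python
import random

REG_WIDTH = 32  # assumed register width in bits

def flip_bit(value: int, bit: int, width: int = REG_WIDTH) -> int:
    """Model a transient single bit-flip by inverting one bit of a register value."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} outside a {width}-bit register")
    return value ^ (1 << bit)

def run_campaign(registers, used_masks, n_faults, seed=0):
    """Inject n_faults random single bit-flips and count how many land on a bit
    that the (hypothetical) controller logic reads, per used_masks. Real campaigns
    decide activation by simulating the model and comparing with a fault-free run."""
    rng = random.Random(seed)  # fixed seed -> reproducible campaign
    names = sorted(registers)
    hits = 0
    for _ in range(n_faults):
        name = rng.choice(names)
        bit = rng.randrange(REG_WIDTH)
        faulty = flip_bit(registers[name], bit)
        if (faulty ^ registers[name]) & used_masks[name]:
            hits += 1
    return hits

# Hypothetical register contents and the bits the toy logic actually uses.
regs = {"status": 0xA5, "frame_id": 0x123}
masks = {"status": 0xFF, "frame_id": 0x7FF}
activated = run_campaign(regs, masks, 1000)
```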

Using syntactic-based kernels for classifying temporal relations

Article Journal of Computer Science and Technology ; Volume 26, Issue 1 , 2010 , Pages 68-80 ; 10009000 (ISSN) Mirroshandel, S. A ; Ghassem Sani, G ; Khayyamian, M ; Sharif University of Technology

Abstract

Temporal relation classification is one of the demanding contemporary tasks in natural language processing. This task can be used in various applications such as question answering, summarization, and language-specific information retrieval. In this paper, we propose an improved algorithm for classifying temporal relations, between events or between events and times, using support vector machines (SVM). Along with gold-standard corpus features, the proposed method aims to exploit some useful, automatically generated syntactic features to improve classification accuracy. Accordingly, a number of novel kernel functions are introduced and evaluated. Our evaluations clearly demonstrate that...

Using tree kernels for classifying temporal relations between events

Article PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 3 December 2009 through 5 December 2009 ; Volume 1 , 2009 , Pages 355-364 ; 9789624423198 (ISBN) Mirroshandel, S. A ; Ghassem Sani, G. R ; Khayyamian, M ; Sharif University of Technology

Abstract

The ability to accurately classify temporal relations between events is an important task in a large number of natural language processing and text mining applications such as question answering, summarization, and language-specific information retrieval. In this paper, we propose an improved way of classifying temporal relations using support vector machines (SVM). Along with gold-standard corpus features, the proposed method aims to exploit useful, automatically generated syntactic features to improve the accuracy of SVM classification. Accordingly, a number of novel kernel functions are introduced and evaluated for temporal relation classification. Our evaluations...

Automatic Labeling of Prosody in Persian Unmarked Speech

M.Sc. Thesis Sharif University of Technology Jamshidlou, Paria (Author) ; Eslami, Moharram (Supervisor) ; Bahrani, Mohammad (Co-Advisor)

Abstract

Prosodic annotations are used for locating and characterizing prominent parts in utterances as well as identifying and describing boundaries of coherent stretches of speech. Automatic detection and labeling of prosodic events in speech has received much attention in recent years since prosody is intricately bound to the semantics of the utterance. Recognition of prosodic events is important for spoken language applications such as automatic understanding and translation of speech. Moreover, corpora labeled with prosodic markers are essential for building speech synthesizers that use data-driven approaches to generate natural speech. Such databases are important to reach a better...


Advantages of dependency parsing for free word order natural languages

Article 41st International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2015, 24 January 2015 through 29 January 2015 ; Volume 8939 , 2015 , Pages 511-518 ; 03029743 (ISSN) ; 9783662460771 (ISBN) Mirlohi Falavarjani, S. A ; Ghassem Sani, G ; Sharif University of Technology

Springer Verlag 2015

Abstract

An important reason to prefer dependency parsing over classical phrase-based methods, especially for languages such as Persian that have the property of being “free word order”, is that this property has a negative impact on the accuracy of conventional parsing methods. In Persian, some words, such as adverbs, can be moved freely within a sentence without affecting its correctness or meaning. In this paper, we illustrate the robustness of dependency parsing against this particular problem by training two well-known dependency parsers, namely MST Parser and Malt Parser, using a Persian dependency corpus called Dadegan. We divided the corpus into two separate parts including only...
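The robustness argument can be made concrete at the representation level: moving an adverb changes word-order features such as adjacent bigrams, but a dependency tree viewed as a set of (head, dependent, relation) arcs stays the same. A minimal sketch on a transliterated Persian sentence ("He read the book yesterday"); the arcs are hand-written, their labels loosely follow Universal Dependencies naming rather than Dadegan's scheme, and no parser is trained here.

```python
def dependency_arcs(tokens, heads, labels):
    """Order-free view of a dependency tree: a set of (head, dependent, label)
    triples. heads[i] is the index of token i's head, or -1 for the root."""
    return {(tokens[h] if h >= 0 else "ROOT", tokens[i], labels[i])
            for i, h in enumerate(heads)}

def adjacent_bigrams(tokens):
    """Word-order-dependent features, for contrast."""
    return set(zip(tokens, tokens[1:]))

# The adverb "diruz" (yesterday) in two positions; heads are token indices.
s1 = ["u", "diruz", "ketab", "ra", "khand"]
s2 = ["diruz", "u", "ketab", "ra", "khand"]
heads = [4, 4, 4, 2, -1]  # the same head indices happen to work for both orders
labels1 = ["nsubj", "advmod", "obj", "case", "root"]
labels2 = ["advmod", "nsubj", "obj", "case", "root"]

arcs1, arcs2 = dependency_arcs(s1, heads, labels1), dependency_arcs(s2, heads, labels2)
```

The two arc sets are identical while the bigram sets differ, which is the intuition behind expecting dependency parsers to be less sensitive to this kind of reordering.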

Automatic extraction of is-a relations in taxonomy learning

Article 13th International Computer Society of Iran Computer Conference on Advances in Computer Science and Engineering, CSICC 2008, Kish Island, 9 March 2008 through 11 March 2008 ; Volume 6 CCIS , 2008 , Pages 17-24 ; 18650929 (ISSN); 3540899847 (ISBN); 9783540899846 (ISBN) Neshati, M ; Abolhassani, H ; Fatemi, H ; Sharif University of Technology

2008

Abstract

Taxonomy learning is a prerequisite step for ontology learning. To create a taxonomy, the existing 'is-a' relations between words must first be extracted. A known way to extract 'is-a' relations is to find lexico-syntactic patterns in a large text corpus. Although this approach produces results with high precision, it suffers from low recall. Furthermore, developing a comprehensive set of patterns is tedious and cumbersome. In this paper, we first introduce an approach for developing lexico-syntactic patterns automatically using the snippets of search engine results, and then address the low recall of this approach using a combined model, which is based on...
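As an illustration of the lexico-syntactic approach, the sketch below applies one classic hand-written Hearst pattern, "<hypernym> such as <hyponym>, <hyponym> and <hyponym>", with a regular expression. It is a hypothetical single-pattern example, not the automatically learned patterns or the combined model the paper describes.

```python
import re

# "<hypernym> such as <hyponym>, <hyponym> and|or <hyponym>" (no Oxford-comma handling)
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")

def extract_is_a(text):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

found = extract_is_a("Fruits such as apples, pears and plums are sold here.")
```

The precision/recall trade-off shows up immediately: every extracted pair is plausible, but a sentence such as "Apples are fruits" yields nothing.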

Question Processing for Open Domain Persian Question Answering Systems

M.Sc. Thesis Sharif University of Technology Hosseini, Hawre (Author) ; Bahrani, Mohammad (Supervisor)

Abstract

Question answering systems take a question in natural language as input and present an explicit, appropriate answer to it. One of the major components of automatic question answering systems is the question processing component, in which the input question is analyzed. The main goal of the question processing phase is to determine the answer type through question classification. Rule-based, machine learning-based, and hybrid approaches have been used to develop question classifiers, and machine learning-based ones have outperformed the others. The main goal of this study is to develop a question classifier for Persian open-domain question answering systems....


PostRank: A new algorithm for incremental finding of persian blog representative words

Article ACM International Conference Proceeding Series ; 2012 ; 9781450309158 (ISBN) Sayyadiharikandeh, M ; Ghodsi, M ; Naghibi, M ; Sharif University of Technology

2012

Abstract

Dimension reduction techniques for text documents can be used in the preprocessing phase of blog mining, but these techniques can be more effective if they properly account for the nature of blogs. In this paper, we propose a novel algorithm called PostRank that uses a shallow approach to identify the theme of a blog, or its representative words, in order to reduce the dimensionality of blogs. PostRank uses a graph-based syntactic representation of the weblog that takes into account some structural features of weblogs. In the first step, it models the blog as a complete graph, treating the theme of the blog as a query submitted to a search engine like Google and each post as a search result. It tries...

Classification of activated faults in the flexray-based networks

Article Journal of Electronic Testing: Theory and Applications (JETTA) ; Volume 26, Issue 5 , October , 2010 , Pages 535-547 ; 09238174 (ISSN) Sedaghat, Y ; Miremadi, S. G ; Sharif University of Technology

Abstract

The FlexRay communication protocol is expected to become the de facto standard for distributed safety-critical systems. This paper classifies the effects of transient single bit-flip fault injections into the FlexRay communication controller. In this protocol, an injected fault that is activated may result in one or more error types, namely Boundary violation, Conflict, Content, Freeze, Synchronization, Syntax, and Invalid frame. To study the activated faults, a FlexRay bus network composed of four nodes was modeled in Verilog HDL, and a total of 135,600 transient faults were injected into only one node, called the target node. The results show that only 9,342 of the faults (about 6.9%) were...

Categorizing and analysis of activated faults in the flexray communication controller registers

Article Proceedings of the 14th IEEE European Test Symposium, ETS 2009, 25 May 2009 through 29 May 2009, Sevilla ; 2009 , Pages 121-126 ; 9780769537030 (ISBN) Sedaghat, Y ; Miremadi, G ; Sharif University of Technology

2009

Abstract

The FlexRay communication protocol is expected to become the de facto standard for distributed safety-critical systems. In this paper, transient single bit-flip faults were injected into the FlexRay communication controller to categorize and analyze the activated faults. In this protocol, an activated fault results in one or more error types, namely Boundary violation, Conflict, Content, Freeze, Synchronization, and Syntax. To study the activated faults, a FlexRay bus network composed of four nodes was modeled in Verilog HDL, and a total of 135,600 transient faults were injected into only one node, of which 9,342 (6.9%) were activated. The results show that the Synchronization error is...