
Stochastic data-to-text generation using syntactic dependency information

Seifossadat, E.; Sharif University of Technology | 2022

  1. Type of Document: Article
  2. DOI: 10.1016/j.csl.2022.101388
  3. Publisher: Academic Press, 2022
  4. Abstract: Data-to-Text Generation (D2T) is one of the most important sub-fields of Natural Language Generation, in which structured data is transcribed into natural language text. Several solutions have been proposed for D2T with relative success, including template-based, phrase-structure-grammar-based, and neural attention models. However, these methods suffer from problems such as grammatical flaws, limited naturalness, and semantic deficiencies. In this work, we propose a stochastic corpus-based model for data-to-text generation that produces a tree-form structure for sentences based on dependency information. This information comprises the dependency relations between words and the meaning labels extracted from the aligned training sentences parsed with a dependency parser. By combining the dependency relations and meaning labels to construct a tree structure in a top-down manner, each word is placed into the output sentence based on its preceding and succeeding words. This results in fluent sentences with correct grammatical structures. The approach also ensures that all required semantic information is present in the output sentences while irrelevant or redundant labels are avoided. In addition, by using beam search when producing the structure of sentences, the proposed model can generate highly diverse sentences. We test our model on eight domains in tabular, dialogue-act, and RDF formats. Our model improves BLEU by 30% over the corpus-based state-of-the-art methods trained on the tabular datasets, and it achieves results comparable to the neural-network-based approaches trained on the dialogue-act, E2E, and WebNLG datasets on the BLEU evaluation metric. Furthermore, the ERR metric for our results is always zero, meaning that our model generates sentences without losing any information. Human evaluations show that our model produces high-quality utterances in terms of informativeness, naturalness, and quality. © 2022 (An illustrative sketch of this top-down tree construction with beam search follows this record.)
  5. Keywords: Data-to-text Generation ; Natural Language Generation ; Syntactic Dependency ; Computational grammars ; Forestry ; Natural language processing systems ; Quality control ; Semantic Web ; Semantics ; Stochastic models ; Syntactics ; Trees (mathematics) ; Corpus-based ; Data-to-text generation ; Dependency informations ; Dependency relation ; Dialog acts ; Natural language generation ; Sentence-based ; Stochastic data ; Syntactic dependencies ; Text generations ; Stochastic systems
  6. Source: Computer Speech and Language; Volume 76, 2022; 0885-2308 (ISSN)
  7. URL: https://www.sciencedirect.com/science/article/abs/pii/S0885230822000274
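
Illustrative sketch (not from the article): the abstract describes realizing a sentence by growing a dependency tree top-down from corpus-derived statistics, with beam search keeping the most probable partial trees so that several diverse outputs survive. The minimal Python below sketches only that idea; the GRAMMAR table, the pre-head linearization rule, and all words are invented toy stand-ins for what the paper estimates from a dependency-parsed, meaning-aligned corpus, and the paper's meaning-label coverage check is omitted for brevity.

import math

# Toy, invented "corpus statistics": for each head word, the dependency
# relations it governs and candidate dependents with attachment probabilities
# (the paper estimates such information from dependency-parsed training data).
GRAMMAR = {
    "serves":     [("nsubj", [("restaurant", 1.0)]),
                   ("obj",   [("food", 0.7), ("dishes", 0.3)])],
    "restaurant": [("det",   [("the", 0.95), ("a", 0.05)])],
    "food":       [("amod",  [("Italian", 0.6), ("cheap", 0.4)])],
}

# Toy linearization rule: dependents in these relations precede their head.
PRE_HEAD = {"nsubj", "det", "amod"}


def expand(hyp):
    """Fill the first pending (head, relation) slot in every possible way."""
    logp, words, slots = hyp
    (head, rel, options), rest = slots[0], slots[1:]
    successors = []
    for child, p in options:
        i = words.index(head)
        new_words = (words[:i] + [child] + words[i:] if rel in PRE_HEAD
                     else words[:i + 1] + [child] + words[i + 1:])
        # The new child may itself govern relations that must be filled.
        new_slots = rest + [(child, r, o) for r, o in GRAMMAR.get(child, [])]
        successors.append((logp + math.log(p), new_words, new_slots))
    return successors


def beam_realize(root, beam_size=3):
    """Grow the tree top-down, keeping the beam_size best partial trees."""
    beam = [(0.0, [root], [(root, r, o) for r, o in GRAMMAR.get(root, [])])]
    complete = []
    while beam:
        candidates = []
        for hyp in beam:
            if hyp[2]:                     # pending slots: keep expanding
                candidates.extend(expand(hyp))
            else:                          # tree finished: a full sentence
                complete.append(hyp)
        beam = sorted(candidates, key=lambda h: -h[0])[:beam_size]
    return [(math.exp(lp), " ".join(ws))
            for lp, ws, _ in sorted(complete, key=lambda h: -h[0])]


if __name__ == "__main__":
    for prob, sentence in beam_realize("serves"):
        print(f"{prob:.3f}  {sentence}")

Running this prints four ranked realizations, topped by "the restaurant serves Italian food": because beam search retains several completed hypotheses rather than only the single best one, the model yields multiple distinct sentences, while the attachment probabilities keep each one grammatically well-formed. This mirrors, in miniature, the diversity-with-fluency behavior the abstract attributes to the proposed model.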