
BEDSpell: spelling error correction using BERT-based masked language model and edit distance

Tohidian, F ; Sharif University of Technology | 2023

  1. Type of Document: Article
  2. DOI: 10.1007/978-3-031-26507-5_1
  3. Publisher: Springer Science and Business Media Deutschland GmbH , 2023
  4. Abstract:
  5. The spelling correction problem, the task of automatically correcting misspellings in a text, is critical in natural language processing (NLP). Although it can be considered a standalone task, in most cases, it is an integral component of various NLP tasks as a preprocessing step since a dataset with typos can lead to erroneous results. Many previous automatic spelling correctors use a dictionary, independently search the word in a predefined list of words, and recommend the most similar one without considering the context. Even though these models’ output may be a correctly spelled word, it could be semantically incorrect. Therefore, some correctors consider the context when correcting typos based on language models. However, only employing the language model is insufficient, and the corrected word should be similar to the misspelled word. In our approach, we select a candidate for the typo based on masked language model output, character-level similarities, and edit distance. Exploiting the combination of the masked language model, character-level similarities, and edit distance assists us in recommending similar context-related candidates. We have used recall (correction rate) as our evaluation metric, and the results demonstrate a considerable improvement compared with previous studies. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG
  6. Keywords:
  7. Dictionary ; Edit distance ; Masked language model ; Natural language processing ; Preprocessing ; Spelling correction
  8. Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 13821 LNCS , 2023 , Pages 3-14 ; 03029743 (ISSN); 978-303126506-8 (ISBN)
  9. URL: https://link.springer.com/chapter/10.1007/978-3-031-26507-5_1
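
The abstract describes choosing a correction by combining masked language model output, character-level similarity, and edit distance, but it does not say how the three signals are combined. The following is only a minimal sketch of one plausible combination, not the authors' method. The names `rank_candidates`, `max_distance`, and `alpha`, the linear weighting, and the probabilities in `lm_probs` are all assumptions made for illustration. In a real system the probabilities would come from a BERT-style model predicting the masked typo position.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(typo, lm_probs, max_distance=2, alpha=0.5):
    """Rank candidate words for a typo (hypothetical scoring scheme).

    lm_probs: candidate word -> probability from a masked language model
              (assumed to be given; no model is called here).
    max_distance: candidates farther than this edit distance are dropped.
    alpha: assumed weight between context fit and character similarity.
    """
    scored = []
    for word, prob in lm_probs.items():
        if edit_distance(typo, word) > max_distance:
            continue  # too dissimilar to the misspelled word
        char_sim = SequenceMatcher(None, typo, word).ratio()
        scored.append((alpha * prob + (1 - alpha) * char_sim, word))
    return [word for _, word in sorted(scored, reverse=True)]


# Made-up probabilities for the masked slot in "I drank a cup of [MASK]".
lm_probs = {"coffee": 0.55, "tea": 0.30, "water": 0.10, "cafe": 0.01}
ranked = rank_candidates("cofee", lm_probs)
print(ranked)
```

In this toy input, "tea" fits the context well but is filtered out because it is far from the typo "cofee" in edit distance, which mirrors the abstract's point that a language model alone is not sufficient.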