Resources and Language Processing Models for Hadith and Hadith/Quranic/Biblical Relation Analyses
Jahanmir Yazdi, Mohammad Aref | 2024
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 57610 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Asgari, Ehsanoddin
- Abstract:
- Hadith, as one of the most important sources of classical Arabic text, presents the sayings and the conduct of Prophet Muhammad (PBUH) and his family (A.S) as narratives of life and religious teachings. In this context, Hadith processing includes tasks such as sorting, extraction, linking, categorization, and alignment of information from hadith texts, which make information easier to access and the meanings of the hadiths easier to understand. The absence of a comprehensive database of hadiths presents a challenge, making access to information difficult and comparison across different works impossible. Preprocessing of hadith texts faces challenges such as information extraction (for example, Quranic data and the names of the infallibles), alignment of translations, labeling, and precise structuring, all of which affect the accuracy and credibility of text processing. Developing suitable computational and processing models for hadith data can improve the accuracy and efficiency of hadith textual analysis. Hadiths, as supplementary sources to the Quran and other Islamic sacred texts, contain significant connections and references that text processing can effectively interpret, providing a foundation for deeper studies of Islam. The work carried out in this research is divided into four categories: collection and preprocessing of hadith data, development of a language model specific to the hadith domain, establishment of initial infrastructure, and efforts to enhance conceptual understanding. The first category covers the collection of hadith data and its initial preprocessing. The activities undertaken include: (1) examining and designing a study-friendly structure for hadiths; (2) creating a comprehensive database with over 560,000 hadiths from both Shia and Sunni sources; (3) grouping hadiths to create a more suitable environment for searching them. The second category focuses on the development of a language model based on transformers.
In this section, (4) a multilingual model has been trained to understand the structure and meaning of hadiths. This model has achieved a word prediction accuracy of 52% for unvoweled data and 59% for voweled data, using the Mean Reciprocal Rank metric for evaluation. The third category covers the development of an expandable initial infrastructure for models based on artificial neural networks. The activities undertaken include: (5) separating the chain of narration (isnad) from the hadith text, achieving an F1 score of 89% in predicting the position of the isnad; (6) diacriticizing the hadith text, achieving an accuracy of 92% in predicting diacritics for each character; (7) tagging the names of the Imams in the hadiths, achieving an accuracy of 89% in word prediction; (8) segmenting the words in the hadith text, achieving an accuracy of 95% in determining the position of each character in a word; (9) examining text segmentation and aligning the translation with each specific segment to ensure the hadith data is conceptually integrated into the database. Finally, the fourth category is dedicated to efforts to enhance conceptual understanding. These efforts include: (10) providing semantic search capabilities to facilitate the study and search of hadiths; (11) aggregating Quranic data and creating intertextual links with hadiths, thereby enabling a deeper understanding of the texts.
- Keywords:
- Text Processing ; Transformer-based Language Models ; Quran’s Language ; Computational Humanities ; Hadith/Quranic/Biblical Field ; Sorting ; Text Extraction
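
The abstract reports word prediction quality using the Mean Reciprocal Rank (MRR) metric. As a rough illustration of how such a score is typically computed, the sketch below averages the reciprocal of the rank at which the correct word appears in each ranked candidate list. The function name, the input layout, and the toy data are assumptions made for illustration; they are not taken from the thesis, whose exact evaluation protocol is not described in this record.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
) -> float:
    """Average of 1/rank of the correct word over all examples.

    ranked_predictions[i] is a candidate list for example i, best guess first.
    An example whose target is missing from its list contributes 0.
    (Hypothetical helper; not the thesis implementation.)
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0

    total = 0.0
    for candidates, target in zip(ranked_predictions, targets):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == target:
                total += 1.0 / rank
                break  # only the first (best) match counts
    return total / len(targets)


# Toy data: ranks 1 and 3 are hits, the third example is a miss.
example_predictions = [["a", "b"], ["x", "y", "z"], ["q"]]
example_targets = ["a", "z", "m"]
example_mrr = mean_reciprocal_rank(example_predictions, example_targets)
```

On the toy data the reciprocal ranks are 1, 1/3, and 0, so the score is their mean, 4/9. A masked-word evaluation would supply the model's top-k candidates for each hidden word as `ranked_predictions`.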