- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 53477 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Soleymani Baghshah, Mahdieh; Karbalaei Aghajan, Hamid
- Abstract:
- There are currently about 50 million people with Alzheimer's disease (AD) worldwide, about 700 thousand of them in Iran. Symptoms of the disease include decreased awareness, loss of interest in unfamiliar subjects, increased distractibility, and speech problems, which gradually lead to a complete inability to perform daily activities and, eventually, to complete loss of speech. The disease belongs to the category of neurological disorders and is the most common type of dementia; no cure has been found so far. However, if the disease is diagnosed at an early stage, pharmacological and behavioral therapies can be prescribed to slow the progression of its symptoms. All of this underscores the importance of studying the disease and diagnosing it early. The aim of this study is to use deep neural networks for speech and text processing to diagnose AD from targeted speech, such as the speech elicited by the picture-description cognitive assessment. The most challenging problem in developing techniques to recognize AD patients from speech is the lack of a large dataset. Currently, the largest available dataset is the Pitt corpus of the DementiaBank database, which contains 500 picture-description interviews from the AD and control groups. The present study therefore makes extensive use of transfer learning and representation learning, so that the models acquire a good knowledge of the structure of language and its common features before performing AD diagnosis. Transformer-based pre-trained deep language models have recently made a large leap in natural language processing research and applications. These models are pre-trained on large available datasets to understand natural-language text, and they have been shown to perform well on subsequent classification tasks with small training sets.
Representation learning models have also achieved significant results in speech recognition. In this study, by combining these methods, we aim both to improve the accuracy of AD detection from speech and to reduce the need for hand-crafted, expert-defined features. The models are evaluated on the picture-description test transcripts of the Pitt corpus, which contain data from 170 AD patients (257 interviews) and 99 healthy controls (243 interviews). The best textual model in this research (embeddings from the pre-trained large Bidirectional Encoder Representations from Transformers model, i.e., BERT-Large, with a logistic regression classifier) achieves a classification accuracy of 88.08%, improving on the state of the art by 2.48 percentage points. Combining this textual model with an acoustic model based on the pre-trained Wav2Vec model (which is designed for speech recognition using representation learning) yields 89.01% classification accuracy, improving on the state of the art by 3.41 percentage points. In addition to improving AD prediction, the proposed methods do not require expert-defined features. Moreover, with multilingual versions of these models and domain adaptation techniques, knowledge of AD prediction in one language can be transferred to another language for which a sufficiently large dataset does not exist.
- Keywords:
- Alzheimer ; Early Detection ; Picture Description ; Deep Learning ; Representation Learning ; Transfer Learning ; Speech Processing ; Language Model ; Natural Language Processing ; Transformer
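The abstract names two components, a logistic regression classifier over pre-trained BERT-Large text embeddings and a combination with a Wav2Vec-based acoustic model, but does not state how the two models are combined. The Python sketch below illustrates the general pattern under explicit assumptions: the embeddings are synthetic stand-ins produced by a hypothetical `toy_embeddings` helper rather than real model outputs, the logistic regression is a from-scratch gradient-descent version rather than any particular library's, and the fusion step (the hypothetical `late_fusion`, a weighted average of per-interview probabilities) is an assumed late-fusion scheme, not necessarily the one used in the thesis.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Binary logistic regression fitted by full-batch gradient descent with an L2 penalty."""

    def __init__(self, lr: float = 0.1, epochs: int = 500, l2: float = 1e-3) -> None:
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w: Vector = []
        self.b = 0.0

    def _logit(self, x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = sigmoid(self._logit(xi)) - yi  # derivative of log-loss w.r.t. the logit
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Average the gradient over the batch; the L2 term is not applied to the bias.
            for j in range(d):
                self.w[j] -= self.lr * (grad_w[j] / n + self.l2 * self.w[j])
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, X: Sequence[Sequence[float]]) -> Vector:
        """Probability of the positive class (label 1 = AD in this sketch)."""
        return [sigmoid(self._logit(x)) for x in X]


def late_fusion(p_text: Sequence[float], p_audio: Sequence[float], w_text: float = 0.5) -> Vector:
    """Hypothetical fusion: weighted average of the two modalities' probabilities."""
    return [w_text * pt + (1.0 - w_text) * pa for pt, pa in zip(p_text, p_audio)]


def accuracy(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    hits = sum(int((p >= threshold) == bool(t)) for p, t in zip(probs, labels))
    return hits / len(labels)


def toy_embeddings(labels: Sequence[int], dim: int, seed: int) -> List[Vector]:
    """Hypothetical stand-in for pre-extracted embeddings: AD near +1, controls near -1."""
    rng = random.Random(seed)
    return [[rng.gauss(1.0 if t else -1.0, 0.5) for _ in range(dim)] for t in labels]


if __name__ == "__main__":
    y_train = [1] * 20 + [0] * 20
    y_test = [1] * 10 + [0] * 10
    # 8-dim vectors stand in for BERT-Large text features, 4-dim for Wav2Vec acoustic features.
    text_clf = LogisticRegression().fit(toy_embeddings(y_train, 8, seed=0), y_train)
    audio_clf = LogisticRegression().fit(toy_embeddings(y_train, 4, seed=1), y_train)
    p_text = text_clf.predict_proba(toy_embeddings(y_test, 8, seed=2))
    p_audio = audio_clf.predict_proba(toy_embeddings(y_test, 4, seed=3))
    p_fused = late_fusion(p_text, p_audio)
    print(f"text={accuracy(p_text, y_test):.2f} "
          f"audio={accuracy(p_audio, y_test):.2f} "
          f"fused={accuracy(p_fused, y_test):.2f}")
```

In a real pipeline, each interview's transcript would first be encoded by a pre-trained BERT-Large model and its audio by a pre-trained Wav2Vec model, and only the resulting fixed-length vectors would be passed to `fit`. Training only a lightweight linear classifier on top of frozen pre-trained representations is one common way to cope with a dataset as small as the 500-interview Pitt corpus; the scores printed by this toy run reflect the synthetic data, not the thesis results.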
- Table of Contents
- 1 Introduction
- 2 Background
- 3 Related Work
- 4 Proposed Approach
- 5 Implementation, Experiments, and Evaluation
- 6 Conclusion and Future Work
- References
- English-to-Persian Glossary
- Persian-to-English Glossary