- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 53477 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Soleymani Baghshah, Mahdieh; Karbalaei Aghajan, Hamid
- Abstract:
- About 50 million people worldwide currently live with Alzheimer's disease (AD), roughly 700 thousand of them in Iran. Symptoms include decreased awareness, disinterest in unfamiliar subjects, increased distraction, and speech problems, among others, and gradually lead to a complete inability to perform daily activities and a complete loss of speech. AD is a neurological disorder and the most common type of dementia, for which no cure has been found so far. However, if the disease is diagnosed at an early stage, a series of pharmacological and behavioral therapies can be prescribed to slow the progression of its symptoms. All of this underlines the importance of studying the disease and diagnosing it early. The aim of this study is to use the power of deep neural networks for speech and text processing to diagnose AD from targeted speech, such as that elicited by the picture description cognitive assessment. The most challenging problem in developing techniques for recognizing AD patients from speech is the lack of a large dataset. Currently, the largest available dataset is the Pitt corpus from the DementiaBank dataset, which contains 500 picture description interviews from the AD and control groups. Therefore, transfer learning and representation learning are used extensively in the present study. The idea is for the models to acquire a good knowledge of the structure of language and its common features before being applied to AD diagnosis. Transformer-based pre-trained deep language models have recently brought about a large leap in natural language processing research and applications. These models are pre-trained on large available datasets to understand natural language text, and have been shown to subsequently perform well on classification tasks with small training sets.
Representation learning models have also achieved significant improvements on the speech recognition task. By combining these methods, this study aims both to improve the accuracy of AD detection from speech and to reduce the need for hand-crafted, expert-defined features. The models are evaluated on picture description test transcripts from the Pitt corpus, which contains 257 interviews from 170 AD patients and 243 interviews from 99 healthy controls. The best textual model of this research (embeddings from the pre-trained large Bidirectional Encoder Representations from Transformers model, i.e. BERTLarge, with a logistic regression classifier) achieves a classification accuracy of 88.08%, improving on the state of the art by 2.48%. Combining this textual model with an acoustic model based on the pre-trained Wav2Vec model (which is designed to perform speech recognition using representation learning) yields a classification accuracy of 89.01%, improving on the state of the art by 3.41%. In addition to improving AD prediction, the proposed methods do not need expert-defined features. Furthermore, with multilingual versions of these models and domain adaptation techniques, the knowledge of AD prediction in one language can be transferred to another language for which no sufficiently large dataset exists.
- Keywords:
- Alzheimer ; Early Detection ; Image Captioning ; Deep Learning ; Representation Learning ; Transfer Learning ; Speech Processing ; Language Model ; Natural Language Processing ; Transducer
Contents
- 1 Introduction
- 1-1 Problem definition
- 1-2 Importance and applications
- 1-3 Speech classification approaches
- 1-3-1 Classification based on hand-crafted features
- 1-3-2 Classification based on raw speech
- 1-4 Approaches to improving classification
- 1-4-1 Data augmentation
- 1-4-2 Transfer learning
- 1-5 Challenges
- 1-5-1 Lack of a sufficiently large dataset
- 1-5-2 Lack of suitable pre-trained acoustic models
- 1-6 Research objective
- 1-7 Thesis structure
- 2 Background
- 2-1 Introduction
- 2-2 Representing text as input to computational models
- 2-2-1 Word embedding
- 2-2-2 Word segmentation
- 2-3 Text language models
- 2-3-1 Language models based on RNNs
- 2-3-2 Language models based on the Transformer
- 2-4 Representing audio as input to computational models
- 2-4-1 Audio in the time domain
- 2-4-2 Audio as MFCC
- 2-5 Representation learning for audio
- 2-5-1 NCE
- 2-5-2 CPC
- 2-5-3 Wav2Vec
- 2-6 Summary
- 3 Previous work
- 3-1 Introduction
- 3-2 Cognitive tests for diagnosing Alzheimer's disease
- 3-2-1 MMSE
- 3-2-2 MoCA test
- 3-2-3 Verbal fluency test
- 3-2-4 Story recall test
- 3-2-5 Sentence construction test
- 3-2-6 Picture description test
- 3-3 Previous feature-based work
- 3-4 Previous deep-learning-based work
- 3-5 Summary
- 4 Proposed approach
- 4-1 Introduction
- 4-2 Proposed textual models
- 4-2-1 Data augmentation
- 4-2-2 Transfer learning using deep language models
- 4-3 Proposed acoustic models
- 4-3-1 Audio segmentation and MIL
- 4-3-2 Using MFCC
- 4-3-3 Transfer learning using pre-trained audio classifiers
- 4-3-4 Transfer learning using representation learning
- 4-4 Combining textual and acoustic models
- 4-5 Transfer learning between textual models in different languages
- 4-5-1 Domain adaptation of sentence embeddings of two languages
- 4-6 Summary
- 5 Implementation, experiments, and evaluation
- 5-1 Introduction
- 5-2 Training data
- 5-2-1 Pitt corpus from the DementiaBank dataset
- 5-2-2 Lu corpus from the DementiaBank dataset
- 5-2-3 XNLI dataset
- 5-2-4 Collection of a Farsi dataset
- 5-3 Evaluation metrics
- 5-4 Textual models
- 5-4-1 Implementation
- 5-4-2 Compared methods
- 5-4-3 Results
- 5-4-4 Analysis of results
- 5-5 Acoustic models
- 5-5-1 Implementation
- 5-5-2 Results
- 5-5-3 Analysis of results
- 5-6 Combining textual and acoustic models
- 5-6-1 Implementation
- 5-6-2 Results
- 5-6-3 Analysis of results
- 5-7 Transferring the knowledge of textual models to diagnose Alzheimer's disease in other languages
- 5-7-1 Implementation
- 5-7-2 Results
- 5-7-3 Analysis of results
- 5-8 Summary
- 6 Conclusion and future work
- 6-1 Conclusion
- 6-2 Advantages and limitations
- 6-3 Future work
- References
- English-to-Farsi glossary
- Farsi-to-English glossary