- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 56097 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Fazli, Mohammad Amin
- Abstract:
- Supervised deep learning algorithms rely heavily on labeled data, which makes data labeling a major bottleneck. To address this, machine learning researchers have developed approaches that reduce the dependency on labeled data and make collecting data for labeling more efficient. This thesis investigates training a classification model on data collected through a human-in-the-loop system. It is the first work to apply active learning to distinguishing political from non-political Persian tweets, and the dataset it introduces is the only one available for this task in Persian. The thesis evaluates and compares five active learning methods: least confidence, margin confidence, maximum entropy, active learning with contrastive examples, and active learning based on committee disagreement. Each method is used to select unlabeled samples for labeling from the dataset introduced here. The findings point to more efficient data collection techniques for this task and contribute to active learning for the Persian language. The research also identifies significant gaps in the active learning literature by evaluating traditional and modern methods on datasets whose distributions differ from those used in previous studies.
- Keywords:
- Active Learning; Human in the Loop; Political Tweet; Annotation Budget Limit; Supervised Learning; Deep Learning
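
The three uncertainty-based strategies named in the abstract (least confidence, margin confidence, and maximum entropy) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the example probabilities, and the budget value are hypothetical, and the scores follow the standard textbook definitions of these strategies. Each function takes the class probabilities a classifier predicts for one unlabeled tweet and returns a score where higher means more uncertain.

```python
import math

# Illustrative sketch only; names and numbers are assumptions, not from the thesis.

def least_confidence(probs):
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)

def margin_score(probs):
    """1 minus the gap between the two most likely classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def entropy_score(probs):
    """Shannon entropy (natural log) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_labeling(pool_probs, score_fn, budget):
    """Return the indices of the `budget` most uncertain pool items."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical predicted probabilities (political, non-political) for three tweets.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]]

for fn in (least_confidence, margin_score, entropy_score):
    print(fn.__name__, select_for_labeling(pool_probs, fn, budget=2))
```

In a binary task such as political versus non-political classification, these three scores rank samples identically, because each one is a monotone function of the top-class probability. They can diverge only when there are three or more classes.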