Deep Semi-Supervised Text Classification

Karimi, Ali | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 54142 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Semati, Hossein
  7. Abstract: Large datasets labeled by experts at considerable cost are essential to the success of deep learning in many domains. However, when labeling is expensive and labeled data are scarce, deep learning generally performs poorly. The goal of semi-supervised learning is to exploit abundant unlabeled data, which can be collected easily. Recent semi-supervised algorithms based on data augmentation have substantially advanced this field. In this work, after studying a range of textual augmentation techniques, a new approach is proposed that extracts effective learning signals from unlabeled data. The method encourages the model to produce the same representation vector for different augmented versions of a text, while simultaneously improving accuracy through self-training on the unlabeled data. The proposed method achieves state-of-the-art results on four text classification datasets. For example, on SST-2, the fully supervised model trained on all labeled data (68,000 examples) achieved 86.9% accuracy, while the proposed method achieved 78.2% accuracy with only 500 labeled examples. Experiments show that the proposed method scales both with the size of the labeled set and with the size of the model.
  8. Keywords: Semi-Supervised Learning; Text Classification; Deep Learning; Contrastive Learning; Self-Training; Textual Data Augmentation
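The abstract names two ingredients: a contrastive objective that pulls together the representations of different augmented versions of the same text, and self-training on unlabeled data. The sketch below shows one common way to realize each, assuming a SimCLR-style NT-Xent loss and confidence-thresholded pseudo-labeling. The abstract does not state the exact loss, temperature, confidence threshold, or how the terms are weighted, so the function names `nt_xent_loss` and `select_pseudo_labels` and their default values are hypothetical, not the thesis's own implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two augmented views of a batch.

    Assumption: z1[i] and z2[i] are encoder outputs for two augmentations
    of the same text i; every other row in the 2n-row batch is a negative.
    The temperature of 0.5 is an illustrative default, not from the source.
    """
    n = z1.shape[0]
    z = l2_normalize(np.concatenate([z1, z2], axis=0))  # shape (2n, d)
    sim = z @ z.T / temperature                          # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                       # a row is never its own pair
    # Row i's positive is row i + n, and row i + n's positive is row i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return -log_prob[np.arange(2 * n), pos].mean()


def select_pseudo_labels(probs, threshold=0.9):
    """Self-training step: keep unlabeled examples the model is confident about.

    probs has shape (num_unlabeled, num_classes). Returns the indices of the
    kept examples and their predicted (pseudo) labels. The 0.9 threshold is
    an assumed hyperparameter.
    """
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = confidence >= threshold
    return np.nonzero(mask)[0], labels[mask]


if __name__ == "__main__":
    # Four "texts" whose two views already have identical representations.
    views = np.eye(4)
    print(nt_xent_loss(views, views))  # log(1 + 6 * exp(-2)), about 0.594

    probs = np.array([[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]])
    print(select_pseudo_labels(probs))  # keeps examples 0 and 2
```

In a full training loop, the contrastive term would be computed on encoder outputs for two augmentations of each unlabeled batch, and the selected pseudo-labeled examples would be added to a supervised cross-entropy term; the relative weighting of these terms is left open here because the abstract does not specify it.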
