
Weakly Supervised Semantic Segmentation Using Deep Neural Network

Ahmadi, Rozhan | 2023

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 57033 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Kasaei, Shohreh
  7. Abstract:
  8. Semantic segmentation plays a vital role in computer vision by teaching machines to interpret visual information the way humans do. In this area, common fully supervised methods face the challenge of manual, time-consuming pixel-level annotation of large datasets; this volume of annotation also places a heavy processing load on the hardware. In recent years, weak supervision has been introduced to address these challenges. In this approach, the model is supervised by weak labels, which are far easier to obtain than pixel-wise annotations of an image. This efficiency in annotation cost and processing load, combined with high accuracy, has given weakly supervised semantic segmentation extensive applications in fields such as autonomous vehicles and satellite image processing. Because weak labels lack spatial information about objects, they pose challenges such as cross-object discrimination and complete object mask generation; existing solutions attempt to overcome these challenges by focusing on the generation of localization maps. Meanwhile, recent research on weakly supervised semantic segmentation has shown that vision transformers have great potential to achieve significant improvements. However, hierarchical vision transformers, despite their outstanding performance in other vision tasks owing to their local-to-global view, have not yet been studied in weakly supervised semantic segmentation. To this end, this research introduces a new architecture called SWTformer, in two versions. SWTformer-V1 adapts the Swin hierarchical vision transformer to weakly supervised semantic segmentation, generating localization maps from image classification trained with image-level labels. In experiments conducted on the PASCAL VOC 2012 benchmark dataset, this architecture achieves 0.98% higher accuracy on the mAP metric, outperforming state-of-the-art models.
In addition, its localization maps score, on average, about 0.82% higher mIoU than other methods based on a single classification network. SWTformer-V2 adds a multi-scale feature fusion mechanism to extract further information, and a background-aware mechanism to generate more complete localization maps with more accurate cross-object discrimination. These proposed methods improve the accuracy of the localization maps generated by SWTformer-V1 by about 5.32% mIoU, showing their effectiveness in further adapting the Swin model to the task of weakly supervised semantic segmentation.
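The abstract describes deriving localization maps from a classifier trained only with image-level labels. The thesis's exact SWTformer formulation is not given here, but the standard class activation map (CAM) technique it builds on can be sketched as follows; the function name, array shapes, and toy data are illustrative assumptions, not the thesis's code:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM sketch (not the thesis's exact method):
    weight the final conv feature maps by the classifier weights
    of one class, then keep and normalise the positive evidence.
    features: (C, H, W) feature maps; fc_weights: (num_classes, C)."""
    # weighted sum over the channel axis -> coarse (H, W) localization map
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)
    cam = np.maximum(cam, 0)          # discard negative evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalise to [0, 1]
    return cam

# toy example: 4 feature channels, an 8x8 spatial map, 3 classes
rng = np.random.default_rng(0)
features = rng.random((4, 8, 8)).astype(np.float32)
weights = rng.random((3, 4)).astype(np.float32)
cam = class_activation_map(features, weights, class_idx=1)
```

In a weakly supervised pipeline, such maps serve as pixel-level pseudo-labels for training the segmentation network; the challenges the abstract names (incomplete masks, poor cross-object discrimination) arise because the classifier only needs the most discriminative regions to predict the image-level label.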
  9. Keywords:
  10. Computer Vision ; Semantic Segmentation ; Deep Learning ; Weakly Supervised Learning ; Image Classification ; Class Activation Map ; Hierarchical Vision Transformer
