An Efficient Network for Real-Time Semantic Segmentation

Ghafouri, Masoud; Kasaei, Shohreh

Ghafouri, Masoud | 2024

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 57394 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Kasaei, Shohreh
Abstract:
In real-time semantic segmentation, an image is partitioned into predefined semantic regions at more than 30 frames per second, and every pixel is assigned a category label. This fundamental task produces a representation that computers can use for further processing in applications such as autonomous vehicles, robotics, and augmented reality. It pursues two conflicting goals: higher segmentation accuracy and higher speed. Previous work has increased segmentation speed with lightweight and multi-branch architectures and improved accuracy with attention mechanisms. Using a depth map alongside the color image can improve semantic segmentation, but real-time semantic segmentation networks do not use it because it increases computational cost and reduces inference speed. This research proposes a way to exploit depth-map information to improve semantic segmentation while preserving speed. BiSeNet v2 serves as the base network of the proposed method. During its training, an auxiliary ResNet network fuses the feature maps extracted from the color image and the depth map, yielding more accurate semantic segmentation. Through self-distillation, this learned knowledge is transferred to BiSeNet v2, strengthening its ability to capture semantic relationships. At inference time, the more accurate segmentation is performed using the color image alone. The proposed method requires depth maps or stereo data only for the training phase. This research uses the Cityscapes dataset, whose stereo data are used to estimate the depth maps needed for training. On this dataset, the proposed method raised the mean intersection over union (mIoU) of BiSeNet v2 from 74.91% to 77.72% while maintaining its original processing speed of 95 frames per second.
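The abstract does not specify the distillation objective. The following is a minimal sketch, assuming a standard pixel-wise logit distillation: cross-entropy on the ground-truth labels mixed with a temperature-scaled KL divergence from the auxiliary RGB-D branch (teacher) to the RGB-only BiSeNet v2 (student). The function names and the `temperature` and `alpha` parameters are hypothetical and are not taken from the thesis, which may distill features rather than logits.

```python
import math


def log_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    student_logits: list[list[float]],  # per-pixel logits, RGB-only student
    teacher_logits: list[list[float]],  # per-pixel logits, hypothetical RGB-D auxiliary branch
    labels: list[int],                  # ground-truth class index per pixel
    temperature: float = 2.0,           # hypothetical softening temperature T
    alpha: float = 0.5,                 # hypothetical weight of the label loss
) -> float:
    """Return alpha * CE + (1 - alpha) * T^2 * KL(teacher || student), averaged over pixels."""
    n = len(labels)
    ce = 0.0
    kl = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        # Cross-entropy of the student against the hard label (T = 1).
        ce -= log_softmax(s)[y]
        # KL divergence between temperature-softened teacher and student distributions.
        log_qs = log_softmax(s, temperature)
        log_qt = log_softmax(t, temperature)
        kl += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_qt, log_qs))
    return alpha * ce / n + (1 - alpha) * temperature ** 2 * kl / n


if __name__ == "__main__":
    # Toy example: two pixels, three classes.
    student = [[1.0, 0.2, -0.5], [0.1, 0.3, 0.9]]
    teacher = [[2.0, 0.0, -1.0], [-0.5, 0.2, 1.5]]
    print(distillation_loss(student, teacher, labels=[0, 2]))
```

Because the teacher is used only inside the training loss, inference runs the student on color images alone, which is consistent with the abstract's claim that the 95 frames per second speed is unchanged.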
Keywords:
Semantic Segmentation; Image Segmentation; Deep Networks; Computer Vision; Knowledge Distillation; Self-Distillation; Depth Map; Real-Time Semantic Segmentation