Supervised Semantic Segmentation of RGB-Depth Images

Fooladgar, Fahimeh; Kasaei, Shohreh

Fooladgar, Fahimeh | 2020

Type of Document: Ph.D. Dissertation
Language: Farsi
Document No: 53435 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Kasaei, Shohreh
Abstract:
Labeling is one of the most important tasks in computer vision, and dense labeling is a key step toward 2D and 3D scene understanding. The goal of dense labeling is to assign a label to every pixel of an image, a task known in the literature as semantic segmentation. Although state-of-the-art results are now achieved mainly by deep learning methods, traditional methods were also at the center of attention for some years. In recent decades, convolutional neural networks have changed the landscape of visual recognition tasks such as labeling and semantic segmentation. The most important issues with deep learning models are their hardware requirements and computational cost, which are significant constraints for resource-limited devices such as embedded and mobile devices. Recently, some architectures have been proposed to overcome these limitations by relying on specific hardware and software. This thesis proposes two methods, in two categories, to decrease computational cost and memory requirements at inference time: designing extremely efficient neural networks, and compressing the model through distillation policies. In the first method, residual densely connected blocks are proposed to guarantee deep supervision, efficient gradient flow, and feature reuse in the convolutional neural network. The method decreases the cost of training and inference without any special hardware or software, simply by reducing the number of parameters and computational operations while achieving acceptable accuracy. Extensive experimental results demonstrate the effectiveness of these residual dense connections in the network architecture. These results reveal that the proposed architecture is more efficient than AlexNet and VGGNet in terms of model size, number of parameters, and even accuracy.
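To make the parameter-reduction argument concrete, the sketch below compares the parameter count of a plain stack of convolutions with that of a densely connected block in which each layer adds only a few new channels and reuses all earlier feature maps. It is only an illustration of the general idea of dense connectivity, not the architecture proposed in the thesis: the function names, the 3x3 kernel size, the channel widths, and the notion of a fixed growth rate are all assumptions made for the example, and the residual shortcut is not modeled.

```python
def conv_params(in_channels, out_channels, kernel_size=3):
    """Weights plus biases of one 2D convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def plain_stack_params(channels, num_layers):
    """Plain stack (assumed baseline): every layer maps `channels` to `channels`."""
    return sum(conv_params(channels, channels) for _ in range(num_layers))


def dense_block_params(in_channels, growth_rate, num_layers):
    """Densely connected block (assumed layout): each layer sees the block
    input concatenated with the outputs of all earlier layers, and produces
    only `growth_rate` new channels."""
    total = 0
    channels = in_channels
    for _ in range(num_layers):
        total += conv_params(channels, growth_rate)
        channels += growth_rate  # feature reuse through concatenation
    return total


# Hypothetical sizes chosen only for illustration.
plain = plain_stack_params(channels=64, num_layers=4)
dense = dense_block_params(in_channels=64, growth_rate=16, num_layers=4)
print(f"plain stack: {plain} parameters, dense block: {dense} parameters")
```

With these assumed sizes the dense block needs 50,752 parameters against 147,712 for the plain stack, because each dense layer produces far fewer output channels while still having access to every earlier feature map.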
Next, an efficient encoder-decoder model with an attention-based fusion block is proposed to integrate the mutual influences between the feature maps of the RGB and depth modalities. This block explicitly extracts the interdependencies among the concatenated feature maps of the two modalities to obtain more powerful feature maps from RGB-Depth images. In the second method, knowledge distillation is performed within the network, from the deeper levels to the shallower levels, using ideas from ensemble and adversarial learning. The goals are not only to improve the overall performance of the ensemble by aggregating the predictions of its sub-networks, but also to let these sub-networks learn from each other by transferring knowledge among themselves. Extensive experiments on the main challenging datasets show that the proposed network outperforms state-of-the-art models in computational cost and model size. The experiments also illustrate the effectiveness of the proposed models in terms of computational cost and accuracy.
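As a rough illustration of how an attention mechanism can model interdependencies among concatenated feature maps, the sketch below applies squeeze-and-excitation-style channel gating to RGB and depth feature maps. It is not the fusion block proposed in the thesis, whose details the abstract does not give: the function names, the two-layer gating network, and all weights are assumptions made for the example. Feature maps are represented as plain nested lists (channels, rows, columns) to keep the sketch dependency-free.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def dense(vector, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(rgb_maps, depth_maps, w1, b1, w2, b2):
    """Concatenate RGB and depth channels, then rescale every channel by a
    gate in (0, 1) computed from the pooled statistics of all channels."""
    fused = rgb_maps + depth_maps                      # channel concatenation
    squeezed = global_average_pool(fused)              # one value per channel
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]  # ReLU bottleneck
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]      # one gate per channel
    gated = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, fused)
    ]
    return gated, gates


# Hypothetical toy input: one RGB-derived channel and one depth-derived channel.
rgb = [[[1.0, 2.0], [3.0, 4.0]]]
depth = [[[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[0.5, -0.5]], [0.0]           # 2 channels -> 1 hidden unit
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]    # 1 hidden unit -> 2 gates
fused_maps, channel_gates = attention_fuse(rgb, depth, w1, b1, w2, b2)
print(channel_gates)
```

Because each gate depends on the pooled statistics of both modalities, the weight given to a depth channel can change with the RGB content and vice versa, which is one simple way to express mutual influence between the two inputs.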
Keywords:
Semantic Segmentation ; Efficient Neural Network ; RGB-D Camera ; Knowledge Distillation ; Attention-Based Fusion ; Labeling Algorithm
