3M2RNet: Multi-modal multi-resolution refinement network for semantic segmentation
Fooladgar, F.; Sharif University of Technology | 2020
- Type of Document: Article
- DOI: 10.1007/978-3-030-17798-0_44
- Publisher: Springer Verlag, 2020
- Abstract:
- One of the most important steps towards 3D scene understanding is the semantic segmentation of images. 3D scene understanding is a crucial requirement in computer vision and robotic applications. With the availability of RGB-D cameras, the accuracy of the scene understanding process can be improved by exploiting depth alongside appearance features. One of the main problems in RGB-D semantic segmentation is how to fuse or combine these two modalities so as to benefit from both the common and the modality-specific features of each. Recently, methods that employ deep convolutional neural networks have achieved state-of-the-art results in dense prediction. They are usually used as feature extractors as well as data classifiers within an end-to-end training procedure. In this paper, an efficient multi-modal multi-resolution refinement network is proposed to exploit the advantages of these modalities (RGB and depth) as much as possible. This refinement network is a type of encoder-decoder network with two separate encoder branches and one decoder stream. The abstract feature representation in deep networks is produced by down-sampling operations in the encoder branches, which leads to some loss of resolution in the data. Therefore, this resolution loss must be compensated in the decoder branch. In the modality fusion process, a weighted fusion of the "clean" information paths at each resolution level of the two encoders is utilized via skip connections, with the aid of an identity mapping function. Extensive experimental results on the three main challenging datasets of NYU-V2, SUN RGB-D, and Stanford 2D-3D-S show that the proposed network obtains state-of-the-art results. © 2020, Springer Nature Switzerland AG
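The weighted skip-connection fusion described in the abstract can be sketched as follows. This is a minimal, illustrative sketch under stated assumptions: the scalar weights `w_rgb`/`w_depth`, the element-wise formulation, and the function names are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hedged sketch: at each encoder resolution level, RGB and depth features
# are combined by a weighted fusion, and an identity-mapping skip
# connection adds the fused features to the matching decoder stage.
# Toy 1-D lists stand in for real multi-channel feature maps.

def fuse_level(rgb_feat, depth_feat, w_rgb=0.5, w_depth=0.5):
    """Element-wise weighted fusion of two same-shaped feature maps
    (weights are assumed scalars here; the paper may learn them)."""
    assert len(rgb_feat) == len(depth_feat)
    return [w_rgb * r + w_depth * d for r, d in zip(rgb_feat, depth_feat)]

def decoder_stage(upsampled, skip):
    """Identity-mapping skip connection: add the fused encoder features
    to the decoder's upsampled features (residual-style refinement)."""
    return [u + s for u, s in zip(upsampled, skip)]

# One resolution level of the two encoder branches.
rgb = [1.0, 2.0, 3.0]
depth = [3.0, 2.0, 1.0]
skip = fuse_level(rgb, depth)            # -> [2.0, 2.0, 2.0]
out = decoder_stage([0.5, 0.5, 0.5], skip)  # -> [2.5, 2.5, 2.5]
```

In a full network this fusion would be repeated at every resolution level of the two encoders, with the decoder recovering the resolution lost to down-sampling step by step.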
- Keywords:
- Deep learning ; RGB-D images ; Semantic segmentation ; Classification (of information) ; Decoding ; Deep neural networks ; Image segmentation ; Neural networks ; Semantic Web ; Semantics ; Signal encoding ; Three dimensional computer graphics ; Abstract representation ; Convolutional neural network ; Feature extractor ; Robotic applications ; Scene understanding ; Training procedures ; Computer vision
- Source: Computer Vision Conference, CVC 2019, 25 April 2019 through 26 April 2019 ; Volume 944 , 2020 , Pages 544-557
- URL: https://link.springer.com/chapter/10.1007/978-3-030-17798-0_44