RGB-D Scene Labeling
Paper
Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, Liang Lin*, “LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling”, ECCV, 2016. PDF Code
Abstract
Semantic labeling of RGB-D scenes is crucial to many intelligent applications, including perceptual robotics. It generates pixelwise and fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. This paper addresses this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures and fuses contextual information from multiple channels of photometric and depth data, and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. Specifically, contexts in the photometric and depth channels are, respectively, captured by stacking several convolutional layers and a long short-term memory layer; the memory layer encodes both short-range and long-range spatial dependencies in an image along the vertical direction. Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and to perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts. Finally, the fused contextual representation is concatenated with the convolutional features extracted from the photometric channels in order to improve the accuracy of fine-scale semantic labeling. Our proposed model sets a new state of the art, i.e., 48.1% and 49.4% average class accuracy over 37 categories (a 2.2% and 5.4% improvement) on the large-scale SUNRGBD dataset and the NYUDv2 dataset, respectively.
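The following is a minimal PyTorch sketch of the context modeling and fusion idea summarized in the abstract: a vertical memory (LSTM) layer per modality on top of convolutional features, a bi-directional fusion LSTM propagating the fused vertical contexts horizontally, and a final concatenation with the photometric convolutional features. The backbone depth, layer sizes, and class names are illustrative assumptions, not the authors' exact architecture; see the paper and code for the real model.

```python
# Illustrative sketch of LSTM-CF-style context fusion (assumed layer sizes, not the paper's exact network).
import torch
import torch.nn as nn


class LSTMCFSketch(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=64, num_classes=37):
        super().__init__()
        # Shallow conv stacks standing in for the photometric and depth branches.
        self.rgb_conv = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU())
        self.depth_conv = nn.Sequential(nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU())
        # Vertical memory layers: one LSTM per modality, scanning image columns top to bottom.
        self.rgb_vlstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.depth_vlstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Memorized fusion layer: bi-directional LSTM along the horizontal direction over fused contexts.
        self.fusion_hlstm = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True,
                                    bidirectional=True)
        # Classifier over the fused global context concatenated with RGB conv features.
        self.classifier = nn.Conv2d(2 * hidden_dim + feat_dim, num_classes, 1)

    def _run_vertical(self, lstm, feat):
        # feat: (B, C, H, W) -> treat each column as a sequence of H steps.
        b, c, h, w = feat.shape
        cols = feat.permute(0, 3, 2, 1).reshape(b * w, h, c)   # (B*W, H, C)
        out, _ = lstm(cols)                                     # (B*W, H, hidden)
        return out.reshape(b, w, h, -1).permute(0, 3, 2, 1)     # (B, hidden, H, W)

    def forward(self, rgb, depth):
        rgb_feat = self.rgb_conv(rgb)        # (B, C, H, W)
        depth_feat = self.depth_conv(depth)  # (B, C, H, W)
        # Vertical context for each modality.
        rgb_ctx = self._run_vertical(self.rgb_vlstm, rgb_feat)
        depth_ctx = self._run_vertical(self.depth_vlstm, depth_feat)
        # Fuse the two vertical contexts, then propagate bi-directionally along each row.
        fused = torch.cat([rgb_ctx, depth_ctx], dim=1)          # (B, 2*hidden, H, W)
        b, c, h, w = fused.shape
        rows = fused.permute(0, 2, 3, 1).reshape(b * h, w, c)   # (B*H, W, C)
        fused_ctx, _ = self.fusion_hlstm(rows)                  # (B*H, W, 2*hidden)
        fused_ctx = fused_ctx.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        # Concatenate the 2D global context with photometric conv features for fine-scale labeling.
        return self.classifier(torch.cat([fused_ctx, rgb_feat], dim=1))


# Usage example with a single 240x320 RGB-D frame:
# model = LSTMCFSketch()
# logits = model(torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320))  # (1, 37, 240, 320)
```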
Experiments
Table 1: Comparison of scene labeling results on SUNRGBD using class-wise and average Jaccard Index.
Table 2: Comparison of scene labeling results on NYUDv2.
Fig. 1: Examples of semantic labeling results on the SUNRGBD dataset. The top row shows the input RGB images, the middle row shows the ground truth, and the bottom row shows the scene labeling results of our proposed model.
Fig. 2: Visual comparison of scene labeling results on the NYUDv2 dataset. The first and second rows show the input RGB images and their corresponding ground-truth labeling. The third row shows the results from [1], and the last row shows the results from our model.
References
- [1] Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Indoor scene understanding with RGB-D images: Bottom-up segmentation, object detection and semantic segmentation. International Journal of Computer Vision 112(2) (2015) 133-149.