Data-Driven Contextual Modeling for 3D Scene Understanding

Computers and Graphics

Yifei Shi1,        Pinxin Long3,        Kai Xu1,3        Hui Huang2,3        Yueshan Xiong1

1HPCL, National University of Defense Technology     2Shenzhen University     3Shenzhen VisuCA Key Lab / SIAT

Figure 1: Scene understanding by our method. Top: the input point cloud of a table-top scene. Bottom: the labeling result (legends show semantic labels in color).

The recent development of fast depth-map fusion techniques enables real-time, detailed scene reconstruction with commodity depth cameras, making indoor scene understanding more feasible than ever. To address the specific challenges of object analysis at the subscene level, this work proposes a data-driven approach to modeling contextual information that covers both intra-object part relations and inter-object layouts. Our method combines the detection of individual objects and object groups within the same framework, enabling contextual analysis without knowing the objects in the scene a priori. The key idea is that while contextual information can benefit the detection of either individual objects or object groups, both can contribute to object extraction when the objects are unknown.
Our method starts with a robust segmentation that partitions a subscene into segments, each representing either an independent object or a part of an object. A set of classifiers is trained for both individual objects and object groups, using a database of 3D scene models. We employ multiple kernel learning (MKL) to learn per-category optimized classifiers for objects and object groups. Finally, we perform graph matching with these classifiers to extract objects, grouping the segments into objects or object groups. The output is an object-level labeled segmentation of the input subscene. Experiments demonstrate that the unified contextual analysis framework achieves robust object detection and recognition in cluttered subscenes.
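To give a rough feel for per-category kernel combination, the minimal sketch below (not the paper's implementation) searches over a blending weight between two kernels on toy segment feature vectors and trains a simple kernel ridge classifier; full MKL jointly optimizes the kernel weights and the classifier, and all data, kernels, and parameters here are invented for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Gaussian (RBF) kernel between row vectors of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def linear_kernel(X, Y):
    return X @ Y.T

def fit_kernel_ridge(K, y, lam=1e-2):
    # Closed-form kernel ridge regression used as a simple classifier
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# Toy data: two "categories" of 4-dimensional segment descriptors (invented)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (30, 4)),
               rng.normal(1.5, 0.5, (30, 4))])
y = np.hstack([-np.ones(30), np.ones(30)])

idx = rng.permutation(60)
tr, te = idx[:40], idx[40:]

best = None
for w in np.linspace(0.0, 1.0, 11):  # weight search: crude stand-in for MKL
    K = w * rbf_kernel(X, X, 0.5) + (1 - w) * linear_kernel(X, X)
    alpha = fit_kernel_ridge(K[np.ix_(tr, tr)], y[tr])
    acc = np.mean(np.sign(K[np.ix_(te, tr)] @ alpha) == y[te])
    if best is None or acc > best[0]:
        best = (acc, w)
print(best)  # (held-out accuracy, chosen kernel weight)
```

The per-category classifiers in the paper are trained the same way in spirit: each category gets its own optimized combination of feature kernels rather than a single fixed kernel.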


Figure 2: An overview of our algorithm. We first over-segment the scene and extract the supporting plane on the patch graph, then partition the scene into segments and represent the whole scene with a segment graph (a). To obtain contextual information, we train a set of classifiers for both single objects and object groups using multiple kernel learning (b). The classifiers are then used to group the segments into objects or object groups (c).
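The grouping stage can be sketched, in heavily simplified form, as a greedy merge over a segment adjacency graph driven by an objectness score. The `objectness` function below is a hypothetical stand-in for the learned classifiers, and the segment names and sizes are invented for illustration.

```python
def objectness(group, sizes):
    # Hypothetical score: prefers groups whose total "size" is near 1.0,
    # standing in for the learned single-object / object-group classifiers
    total = sum(sizes[s] for s in group)
    return 1.0 - abs(1.0 - total)

def greedy_group(edges, sizes):
    # Start with singleton groups; owner maps each segment to its group root
    groups = {s: {s} for s in sizes}
    owner = {s: s for s in sizes}
    improved = True
    while improved:
        improved = False
        for a, b in edges:
            ra, rb = owner[a], owner[b]
            if ra == rb:
                continue
            merged = groups[ra] | groups[rb]
            # Merge only if the combined group scores better than both parts
            if objectness(merged, sizes) > max(objectness(groups[ra], sizes),
                                               objectness(groups[rb], sizes)):
                groups[ra] = merged
                for s in groups[rb]:
                    owner[s] = ra
                del groups[rb]
                improved = True
                break
    return sorted(tuple(sorted(g)) for g in groups.values())

# Toy subscene: a chair seat and back are adjacent parts of one object;
# a mug is adjacent to the back but belongs to a separate object
sizes = {"seat": 0.6, "back": 0.4, "mug": 1.0}
edges = [("seat", "back"), ("back", "mug")]
print(greedy_group(edges, sizes))  # → [('back', 'seat'), ('mug',)]
```

In the actual method the candidate groupings are resolved by graph matching against the trained classifiers rather than this greedy heuristic, but the flow is the same: segments merge only when the classifiers judge the merged set more object-like than its parts.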


Figure 3: Segmentation results of our algorithm on scenes from the NYU-Depth V2 dataset. Our method segments most objects correctly even in highly cluttered scenes.

We thank all the reviewers for their comments and feedback. We would also like to acknowledge our research grants: NSFC (61572507, 61202333, 61379103), 973 Program (2014CB360503), Guangdong Science and Technology Program (2015A030312015, 2014B050502009, 2014TX01X033), Shenzhen VisuCA Key Lab (CXB201104220029A).


@article{shi2016contextual,
    title   = {Data-driven contextual modeling for 3{D} scene understanding},
    author  = {Shi, Yifei and Long, Pinxin and Xu, Kai and Huang, Hui and Xiong, Yueshan},
    journal = {Computers and Graphics},
    volume  = {55},
    number  = {C},
    pages   = {55--67},
    year    = {2016}
}

Copyright © 2016-2018 Visual Computing Research Center