VGF-Net: Visual-Geometric Fusion Learning for Simultaneous Drone Navigation and Height Mapping

Graphical Models 2021

Yilin Liu    Ke Xie    Hui Huang*

Shenzhen University


Drone navigation requires a comprehensive understanding of both visual and geometric information in the 3D world. In this paper, we present the Visual-Geometric Fusion Network (VGF-Net), a deep network for the fusion analysis of visual/geometric data and the construction of 2.5D height maps for simultaneous drone navigation in novel environments. Given an initial rough height map and a sequence of RGB images, VGF-Net extracts the visual information of the scene, along with a sparse set of 3D keypoints that capture the geometric relationships between objects in the scene. Driven by the data, VGF-Net adaptively fuses visual and geometric information, forming a unified Visual-Geometric Representation. This representation is fed to a new Directional Attention Model (DAM), which enhances the visual-geometric object relationships and propagates the informative data to dynamically refine the height map and the corresponding keypoints. The result is an end-to-end information fusion and mapping system that demonstrates remarkable robustness and high accuracy in autonomous drone navigation across complex indoor and large-scale outdoor scenes.

Figure 2: Overview of VGF-Net. At the t-th moment, the network uses convolutional layers to learn visual and geometric representations from the RGB image I_t and the 2.5D height map M_t (produced at the (t-1)-th moment). The representations are combined to compute the residual update map R^c, which is added to the 2.5D height map to form a renewed height map M^c. Based on the new height map and the 3D keypoints {p_{t,1}, ..., p_{t,N}} (produced by SLAM), we construct the VG representation for each keypoint (yellow dot), which is used by DAM to select useful information to refine object boundaries and 3D keypoints at the next moment. Note that the refined height map M^r_{t+1} is used for path planning, which is omitted from the figure for simplicity.
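
The residual update described in this caption amounts to a cell-wise addition. The following is a minimal sketch of that single step, assuming the height map and the residual map are stored as equally sized 2D grids; the function name `apply_residual` and the sample values are hypothetical, and the network that predicts the residual is not modelled here.

```python
def apply_residual(height_map, residual):
    """Add a residual update map to a 2.5D height map, cell by cell."""
    same_shape = len(height_map) == len(residual) and all(
        len(h_row) == len(r_row) for h_row, r_row in zip(height_map, residual)
    )
    if not same_shape:
        raise ValueError("height map and residual map must have the same shape")
    return [
        [h + r for h, r in zip(h_row, r_row)]
        for h_row, r_row in zip(height_map, residual)
    ]


# Hypothetical 2x2 example: M_t is the current map, R_c the predicted residual.
M_t = [[0.0, 2.0], [5.0, 5.0]]
R_c = [[0.5, -1.0], [0.0, 1.5]]
M_c = apply_residual(M_t, R_c)  # renewed height map
```

Predicting a residual rather than a full map means that an all-zero output leaves the current height map unchanged.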

Figure 3: Illustration of fusing visual and geometric information for updating the 2.5D height map. (a) We construct the VG representation for each 3D keypoint (yellow dot) projected onto the 2.5D height map. Information from the VG representation is propagated to the surrounding object boundaries along different directions (indicated by different colors). The distance between the keypoint and the object boundary (black arrow) determines the weight for adjusting the information propagation. A dashed arrow means that there is no object along the corresponding direction. (b) Given the existing object boundary, we use DAM to select the most relevant keypoint along each direction. The selected keypoints provide fused visual and geometric information, which is used to refine the object boundary.
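
To make the per-direction selection in (b) concrete, here is a simplified, hand-written stand-in. It is not DAM itself, which is a learned attention model. The sketch assumes eight direction bins, treats the nearest keypoint in each bin as the "most relevant" one, and uses an inverse-distance weight; all three choices, as well as the function name, are assumptions made for illustration.

```python
import math


def select_keypoints_by_direction(boundary_cell, keypoints, num_directions=8):
    """For one boundary cell, keep the nearest keypoint in each direction bin.

    Returns {direction_index: (keypoint, weight)}. Directions that contain
    no keypoint are simply absent from the result.
    """
    bx, by = boundary_cell
    sector = 2 * math.pi / num_directions
    best = {}
    for kp in keypoints:
        dx, dy = kp[0] - bx, kp[1] - by
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue  # keypoint coincides with the cell; direction undefined
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Bins are centred on multiples of `sector` (bin 0 is centred on +x).
        direction = int((angle + sector / 2) // sector) % num_directions
        weight = 1.0 / (1.0 + dist)  # assumed: closer keypoints weigh more
        if direction not in best or weight > best[direction][1]:
            best[direction] = (kp, weight)
    return best


# Hypothetical keypoints around a boundary cell at the origin.
selected = select_keypoints_by_direction((0, 0), [(1, 0), (3, 0), (0, 2), (-2, -2)])
```

In this example two keypoints fall into the same bin along the positive x-axis, and only the closer one, `(1, 0)`, is kept.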

Figure 5: Illustration of disturbance manipulations. In practice, these manipulations can be combined to yield the disturbance results (e.g., translation and dilation). The bottom row of this figure shows the difference between height maps before and after disturbance. The residual map is learned by our VGF-Net to recover the undisturbed height map from the disturbed one.
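
As a rough illustration of how a disturbed map and its residual relate, the sketch below combines a translation with a dilation on a toy grid and computes the residual that restores the original. The helper names, the 3x3 max-filter used for dilation, and the zero fill value are assumptions made for this example and are not taken from the paper.

```python
def translate(height_map, dx, dy, fill=0.0):
    """Shift the map by (dx, dy) cells; uncovered cells receive `fill`."""
    rows, cols = len(height_map), len(height_map[0])
    out = [[fill] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                out[ny][nx] = height_map[y][x]
    return out


def dilate(height_map):
    """Grow objects by one cell: each cell takes the max of its 3x3 neighbourhood."""
    rows, cols = len(height_map), len(height_map[0])
    return [
        [
            max(
                height_map[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def residual_target(clean, disturbed):
    """Residual that maps the disturbed map back to the clean one."""
    return [[c - d for c, d in zip(c_row, d_row)] for c_row, d_row in zip(clean, disturbed)]


# Toy map with a single 3 m tall object.
clean = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
disturbed = dilate(translate(clean, dx=1, dy=0))  # combined disturbance
residual = residual_target(clean, disturbed)
recovered = [[d + r for d, r in zip(d_row, r_row)] for d_row, r_row in zip(disturbed, residual)]
```

Adding the residual to the disturbed map reproduces the clean map, which is the relationship the caption describes between the two.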

Figure 8: Successful navigation trajectories produced by VGF-Net in a complex indoor testing scene from the S3DIS dataset [1].


Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (add the bibtex below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.



We would like to thank the anonymous reviewers for their constructive comments. This work was supported in part by NSFC Key Project (U2001206), Guangdong Outstanding Talent Program (2019JC05X328), Guangdong Science and Technology Program (2020A0505100064, 2018A030310441, 2015A030312015), DEGP Key Project (2018KZDXM058), Shenzhen Science and Technology Program (RCJC20200714114435012), and Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen University).



title={VGF-Net: Visual-Geometric Fusion Learning for Simultaneous Drone Navigation and Height Mapping},
author={Yilin Liu and Ke Xie and Hui Huang},
journal={Graphical Models},





Downloads (faster for people in China)

Downloads (faster for people in other places)