Sim2Real Learning of Obstacle Avoidance for Robotic Manipulators in Uncertain Environments

IEEE Robotics and Automation Letters 2022

Tan Zhang1    Kefang Zhang2    Jiatao Lin2    Wing-Yue Geoffrey Louie3    Hui Huang2*

1Shenzhen Technology University    2Shenzhen University    3Oakland University

Fig. 1. An overview of our vision-based manipulator obstacle avoidance system.


Obstacle avoidance for robotic manipulators is challenging when they operate in unstructured environments. We approach this problem with sim-to-real (sim2real) deep reinforcement learning: a motion policy for the robotic arm is learned in a simulator and then transferred to the real world. However, sim2real adaptation is notoriously difficult. To this end, this work proposes (1) a unified representation of obstacles and targets that captures the underlying dynamics of the environment while generalizing to unseen goals, and (2) a flexible end-to-end model that combines the unified representation with a deep reinforcement learning control module and can be trained by interacting with the environment. The representation is agnostic to the shape and appearance of the underlying objects, which simplifies and unifies the scene representation in both the simulated and real worlds. We implement this idea in a vision-based actor-critic framework by devising a bounding box predictor module. The predictor estimates the 3D bounding boxes of obstacles and targets from RGB-D input. The features extracted by the predictor are fed into the policy network, and all modules are trained jointly. The policy thus learns an object-aware scene representation, which enables data-efficient learning of the obstacle avoidance policy. Our experiments in simulated and real-world environments show that the end-to-end model with the unified representation achieves better sim2real adaptation and scene generalization than state-of-the-art techniques.
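To illustrate why a bounding-box representation is shape- and appearance-agnostic, here is a minimal sketch of how obstacles and a target could be reduced to a unified state vector for the policy. The function names, the axis-aligned box encoding (center plus half-extents), and the feature layout are illustrative assumptions for exposition, not the paper's actual implementation, which predicts 3D boxes with a learned module trained jointly with the policy.

```python
import numpy as np

def bbox_features(objects):
    """Encode each object's point cloud as its axis-aligned 3D bounding box:
    center (x, y, z) followed by half-extents (hx, hy, hz). Any two objects
    occupying the same volume map to the same 6-D feature, regardless of
    their shape or appearance."""
    feats = []
    for pts in objects:  # pts: (N, 3) array of 3D points for one object
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        center = (lo + hi) / 2.0
        half_extents = (hi - lo) / 2.0
        feats.append(np.concatenate([center, half_extents]))
    return np.concatenate(feats)

def unified_state(target_pts, obstacle_pts_list, joint_angles):
    """Unified scene representation fed to the policy network:
    target box features, obstacle box features, and the arm's
    proprioceptive state (joint angles), concatenated."""
    return np.concatenate([
        bbox_features([target_pts]),
        bbox_features(obstacle_pts_list),
        np.asarray(joint_angles, dtype=float),
    ])
```

Because a sphere and a cube spanning the same extent produce identical features, the same state encoding applies unchanged in simulation and the real world, which is what eases sim2real transfer.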

Fig. 2. The Vision Module with a vision processing method. Fully connected layers are optional, trainable layers, which process the output of preceding blocks; the module receives a vector Xvision and outputs a vector Yvision.

Fig. 3. Network architecture.

Fig. 4. Experimental setup in a simulated environment: the three middle images from top to bottom denote the scene from the perspective of our Fetch robot, the corresponding depth image, and the semantically segmented image, respectively; the right image represents the 3D bounding boxes of the observed objects.

Fig. 5. Some training scenes. (a-d) are for pre-training and (e-h) are for training in Gazebo. In the pre-training scenes, the green ball is set as the target and the other boxes are obstacles. In the training scenes, the vase is set as the target and the other objects are obstacles.

Fig. 8. Experimental setup. (a) The real experimental setup and (b) the real-time reconstruction and recognition of the scene using a depth sensor. (c-j) The Fetch robot successfully reached the purple metal box set as the target while the whole scene was changing.

Fig. 9. Physical experiment with direct training without detection of 3D bounding boxes (SAC-NUR).


This paper was supported in parts by NSFC (U2001206), Guangdong Talent Program (2019JC05X328), Guangdong Science and Technology Program (2020A0505100064), DEGP Key Project (2018KZDXM058), and Shenzhen Science and Technology Program (RCJC20200714114435012, JCYJ20210324120213036).



@article{zhang2022sim2real,
  title={Sim2Real Learning of Obstacle Avoidance for Robotic Manipulators in Uncertain Environments},
  author={Tan Zhang and Kefang Zhang and Jiatao Lin and Wing-Yue Geoffrey Louie and Hui Huang},
  journal={IEEE Robotics and Automation Letters},
  year={2022}
}





