UprightRL: Upright Orientation Estimation of 3D Shapes via Reinforcement Learning

Computer Graphics Forum (Proceedings of PG 2021)

Luanmin Chen1    Juzhan Xu1    Chuan Wang1    Haibin Huang2    Hui Huang1    Ruizhen Hu1

1Shenzhen University    2Kuaishou Technology

Fig. 1. Grasping an object and placing it in its upright orientation, guided by our UprightRL model.


In this paper, we study the problem of 3D shape upright orientation estimation from the perspective of reinforcement learning, i.e., we teach a machine (agent) to orient a 3D shape step by step toward its upright orientation given its current observation. Unlike previous methods, we treat this problem as a sequential decision-making process rather than a strongly supervised learning problem. To achieve this, we propose UprightRL, a deep network architecture designed for upright orientation estimation. UprightRL consists mainly of two submodules, an Actor module and a Critic module, which are trained with reinforcement learning. Specifically, the Actor module selects an action from the action space to transform the point cloud and obtain the new point cloud for the next environment state, while the Critic module evaluates the strategy and guides the Actor in choosing the next action. Moreover, we design a reward function that gives the agent a positive reward for an action that brings the model closer to its upright orientation and a negative reward otherwise. We conduct extensive experiments to demonstrate the effectiveness of the proposed model, and the results show that our network outperforms the state of the art. We also apply our method to a robot grasping-and-placing experiment to demonstrate its practicality.
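The step-by-step formulation above can be illustrated with a minimal sketch of an episode loop. Everything specific here is an assumption made for illustration and is not taken from the paper: the action set (rotations of ±10 degrees about the x and y axes plus a "stop" action), the names `ACTIONS`, `step` and `run_episode`, and the use of an arbitrary `policy` callable in place of the learned Actor.

```python
import math

# Hypothetical discrete action set (an assumption, not the paper's definition):
# small rotations about the x and y axes, plus a "stop" action.
STEP = math.radians(10.0)
ACTIONS = {
    0: ("x", +STEP),
    1: ("x", -STEP),
    2: ("y", +STEP),
    3: ("y", -STEP),
    4: ("stop", 0.0),
}

def rotate(p, axis, angle):
    """Rotate a 3D point about the x or y axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return (x, c * y - s * z, s * y + c * z)
    if axis == "y":
        return (c * x + s * z, y, -s * x + c * z)
    return p  # "stop" leaves the point unchanged

def step(points, action_id):
    """Apply one action to every point and return the next state."""
    axis, angle = ACTIONS[action_id]
    return [rotate(p, axis, angle) for p in points]

def run_episode(points, policy, max_steps=50):
    """Roll out `policy` until it chooses "stop" or the step budget runs out."""
    taken = []
    for _ in range(max_steps):
        action_id = policy(points)
        taken.append(action_id)
        if ACTIONS[action_id][0] == "stop":
            break
        points = step(points, action_id)
    return points, taken
```

In this sketch `policy` is any function that maps the current point cloud to an action index; in UprightRL that role is played by the trained Actor module.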

Fig. 2. The network architecture of UprightRL. Given a 3D shape point cloud at time step t, the PointNet module extracts its features and passes them to the Actor-Critic module, which produces two outputs. Specifically, the Actor branch samples a single action from the action space at each time step to transform the point cloud, and the Critic branch estimates the expected reward at the current state. The reward is set to a positive value if the L2 difference between the shape’s orientation vector and its ground truth is smaller than at the previous time step, and to a negative value otherwise. The agent is composed of three fully connected (FC) layers and one LSTM layer. The agent extracts features and feeds them to the FC layers of the two subsequent branches, where the actor branch produces 5-dimensional vectors as actions and the critic branch outputs a scalar state value.
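The reward rule described in this caption can be written as a short sketch. The reward magnitudes (`bonus=1.0`, `penalty=-1.0`) and the function names are assumptions for illustration; the caption only states that the reward is positive when the L2 difference to the ground truth decreases and negative otherwise.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two 3D vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def step_reward(prev_up, curr_up, gt_up, bonus=1.0, penalty=-1.0):
    """Return a positive reward if the orientation error decreased.

    `bonus` and `penalty` are assumed magnitudes; only their signs
    follow from the caption.
    """
    prev_err = l2_distance(prev_up, gt_up)
    curr_err = l2_distance(curr_up, gt_up)
    # "Otherwise" includes the case where the error is unchanged.
    return bonus if curr_err < prev_err else penalty
```

Note that, under this reading, an action that leaves the error unchanged is also penalized, since the error was not reduced.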

Fig. 7. Some representative results obtained by our method on input point clouds of different completeness. The top two rows show the inputs and corresponding results on the complete point cloud dataset, the middle two rows are those for the partial point cloud dataset, and the last two rows are those for the single scan dataset.

Fig. 8. Upright orientation guided grasp selection and placement pose determination method. For a given input scan represented by a partial point cloud, we apply our UprightRL model to estimate the upright orientation and the method of [MEF19] to predict a set of candidate grasping poses. We then filter out unreasonable grasping poses by comparing their projections with the object’s projection along the upright direction, and further select the optimal grasping pose based on the grasping angle relative to the upright direction. The final placement pose is determined by rotating the object so that its upright direction is along the z-axis.
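The last step of this caption, rotating the object so that its upright direction lies along the z-axis, can be sketched with Rodrigues' rotation formula. This is one standard way to build such a rotation and is not necessarily the one used in the paper; the function names `rotation_to_z` and `apply_rotation` are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_to_z(up):
    """Return a 3x3 rotation matrix that maps the direction `up` onto +z."""
    ux, uy, uz = normalize(up)
    if uz < -1.0 + 1e-9:
        # `up` points along -z: rotate by 180 degrees about the x axis.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    # Skew-symmetric matrix of the cross product up x z = (uy, -ux, 0).
    k = [[0.0, 0.0, -ux], [0.0, 0.0, -uy], [ux, uy, 0.0]]
    k2 = matmul(k, k)
    f = 1.0 / (1.0 + uz)  # equals (1 - cos(theta)) / sin(theta)^2
    return [[(1.0 if i == j else 0.0) + k[i][j] + f * k2[i][j]
             for j in range(3)] for i in range(3)]

def apply_rotation(r, p):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(r[i][j] * p[j] for j in range(3)) for i in range(3))
```

Applying the returned matrix to every point of the object would place it with the estimated upright direction pointing along +z. The rotation about the z-axis itself remains free and would have to be fixed by other constraints.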

Fig. 9. Example results of the robot grasping-and-placing experiment. Each row shows the sequence from grasping to placing an object.

Data & Code

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (add the bibtex below) if you use any part of our ALGORITHM, CODE, DATA or RESULTS in any publication.



We thank the anonymous reviewers for their valuable comments. This work was supported by NSFC (61872250), GD Natural Science Foundation (2021B1515020085), GD Talent Program (2019JC05X328), DEGP Key Project (2018KZDXM058), Shenzhen Science and Technology Program (RCJC20200714114435012), National Engineering Laboratory for Big Data System Computing Technology, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).




title = {UprightRL: Upright Orientation Estimation of 3D Shapes via Reinforcement Learning},
author = {Luanmin Chen and Juzhan Xu and Chuan Wang and Haibin Huang and Hui Huang and Ruizhen Hu},
journal = {Computer Graphics Forum (Proceedings of Pacific Graphics)},
volume = {40},
number = {7},
year = {2021},


Downloads (faster for people in China)

Downloads (faster for people in other places)