Object Properties Inferring from and Transfer for Human Interaction Motions

Computational Visual Media (Proceedings of CVM 2021)

Qian Zheng1    Weikai Wu1    Hanting Pan1    Niloy Mitra2    Daniel Cohen-Or1,3    Hui Huang1*

1Shenzhen University    2University College London    3Tel Aviv University 

Fig. 1. The actor is lifting a box from the table. Can the skeletal motion tell whether the box being lifted is light or heavy?


Humans regularly interact with the objects around them, and such interactions often produce strongly correlated motion between the human and the interacting object. We thus ask: “Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?” In this paper, we present a fine-grained action recognition method that learns to infer such latent object properties from human interaction motion alone. This inference allows us to disentangle the motion from the object property and to transfer object properties to a given motion. We collected a large number of videos and 3D skeletal motions of performing actors using an inertial motion capture device. We analyze similar actions and learn the subtle differences among them to reveal latent properties of the interacting objects. In particular, we learn to characterize the interacting object by estimating its weight, fragility, or delicacy. Our results clearly demonstrate that interaction motions and interacting objects are highly correlated, and that relative latent object properties can indeed be inferred from 3D skeleton sequences alone, opening new synthesis possibilities for human interaction motions.
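To make the input and output of the inference task concrete, here is a toy sketch: a skeleton sequence goes in, a relative property label such as "light" or "heavy" comes out. It is not the paper's method. It swaps in a deliberately simple nearest-centroid classifier on a single hypothetical feature (mean joint speed), and the sequence layout, the helper names (`mean_joint_speed`, `fit_centroids`, `predict`, `toy_sequence`), and the toy assumption that heavier boxes are lifted more slowly are all illustrative assumptions.

```python
from math import dist
from statistics import mean

# Assumed layout (illustration only): a sequence is a list of frames,
# each frame a list of (x, y, z) joint positions.
Frame = list[tuple[float, float, float]]
Sequence = list[Frame]


def mean_joint_speed(seq: Sequence) -> float:
    """Hypothetical feature: average per-frame joint displacement (needs >= 2 frames)."""
    steps = [
        mean(dist(a, b) for a, b in zip(prev, cur))
        for prev, cur in zip(seq, seq[1:])
    ]
    return mean(steps)


def fit_centroids(samples: list[tuple[Sequence, str]]) -> dict[str, float]:
    """Average the feature per property label, e.g. 'light' vs 'heavy'."""
    by_label: dict[str, list[float]] = {}
    for seq, label in samples:
        by_label.setdefault(label, []).append(mean_joint_speed(seq))
    return {label: mean(vals) for label, vals in by_label.items()}


def predict(centroids: dict[str, float], seq: Sequence) -> str:
    """Return the label whose centroid is closest to the sequence's feature."""
    f = mean_joint_speed(seq)
    return min(centroids, key=lambda label: abs(centroids[label] - f))


def toy_sequence(speed: float, frames: int = 10, joints: int = 3) -> Sequence:
    """Synthetic sequence in which every joint moves `speed` units along x per frame."""
    return [[(speed * t, float(j), 0.0) for j in range(joints)] for t in range(frames)]


# Toy assumption for this example only: heavier boxes are lifted more slowly.
train = [(toy_sequence(1.0), "light"), (toy_sequence(1.2), "light"),
         (toy_sequence(0.4), "heavy"), (toy_sequence(0.5), "heavy")]
centroids = fit_centroids(train)
print(predict(centroids, toy_sequence(0.45)))  # heavy
```

A learned model would replace the single hand-picked feature with representations learned from the subtle differences between similar actions, but the interface stays the same: motion in, relative property out.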

Fig. 2. For each sample, we capture a 3D skeleton sequence with an inertial motion tracking suit, an ego-centric video with a head-mounted camera, two additional videos with two external cameras, and the object’s geometry along with its properties.
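One way to picture a single capture from Fig. 2 is as a record holding the five modalities listed above. The sketch below is a hypothetical container: the class name `InteractionSample`, its field names and types, the file paths, and the 21-joint skeleton are assumptions for illustration, and the released dataset's actual file layout may differ.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionSample:
    """One capture with the modalities listed in Fig. 2 (hypothetical layout)."""

    # IMU suit output: frames x joints x (x, y, z)
    skeleton: list[list[tuple[float, float, float]]]
    # Path to the head-mounted (ego-centric) camera video
    ego_video: str
    # Paths to the two external camera videos
    external_videos: tuple[str, str]
    # Path to the object's geometry
    object_mesh: str
    # Object properties, e.g. {"weight": "heavy"}
    object_properties: dict[str, str] = field(default_factory=dict)

    def num_frames(self) -> int:
        return len(self.skeleton)


sample = InteractionSample(
    skeleton=[[(0.0, 0.0, 0.0)] * 21 for _ in range(120)],
    ego_video="sample_0001/ego.mp4",
    external_videos=("sample_0001/cam1.mp4", "sample_0001/cam2.mp4"),
    object_mesh="sample_0001/box.obj",
    object_properties={"weight": "heavy"},
)
print(sample.num_frames())  # 120
```

Keeping the object properties in a plain dictionary lets the same record describe different interactions (weight for lifting, fragility for handling delicate objects) without changing the schema.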

Fig. 3. Eight interaction motions represented in our dataset, which comprises 4k+ interaction captures from 100 different participants.

Fig. 8. Estimating the joint-level importance of a fishing motion for inferring the object property. Colors ranging from magenta to cyan indicate importance from high to low.
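The caption does not say how joint-level importance is computed, so the sketch below swaps in a generic occlusion scheme as a stand-in: zero out one joint in every frame and measure how much a classifier's score drops. It then maps the scores onto the magenta-to-cyan scale used in Fig. 8. The function names, the sequence layout, and the toy score function are hypothetical; the paper's own importance estimate may work differently.

```python
from typing import Callable

# Assumed layout: a sequence is a list of frames, each a list of (x, y, z) joints.
Frame = list[tuple[float, float, float]]
Sequence = list[Frame]


def occlusion_importance(score: Callable[[Sequence], float], seq: Sequence) -> list[float]:
    """Stand-in importance: drop in `score` when joint j is zeroed in every frame."""
    base = score(seq)
    drops = []
    for j in range(len(seq[0])):
        occluded = [[(0.0, 0.0, 0.0) if k == j else p for k, p in enumerate(frame)]
                    for frame in seq]
        drops.append(base - score(occluded))
    return drops


def importance_to_rgb(importance: list[float]) -> list[tuple[float, float, float]]:
    """Min-max normalize scores, then interpolate cyan (0, 1, 1) -> magenta (1, 0, 1)."""
    lo, hi = min(importance), max(importance)
    span = (hi - lo) or 1.0  # all scores equal: avoid division by zero
    colors = []
    for s in importance:
        t = (s - lo) / span                # 1 = most important, 0 = least
        colors.append((t, 1.0 - t, 1.0))   # red rises, green falls, blue stays at 1
    return colors


def toy_score(s: Sequence) -> float:
    """Toy classifier score that only looks at joint 1's x-coordinate."""
    return sum(frame[1][0] for frame in s)


seq = [[(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.5, 0.0, 0.0)] for _ in range(4)]
print(importance_to_rgb(occlusion_importance(toy_score, seq)))
# [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
```

Because the toy score depends only on joint 1, that joint is the only one rendered magenta; the other joints, whose removal changes nothing, are rendered cyan.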

Fig. 9. Top: 2D skeletons extracted from our recorded video, with missing parts highlighted by red boxes. Bottom: for comparison, 3D IMU skeletons captured at the corresponding frames, which are clean and complete.

Fig. 12. Given a motion sequence of an unseen subject walking on a wide path (green), we can generate a new sequence that looks as if the subject were walking on a narrow path (blue).

Fig. 13. Given the motion sequence shown in Fig. 1, we can generate a new sequence that looks as if the subject were trying to lift a box too heavy to be lifted. The generated motion is similar to the ground truth, shown as a sequence of RGB images at the bottom.

Data & Code

Note that the DATA and CODE are free for Research and Education Use ONLY. 

Please cite our paper (BibTeX below) if you use any part of our ALGORITHM, CODE, DATA, or RESULTS in any publication.

Link: https://vcc.tech/research/2020/IMDataset


We sincerely thank the reviewers for their valuable comments. This work was supported in part by Shenzhen Innovation Program (JCYJ20180305125709986), NSFC (61861130365, 61761146002), GD Science and Technology Program (2020A0505100064, 2015A030312015), and DEGP Key Project (2018KZDXM058).


    title={Object Properties Inferring from and Transfer for Human Interaction Motions},
    author={Qian Zheng and Weikai Wu and Hanting Pan and Niloy Mitra and Daniel Cohen-Or and Hui Huang},
    journal = {Computational Visual Media (Proceedings of CVM 2021)},
