A Sampling Approach to Generating Closely Interacting 3D Pose-pairs from 2D Annotations

IEEE Transactions on Visualization and Computer Graphics

Kangxue Yin1        Hui Huang2,*        Edmond S. L. Ho3       Hao Wang2        Taku Komura4        Daniel Cohen-Or2,5        Hao Zhang1

1Simon Fraser University            2Shenzhen University          3Northumbria University         4Edinburgh University       5Tel Aviv University


We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs for a given motion category, e.g., wrestling or salsa dance. Because close interactions are difficult to acquire with 3D sensors, our approach uses abundant existing video data, which covers many human activities. Instead of treating the data generation problem as one of reconstruction, either through 3D acquisition or direct 2D-to-3D data lifting from video annotations, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Given a motion category and a set of video frames depicting the motion, with the 2D pose-pair in each frame annotated, we start the sampling with one or a few seed 3D pose-pairs, which are manually created based on the target motion category. The initial set is then augmented by MCMC sampling around the seeds, via the Metropolis-Hastings algorithm, guided by a probability density function (PDF) defined by two terms that bias the sampling towards 3D pose-pairs that are physically valid and plausible for the motion category. With a focus on efficient sampling over the space of close interactions, rather than pose spaces, we develop a novel representation called interaction coordinates (IC) to encode both poses and their interactions in an integrated manner. Plausibility of a 3D pose-pair is then defined based on the IC and with respect to the annotated 2D pose-pairs from video. We show that our sampling-based approach efficiently synthesizes a large volume of plausible, closely interacting 3D pose-pairs that provide good coverage of the input 2D pose-pairs.
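
The sampling loop described in the abstract can be illustrated with a minimal, generic Metropolis-Hastings sketch. This is not the authors' implementation: the state here is a plain list of floats rather than interaction coordinates, and `log_validity`, `log_plausibility`, the bound of 2.0, `sigma`, and `step` are all hypothetical stand-ins for the two PDF terms, which this page does not specify. The proposal is a symmetric Gaussian random walk, so the acceptance ratio reduces to the ratio of target densities.

```python
import math
import random


def log_validity(x, bound=2.0):
    # Hypothetical hard constraint standing in for "physically valid":
    # every coordinate must stay inside [-bound, bound].
    return 0.0 if all(abs(v) <= bound for v in x) else -math.inf


def log_plausibility(x, references, sigma=0.5):
    # Hypothetical soft term standing in for "plausible for the category":
    # Gaussian falloff with distance to the nearest reference vector.
    d2 = min(sum((a - b) ** 2 for a, b in zip(x, r)) for r in references)
    return -d2 / (2.0 * sigma ** 2)


def metropolis_hastings(log_density, seed, n_samples, step=0.1, rng=None):
    """Random-walk Metropolis-Hastings started from a seed vector."""
    rng = rng or random.Random(0)
    current = list(seed)
    current_lp = log_density(current)
    if not math.isfinite(current_lp):
        raise ValueError("seed must have non-zero density")
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian perturbation of the current state.
        proposal = [v + rng.gauss(0.0, step) for v in current]
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(list(current))
    return samples


if __name__ == "__main__":
    refs = [[0.0, 0.0], [1.0, 1.0]]  # toy stand-ins for annotated data

    def log_pdf(x):
        # Product of the two terms, written as a sum of logs.
        return log_validity(x) + log_plausibility(x, refs)

    chain = metropolis_hastings(log_pdf, [0.5, 0.5], 1000, step=0.3)
    print(len(chain), chain[-1])
```

Because a proposal that violates the hard constraint has log density of negative infinity, it is never accepted, so every sample in the chain stays valid while the soft term pulls the chain towards the reference data.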


Figure 1. Closely interacting 3D wrestling poses automatically generated by MCMC sampling from a single seed pose (center).

Figure 2. Direct 2D-to-3D lifting (second and third rows) vs. our approach (bottom) for the wrestling motion. Top row shows the annotated pose-pairs in video, as input to direct 2D-to-3D lifting in second row [10] and third row [11]. Results in the bottom row are retrieved samples generated by our method, whose projections are close to the respective 2D pose-pairs in the top row. Quality and plausibility of the close interactions can be contrasted by focusing on the circled regions.

Figure 3. Overview of our 3D pose-pair generation via lifting-by-sampling.

Figure 6. Ablation study of different terms in our probability density function that controls the MCMC sampling for Judo interactions. The 3D pose-pairs shown on the right, for each version of the PDF, represent the 200-th, 400-th, 600-th, and 800-th samples, respectively. To visually validate the results in (c), we also show examples of 2D annotations which contributed to the likelihood term in the bottom row (d).

Figure 7. With an unbiased proposal function, the sampling algorithm tends to produce more diverse but less plausible pose-pairs. The first row shows the 200-th, 400-th, 600-th, and 800-th samples generated from the same seed shown in Figure 6(a). The second row shows examples of 2D annotations that contribute to the likelihood term.


Figure 10. Comparisons to enhanced lifting (b) and naive sampling (c) around the 3D pose-pairs generated by enhanced lifting. For a clear assessment of the comparisons, we also show the 2D annotations (a) and our results (d), which are the same as those shown in Figure 2.

Figure 11. From a single seed, our sampling scheme produces diverse wrestling pose-pairs. The three closest 2D annotations are shown to the right of each sampled 3D pose-pair to demonstrate its validity.

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was supported in part by NSFC (61522213, 61761146002, 6171101466), 973 Program (2015CB352501), Guangdong Science and Technology Program (2015A030312015), Shenzhen Innovation Program (KQJSCX20170727101233642, JCYJ20151015151249564), ISF-NSFC Joint Research Program (2472/17), and NSERC (611370).



title = {A Sampling Approach to Generating Closely Interacting 3D Pose-pairs from 2D Annotations},
author = {Kangxue Yin and Hui Huang and Hao Wang and Taku Komura and Daniel Cohen-Or and Hao Zhang},
journal = {IEEE Transactions on Visualization and Computer Graphics},
volume = {},
number = {},
pages = {},  
year = {2018},