Capturing Closely Interacted Two-Person Motions with Reaction Priors

CVPR 2024

Qi Fang1, Yinghui Fan1, Yanjun Li1, Junting Dong2, Dingwei Wu1, Weidong Zhang1, Kang Chen1
1NetEase Games AI Lab, 2Shanghai AI Lab

Abstract

In this paper, we focus on capturing closely interacted two-person motions from monocular videos, an important yet understudied topic. Unlike less-interacted motions, closely interacted motions contain frequent inter-human occlusions, which pose significant challenges to existing capture algorithms. To address this problem, our key observation is that close physical interactions between two subjects typically happen in very specific situations (e.g., a handshake or a hug), and such situational contexts carry strong semantic priors that help infer the poses of occluded joints. In this spirit, we introduce reaction priors, which are invertible neural networks that bi-directionally model the pose probability distribution of one person given the pose of the other. The learned reaction priors are then incorporated into a query-based pose estimator, which is a decoder-only Transformer with self-attention on both intra-joint and inter-joint relationships. We demonstrate that our design achieves considerably higher performance than previous methods on multiple benchmarks. Moreover, as existing datasets lack sufficient cases of close human-human interactions, we also build a new dataset called Dual-Human to better evaluate different methods. Dual-Human contains around 2k sequences of closely interacted two-person motions, each with synthetic multi-view renderings, contact annotations, and text descriptions. We believe that this new public dataset can significantly promote further research in this area.
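To make the idea of an invertible, conditional pose prior more concrete, here is a minimal sketch of a single conditional affine coupling layer, a common building block of normalizing flows. It is an illustration under stated assumptions, not the architecture used in the paper: the class name `ConditionalAffineCoupling`, the 6-dimensional toy "poses", the tiny random-weight MLP, and the standard-normal base distribution are all hypothetical choices made for this example.

```python
import numpy as np


class ConditionalAffineCoupling:
    """Toy invertible layer (hypothetical, not the paper's network).

    Half of the input x is rescaled and shifted by amounts predicted from
    the other half of x and a conditioning vector c (the partner's pose).
    """

    def __init__(self, dim, cond_dim, hidden=32, seed=0):
        assert dim % 2 == 0, "dim must be even so it can be split in half"
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        in_dim = self.half + cond_dim
        # Random, untrained weights: enough to demonstrate invertibility.
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2 * self.half))
        self.b2 = np.zeros(2 * self.half)

    def _scale_shift(self, x_a, c):
        h = np.tanh(np.concatenate([x_a, c], axis=-1) @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        log_s, t = out[..., :self.half], out[..., self.half:]
        return np.tanh(log_s), t  # bound the log-scale for stability

    def forward(self, x, c):
        """Map pose x to latent z given condition c; also return log|det J|."""
        x_a, x_b = x[..., :self.half], x[..., self.half:]
        log_s, t = self._scale_shift(x_a, c)
        z_b = x_b * np.exp(log_s) + t
        return np.concatenate([x_a, z_b], axis=-1), log_s.sum(axis=-1)

    def inverse(self, z, c):
        """Exact inverse of forward: recover pose x from latent z given c."""
        z_a, z_b = z[..., :self.half], z[..., self.half:]
        log_s, t = self._scale_shift(z_a, c)
        x_b = (z_b - t) * np.exp(-log_s)
        return np.concatenate([z_a, x_b], axis=-1)


def conditional_log_prob(layer, x, c):
    """log p(x | c) under a standard-normal base via change of variables."""
    z, log_det = layer.forward(x, c)
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)
    return log_pz + log_det


# Toy data: 4 samples of 6-D "poses" for person A and person B.
rng = np.random.default_rng(1)
pose_a = rng.normal(size=(4, 6))
pose_b = rng.normal(size=(4, 6))

layer = ConditionalAffineCoupling(dim=6, cond_dim=6)
z, _ = layer.forward(pose_a, pose_b)          # pose of A -> latent, given B
recovered = layer.inverse(z, pose_b)          # latent -> pose of A, given B
log_p = conditional_log_prob(layer, pose_a, pose_b)
```

Because the layer is exactly invertible, the same weights can both score a pose (via `conditional_log_prob`) and generate one (via `inverse` on a sampled latent). A second layer with the roles of the two people swapped would give the other direction of conditioning; whether the paper realizes its bi-directional modeling this way is not stated in the abstract.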

BibTeX

@inproceedings{fang2024dualhuman,
  title     = {Capturing Closely Interacted Two-Person Motions with Reaction Priors},
  author    = {Fang, Qi and Fan, Yinghui and Li, Yanjun and Dong, Junting and Wu, Dingwei and Zhang, Weidong and Chen, Kang},
  booktitle = {CVPR},
  year      = {2024},
}