Forecasting Actions and Characteristic 3D Poses

¹Technical University of Munich, ²Google


We propose to model longer-term future human behavior by jointly predicting action labels and 3D characteristic poses (3D poses representative of the associated actions). While previous work has considered action and 3D pose forecasting separately, we observe that the nature of the two tasks is coupled, and thus we predict them together. Starting from an input 2D video observation, we jointly predict a future sequence of actions along with 3D poses characterizing these actions.

Since coupled action labels and 3D pose annotations are difficult and expensive to acquire for videos of complex action sequences, we train our approach with action labels and 2D pose supervision from two existing action video datasets, in tandem with an adversarial loss that encourages plausible predicted 3D poses.
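
As a rough illustration of how such weakly supervised training could be set up, the sketch below combines an action classification term, a 2D pose term on projected 3D predictions, and an adversarial term. It is not the paper's implementation: the orthographic projection, the non-saturating generator loss, the loss weights, and all function names are assumptions made for this example.

```python
import numpy as np


def action_cross_entropy(logits, labels):
    """Mean cross-entropy over a sequence of predicted actions.

    logits: (T, C) unnormalized scores, labels: (T,) integer class ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def project_orthographic(poses_3d):
    """Assumed camera model: drop the depth axis, (T, J, 3) -> (T, J, 2)."""
    return poses_3d[..., :2]


def pose_2d_loss(poses_3d, poses_2d):
    """Mean squared joint distance between projected 3D and target 2D poses."""
    diff = project_orthographic(poses_3d) - poses_2d
    return np.mean(np.sum(diff ** 2, axis=-1))


def generator_adversarial_loss(critic_scores):
    """Non-saturating generator loss, -log(sigmoid(s)) = softplus(-s).

    critic_scores: (T,) raw scores from a (hypothetical) pose discriminator;
    higher means the predicted 3D pose looks more plausible.
    """
    return np.mean(np.logaddexp(0.0, -critic_scores))


def total_loss(logits, labels, poses_3d, poses_2d, critic_scores,
               w_action=1.0, w_pose=1.0, w_adv=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (w_action * action_cross_entropy(logits, labels)
            + w_pose * pose_2d_loss(poses_3d, poses_2d)
            + w_adv * generator_adversarial_loss(critic_scores))
```

In this sketch, only the 2D projection of each predicted 3D pose is compared against annotations, so depth is constrained solely through the adversarial term, which mirrors the motivation stated above for training without 3D pose labels.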

Our experiments demonstrate the complementary nature of joint action and characteristic 3D pose prediction: our joint approach outperforms each task treated individually, enables robust longer-term sequence prediction, and improves over alternative approaches to forecasting actions and characteristic 3D poses.


Teaser. We propose to model long-term future human behavior by jointly predicting a sequence of future action labels and their realization as the 3D poses that characterize these actions (characteristic 3D poses). From an input RGB sequence and its corresponding actions, we detect 2D poses and lift them into 3D pose predictions for the forecasted future behavior.






If you find this work useful for your research, please consider citing:

    title={Forecasting Actions and Characteristic 3D Poses},
    author={Diller, Christian and Funkhouser, Thomas and Dai, Angela},
    journal={arXiv preprint arXiv:2211.14309},