A human pose prediction model based on temporal convolution. It runs inference autoregressively over time horizons of up to 2 seconds.
[Animation: prediction on the Human3.6M dataset]
[Animation: prediction on our dataset]
In this work we address the problem of 3D human pose forecasting. Given a pose representation, our model, Convolutional Mixer, first applies a convolution along the temporal dimension to learn how each joint's position depends on its positions in previous and future frames. It then applies a convolution along the pose dimension to capture the relations between adjacent joints. We run experiments on the Human3.6M dataset and evaluate the importance of each parameter of our model. We also evaluate the model on a custom dataset recorded in the AIS lab. Finally, we extend it to predict autoregressively, which allows inference over long time intervals. Our results show that the model performs well on a variety of motion sequences and generalizes to novel datasets and to long prediction horizons.
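The two mixing steps and the autoregressive extension described above can be illustrated with a small NumPy sketch. This is not the repository's implementation: the function names, the residual connections, the linear `head` that maps input frames to output frames, the kernel size of 3, the 66-dimensional pose vector, and the 25 fps frame rate are all assumptions chosen for illustration. It also assumes the pose vector is ordered so that neighbouring entries belong to adjacent joints.

```python
import numpy as np

def conv_along_axis(x, kernel, axis):
    """Apply a 'same'-padded 1D convolution to x along one axis."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x
    )

def mixer_block(x, k_time, k_pose):
    """x has shape (frames, pose_dim). Residual connections are an assumption."""
    h = x + conv_along_axis(x, k_time, axis=0)  # mix each pose entry over time
    h = h + conv_along_axis(h, k_pose, axis=1)  # mix neighbouring pose entries
    return h

def predict_chunk(history, k_time, k_pose, head):
    """Map a (t_in, pose_dim) history to (t_out, pose_dim) future frames.

    head is a hypothetical (t_out, t_in) linear read-out over the time axis.
    """
    return head @ mixer_block(history, k_time, k_pose)

def autoregressive_rollout(history, n_frames, step_fn):
    """Call step_fn repeatedly, feeding its predictions back in as input."""
    window = history.copy()
    chunks, produced = [], 0
    while produced < n_frames:
        chunk = step_fn(window)
        chunks.append(chunk)
        produced += len(chunk)
        # Slide the window: keep only the most recent len(history) frames.
        window = np.concatenate([window, chunk], axis=0)[-len(history):]
    return np.concatenate(chunks, axis=0)[:n_frames]

# Illustrative sizes only (assumptions, not values taken from the project).
T_IN, T_OUT, POSE_DIM, FPS = 10, 5, 66, 25
rng = np.random.default_rng(0)
history = rng.normal(size=(T_IN, POSE_DIM))
k_time = 0.1 * rng.normal(size=3)          # untrained, random weights
k_pose = 0.1 * rng.normal(size=3)
head = 0.1 * rng.normal(size=(T_OUT, T_IN))

future = autoregressive_rollout(
    history, 2 * FPS, lambda w: predict_chunk(w, k_time, k_pose, head)
)
print(future.shape)  # (50, 66): 2 seconds at the assumed 25 fps
```

Because each call only sees the most recent `T_IN` frames, prediction errors are fed back into later steps, which is the usual trade-off of autoregressive rollouts over long horizons.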
[Animations: two rows of sequences for the actions directions, discussion, smoking, waiting, walking, walkingtogether]
[Animations: three rows of sequences for the recordings singlePerson_000, singlePerson_001, 2persons_001, 2persons_002]