We present a learning-based method for generating animated 3D pose sequences depicting multiple sequential or superimposed actions provided in long, compositional sentences. We propose a hierarchical two-stream sequential model to explore a finer joint-level mapping between natural language sentences and the corresponding 3D pose sequences of the motions. We learn two manifold representations of the motion, one each for the upper-body and lower-body movements. We evaluate our proposed model on the publicly available KIT Motion-Language Dataset containing 3D pose data with human-annotated sentences. Experimental results show that our model advances the state of the art in text-based motion synthesis by a margin of 50% in objective evaluations.
@inproceedings{pub11755,
author = {Ghosh, Anindita and Cheema, Noshaba and Oguz, Cennet and Theobalt, Christian and Slusallek, Philipp},
title = {Text-Based Motion Synthesis with a Hierarchical Two-Stream RNN},
booktitle = {SIGGRAPH '21 Posters. ACM Siggraph (Siggraph-21), August 9-13, Virtual},
year = {2021},
publisher = {ACM}
}