Figure 1. An Overview of HCMoCo. a) We present HCMoCo, a versatile multi-modal pre-training framework for human-centric perception that takes multi-modal observations of the human body as input. The pre-trained models can be transferred to various human-centric downstream tasks with different modalities. b) HCMoCo shows superior performance on all four downstream tasks, especially in data-efficient settings (10% DensePose, 20% RGB/depth human parsing, 0.5/0.1% 3D pose estimation). 'IN' stands for ImageNet.
(a) General Paradigm of HCMoCo
(b) Versatility of HCMoCo
(c) NTURGBD-Parsing-4K Dataset
@article{hong2022hcmoco,
  title={Versatile Multi-Modal Pre-Training for Human-Centric Perception},
  author={Hong, Fangzhou and Pan, Liang and Cai, Zhongang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2203.13815},
  year={2022}
}