Figure 1. An overview of HCMoCo. (a) HCMoCo is a versatile multi-modal pre-training framework that takes multi-modal observations of the human body as input for human-centric perception. The pre-trained models can be transferred to various human-centric downstream tasks with different input modalities. (b) HCMoCo shows superior performance on all four downstream tasks, especially in data-efficient settings (10% DensePose, 20% RGB/depth human parsing, 0.5%/0.1% 3D pose estimation). 'IN' stands for ImageNet.
Figure 2. The general paradigm of HCMoCo.
Figure 3. The versatility of HCMoCo: pipelines of its two applications.
Figure 4. The NTURGBD-Parsing-4K RGB-D human parsing dataset.
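To make the pre-training idea concrete, below is a minimal PyTorch sketch of cross-modal contrastive pre-training between paired RGB and depth observations of a person. It is an illustration only: the ResNet-18 backbone, 128-d projection, and temperature are assumptions, and the symmetric InfoNCE loss shown here does not reproduce the full HCMoCo objective or its downstream transfer pipelines.

# Minimal sketch: align an RGB encoder and a depth encoder with a symmetric
# InfoNCE loss over a batch of paired human observations (assumed setup,
# not the exact HCMoCo training recipe).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class ModalityEncoder(nn.Module):
    """Backbone plus projection head for one input modality."""

    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Adapt the first conv so the same backbone handles RGB (3ch) or depth (1ch).
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.proj = nn.Linear(512, embed_dim)

    def forward(self, x):
        feat = self.backbone(x)
        return F.normalize(self.proj(feat), dim=-1)


def info_nce(z_a, z_b, temperature: float = 0.07):
    """Symmetric InfoNCE: matching RGB/depth pairs are positives, the rest negatives."""
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    rgb_enc, depth_enc = ModalityEncoder(3), ModalityEncoder(1)
    optimizer = torch.optim.Adam(
        list(rgb_enc.parameters()) + list(depth_enc.parameters()), lr=1e-4)

    # Dummy batch of paired RGB and depth crops of the same people.
    rgb = torch.randn(8, 3, 224, 224)
    depth = torch.randn(8, 1, 224, 224)

    loss = info_nce(rgb_enc(rgb), depth_enc(depth))
    loss.backward()
    optimizer.step()
    print(f"contrastive loss: {loss.item():.4f}")

After such pre-training, either encoder could be fine-tuned on a single-modality downstream task (e.g., depth-only human parsing), which is the transfer setting the captions above describe.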
@article{hong2022hcmoco,
  title={Versatile Multi-Modal Pre-Training for Human-Centric Perception},
  author={Hong, Fangzhou and Pan, Liang and Cai, Zhongang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2203.13815},
  year={2022}
}