avatar

Fangzhou Hong 洪方舟

Ph.D. Student at MMLab@NTU

About Me

Fangzhou Hong is currently a final-year Ph.D. student (2021-) at the College of Computing and Data Science, Nanyang Technological University, with MMLab@NTU, supervised by Prof. Ziwei Liu. Previously, he received his B.Eng. degree in Software Engineering from Tsinghua University (2016-2020). He was fortunate to intern with Meta Reality Labs Research in 2023. His research interests lie in 3D computer vision and its intersection with computer graphics.

News
[2024-07]

Four papers accepted to ECCV 2024.

[2024-06]
[2024-02]

Two papers accepted to CVPR 2024.

[2024-01]

One paper accepted to ICLR 2024 (DiffTF).

[2023-12]

Two papers accepted to TPAMI (4D-DS-Net and MotionDiffuse).

[2023-09]

Two papers accepted to NeurIPS 2023 (one spotlight, one poster).

[2023-08]

We are hosting the OmniObject3D challenge.

[2023-07]

Three papers accepted to ICCV 2023.

[2023-05]

I was recognized as a CVPR 2023 Outstanding Reviewer.

[2023-01]

One paper (EVA3D) accepted to ICLR 2023 as Spotlight.

[2022-07]

One paper (HuMMan) accepted to ECCV 2022 for Oral presentation.

[2022-05]

One paper (AvatarCLIP) accepted to SIGGRAPH 2022 (journal track).

[2022-03]

One paper (HCMoCo) accepted to CVPR 2022 for Oral presentation.

[2021-09]

One paper (Garment4D) accepted to NeurIPS 2021.

[2021-09]

I was awarded the Google PhD Fellowship 2021 (Machine Perception).

[2021-07]

One paper (extended Cylinder3D) accepted to TPAMI.

[2021-03]

Two papers (DS-Net and Cylinder3D) accepted to CVPR 2021.

[2021-01]

Started my journey at MMLab@NTU!

Publications
egolm.png

EgoLM: Multi-Modal Language Model of Egocentric Motions

arXiv Preprint, 2024

EgoLM is a language model-based framework that tracks and understands egocentric motions from multi-modal inputs, i.e., egocentric videos and sparse motion sensors.

3dtopia_render.jpg

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

arXiv Preprint, 2024

Text-to-3D generation within 5 minutes! A two-stage design utilizing both a 3D diffusion prior and 2D priors.

4D-DS-Net.png

Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks

Fangzhou Hong, Lingdong Kong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Extension of the CVPR 2021 version; extends DS-Net to 4D panoptic LiDAR segmentation via temporally unified instance clustering on aligned LiDAR frames.

SHERF_crop.png

SHERF: Generalizable Human NeRF from a Single Image

Shoukang Hu*, Fangzhou Hong*, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu

International Conference on Computer Vision (ICCV), 2023

Reconstruct human NeRF from a single image in one forward pass!

EVA3D.gif

EVA3D: Compositional 3D Human Generation from 2D Image Collections

International Conference on Learning Representations (ICLR), 2023 (Spotlight)

EVA3D is a high-quality unconditional 3D human generative model that only requires 2D image collections for training.

avatarclip.png

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

Fangzhou Hong*, Mingyuan Zhang*, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

ACM Transactions on Graphics (SIGGRAPH), 2022

AvatarCLIP empowers layman users to customize a 3D avatar with the desired shape and texture, and drive the avatar with the described motions using solely natural languages.

hcmoco.png

Versatile Multi-Modal Pre-Training for Human-Centric Perception

Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)

The first to leverage the multi-modal nature of human data (e.g., RGB, depth, 2D keypoints) for effective human-centric representation learning.

garment4d.png

Garment4D: Garment Reconstruction from Point Cloud Sequences

Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

35th Conference on Neural Information Processing Systems (NeurIPS), 2021

The first attempt at separable and interpretable garment reconstruction from point cloud sequences, including especially challenging loose garments.

dsnet.png

LiDAR-based Panoptic Segmentation via Dynamic Shifting Network

Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Ranked 1st on the public leaderboard of SemanticKITTI panoptic segmentation (2020-11-16); a learnable clustering module is designed to adapt kernel functions to complex point distributions.

nymeria.gif

Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David S. Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

European Conference on Computer Vision (ECCV), 2024

A large-scale, diverse, richly annotated human motion dataset collected in the wild with multi-modal egocentric devices.

LN3Diff.gif

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

European Conference on Computer Vision (ECCV), 2024

LN3Diff creates high-quality 3D object meshes from text within 8 V100-seconds.

structldm2.jpg

StructLDM: Structured Latent Diffusion for 3D Human Generation

Tao Hu, Fangzhou Hong, Ziwei Liu

European Conference on Computer Vision (ECCV), 2024

StructLDM is a diffusion-based unconditional 3D human generative model learned from 2D images.

LLM.png

Large Motion Model for Unified Multi-Modal Motion Generation

European Conference on Computer Vision (ECCV), 2024

Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.

surmo.gif

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

Tao Hu, Fangzhou Hong, Ziwei Liu

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Dynamic human rendering with the joint modeling of motion dynamics and appearance.

citydreamer.gif

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Unbounded 3D cities generated from 2D image collections!

difftf.png

Large-Vocabulary 3D Diffusion Model with Transformer

International Conference on Learning Representations (ICLR), 2024

DiffTF achieves state-of-the-art large-vocabulary 3D object generation performance with 3D-aware transformers.

MotionDiffusion.png

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Mingyuan Zhang*, Zhongang Cai*, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

The first diffusion-model-based text-driven motion generation framework with probabilistic mapping, realistic synthesis and multi-level manipulation ability.

primdiffusion_small.gif

PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation

Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

PrimDiffusion performs the diffusion and denoising process on a set of primitives which compactly represent 3D humans.

4DPSG.png

4D Panoptic Scene Graph Generation

Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023 (Spotlight)

To allow artificial intelligence to develop a comprehensive understanding of a 4D world, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding.

DeformToon3D.png

DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields

Junzhe Zhang*, Yushi Lan*, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

International Conference on Computer Vision (ICCV), 2023

We learn a style field that deforms real 3D faces into stylized 3D faces.

ReMoDiffuse.gif

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

International Conference on Computer Vision (ICCV), 2023

ReMoDiffuse is a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process, enhancing generalizability and diversity.

humman.png

HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Zhongang Cai*, Daxuan Ren*, Ailing Zeng*, Zhengyu Lin*, Tao Yu*, Wenjia Wang*, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

European Conference on Computer Vision (ECCV), 2022 (Oral)

A large-scale multi-modal (color images, point clouds, keypoints, SMPL parameters, and textured meshes) 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.

cylinder3d_extend.png

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception

Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Wei Li, Yuexin Ma, Hongsheng Li, Ruigang Yang, Dahua Lin

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

Journal extension of the CVPR 2021 version; extends cylindrical convolution to more general LiDAR-based perception tasks.

cylinder3d.png

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Xinge Zhu*, Hui Zhou*, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021 (Oral)

Ranked 1st on the public leaderboard of SemanticKITTI semantic segmentation (2020-11-16); cylindrical 3D convolution is designed to exploit the 3D geometric patterns of LiDAR point clouds.

lrcnet.png

LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts

Xinhai Liu, Zhizhong Han, Fangzhou Hong, Yu-Shen Liu, Matthias Zwicker

Computer Aided Geometric Design, 2020, 79: 101859. (SCI, 2017 Impact factor: 1.421, CCF B)

Learns discriminative features on point clouds by simultaneously encoding the fine-grained contexts inside and among local regions.

3dtopia-xl.png

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

arXiv Preprint, 2024

3DTopia-XL scales high-quality 3D asset generation using a Diffusion Transformer (DiT) built upon an expressive and efficient 3D representation, PrimX. The denoising process takes 5 seconds to generate a 3D PBR asset from text or image input, ready for use in graphics pipelines.

hmd2.png

HMD2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device

arXiv Preprint, 2024

We propose HMD2, the first system for online generation of full-body self-motion using a single head-mounted device (e.g., Project Aria glasses) equipped with an outward-facing camera, in complex and diverse environments.

gaussiancity.gif

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

arXiv Preprint, 2024

GaussianCity is a framework for efficient unbounded 3D city generation using 3D Gaussian Splatting.

difftf++.png

DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

arXiv Preprint, 2024

Extension of our ICLR 2024 paper DiffTF; joint training of the diffusion model and the triplane representation improves generation quality.

fashionengine.jpg

FashionEngine: Interactive Generation and Editing of 3D Clothed Humans

Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

arXiv Preprint, 2024

FashionEngine is an interactive 3D human generation and editing system with multimodal control (e.g., texts, images, hand-drawn sketches).

humanliff.png

HumanLiff: Layer-wise 3D Human Generation with Diffusion Model

Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Weiye Xiao, Haiyi Mei, Lei Yang, Ziwei Liu

arXiv Preprint, 2023

We generate 3D digital humans using a 3D diffusion model in a controllable, layer-wise manner.

pointhps.gif

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

Zhongang Cai*, Liang Pan*, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

arXiv Preprint, 2023

SMPL reconstruction from real depth sensors, i.e., partial point cloud inputs.

Education & Experiences
MMLab, Nanyang Technological University
Singapore
Jan. 2021 - Present
Ph.D. Student
Surreal, Reality Labs Research, Meta
Redmond, US
Aug. 2023 - Jan. 2024
Research Scientist Intern
MMLab, The Chinese University of Hong Kong
Hong Kong, China
Jul. 2020 - Dec. 2020
Research Assistant
SenseTime Group Limited
Beijing, China
Feb. 2019 - Dec. 2019
Research Intern
Tsinghua University
Beijing, China
Aug. 2016 - Jun. 2020
Bachelor Degree in Software Engineering
GPA: 3.93/4.0. Rank: 1/84.
Awards & Scholarships

ECCV 2024 Outstanding Reviewer

2024

Google PhD Fellowship 2021

2021

Outstanding Undergraduate Thesis of Tsinghua University

2020

Outstanding Graduate of Tsinghua University

2020

Outstanding Graduate of Beijing

2020

Outstanding Graduate of School of Software, Tsinghua University

2020

ICBC Scholarship (Top 3%)

2019

Hua Wei Scholarship (Top 1%)

2018

Tung OOCL Scholarship (Top 5%)

2017
Invited Talks
[2024 @ Stanford SVL]

From High-Fidelity 3D Generative Models to Dynamic Embodied Learning

[2024 @ CVPR 2024 Workshop on EgoMotion]
Academic Services

Conference Reviewer: CVPR’21/23/24, ICCV’23, ECCV’24, NeurIPS’22/23/24, ICML’23/24, ICLR’24, SIGGRAPH’23/24, SIGGRAPH Asia’23/24, AAAI’21/23, 3DV’24

Journal Reviewer: TPAMI, IJCV, TVCG, TCSVT, JABES, PR

Teaching
[2022]

NTU CE/CZ1115 Introduction to Data Science and Artificial Intelligence (Teaching Assistant)

[2022]

NTU CE2003 Digital System Design (Teaching Assistant)

[2021]

NTU CE/CZ1115 Introduction to Data Science and Artificial Intelligence (Teaching Assistant)

[2021]

NTU SC1013 Physics for Computing (Teaching Assistant)