Fangzhou Hong is currently a final-year Ph.D. student in the School of Computer Science and Engineering at Nanyang Technological University (MMLab@NTU), supervised by Prof. Ziwei Liu. Previously, he received a B.Eng. degree in Software Engineering from Tsinghua University in 2020. His research interests lie in computer vision and deep learning. In particular, he is interested in 3D representation learning and its intersection with computer graphics.
Two papers accepted to CVPR 2024.
Two papers accepted to TPAMI (4D-DS-Net and MotionDiffuse).
Two papers accepted to NeurIPS 2023 (one spotlight, one poster).
We are hosting the OmniObject3D challenge.
Three papers accepted to ICCV 2023.
I am recognized as a CVPR 2023 Outstanding Reviewer.
One paper (AvatarCLIP) accepted to SIGGRAPH 2022 (journal track).
One paper (Garment4D) accepted to NeurIPS 2021.
I am awarded the Google PhD Fellowship 2021 (Machine Perception).
One paper (extended Cylinder3D) accepted to TPAMI.
Two papers (DS-Net and Cylinder3D) accepted to CVPR 2021.
Started my journey at MMLab@NTU!
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Unbounded 3D cities generated from 2D image collections!
Large-Vocabulary 3D Diffusion Model with Transformer
International Conference on Learning Representations (ICLR), 2024
DiffTF achieves state-of-the-art large-vocabulary 3D object generation performance with 3D-aware transformers.
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
The first diffusion-model-based text-driven motion generation framework with probabilistic mapping, realistic synthesis and multi-level manipulation ability.
Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Extension of the CVPR 2021 version; extends DS-Net to 4D panoptic LiDAR segmentation via temporally unified instance clustering on aligned LiDAR frames.
PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
PrimDiffusion performs the diffusion and denoising process on a set of primitives which compactly represent 3D humans.
4D Panoptic Scene Graph Generation
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023 (Spotlight)
To allow artificial intelligence to develop a comprehensive understanding of a 4D world, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding.
SHERF: Generalizable Human NeRF from a Single Image
International Conference on Computer Vision (ICCV), 2023
Reconstruct human NeRF from a single image in one forward pass!
DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields
International Conference on Computer Vision (ICCV), 2023
We learn a style field that deforms real 3D faces to stylized 3D faces.
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
International Conference on Computer Vision (ICCV), 2023
ReMoDiffuse is a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process, enhancing generalizability and diversity.
EVA3D: Compositional 3D Human Generation from 2D Image Collections
International Conference on Learning Representations (ICLR), 2023 (Spotlight)
EVA3D is a high-quality unconditional 3D human generative model that only requires 2D image collections for training.
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
European Conference on Computer Vision (ECCV), 2022 (Oral)
A large-scale multi-modal (color images, point clouds, keypoints, SMPL parameters, and textured meshes) 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
ACM Transactions on Graphics (SIGGRAPH), 2022
AvatarCLIP empowers layman users to customize a 3D avatar with the desired shape and texture, and drive the avatar with the described motions using solely natural language.
Versatile Multi-Modal Pre-Training for Human-Centric Perception
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
The first to leverage the multi-modal nature of human data (e.g. RGB, depth, 2D key-points) for effective human-centric representation learning.
Garment4D: Garment Reconstruction from Point Cloud Sequences
35th Conference on Neural Information Processing Systems (NeurIPS), 2021
The first attempt at separable and interpretable garment reconstruction from point cloud sequences, especially for challenging loose garments.
LiDAR-based Panoptic Segmentation via Dynamic Shifting Network
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Ranked 1st on the public leaderboard of SemanticKITTI panoptic segmentation (2020-11-16); a learnable clustering module is designed to adapt kernel functions to complex point distributions.
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation / LiDAR-based Perception
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021 (Oral); IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Ranked 1st on the public leaderboard of SemanticKITTI semantic segmentation (2020-11-16); cylindrical 3D convolution is designed to explore the 3D geometric pattern of LiDAR point clouds. The TPAMI version further extends the cylindrical convolution to more general LiDAR-based perception tasks.
LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts
Computer Aided Geometric Design, 2020, 79: 101859. (SCI, 2017 Impact factor: 1.421, CCF B)
Learns discriminative features on point clouds by simultaneously encoding the fine-grained contexts inside and among local regions.
Large Motion Model for Unified Multi-Modal Motion Generation
arXiv Preprint, 2024
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
StructLDM: Structured Latent Diffusion for 3D Human Generation
arXiv Preprint, 2024
StructLDM is a diffusion-based unconditional 3D human generative model learned from 2D images.
FashionEngine: Interactive Generation and Editing of 3D Clothed Humans
arXiv Preprint, 2024
FashionEngine is an interactive 3D human generation and editing system with multimodal control (e.g., texts, images, hand-drawing sketches).
3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors
arXiv Preprint, 2024
Text-to-3D Generation within 5 Minutes! A two-stage design, utilizing both 3D diffusion priors and 2D priors.
HumanLiff: Layer-wise 3D Human Generation with Diffusion Model
arXiv Preprint, 2023
We generate 3D digital humans using a 3D diffusion model in a controllable, layer-wise way.
PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds
arXiv Preprint, 2023
SMPL reconstruction from real depth sensor data, which provides only partial point clouds as input.
Google PhD Fellowship 2021
Outstanding Undergraduate Thesis of Tsinghua University
Outstanding Graduate of Tsinghua University
Outstanding Graduate of Beijing
Outstanding Graduate of School of Software, Tsinghua University
ICBC Scholarship (Top 3%)
Hua Wei Scholarship (Top 1%)
Tung OOCL Scholarship (Top 5%)
Conference Reviewer: CVPR’21/23/24, ICCV’23, NeurIPS’22/23, ICML’23/24, ICLR’24, SIGGRAPH’23, SIGGRAPH Asia’23, AAAI’21/23
Journal Reviewer: TPAMI, IJCV, TCSVT, JABES, PR
NTU CE/CZ1115 Introduction to Data Science and Artificial Intelligence (Teaching Assistant)
NTU CE2003 Digital System Design (Teaching Assistant)
NTU CE/CZ1115 Introduction to Data Science and Artificial Intelligence (Teaching Assistant)
NTU SC1013 Physics for Computing (Teaching Assistant)
Dynamic human rendering with the joint modeling of motion dynamics and appearance.