Prof. Vedaldi is a Professor of Computer Vision and Machine
Learning and a co-lead of the VGG group in the Department of
Engineering Science at the University of Oxford. His
research mainly covers using computer vision and machine
learning methods to automatically understand the content of
images and videos, in terms of semantics and 3D geometry,
with little to no manual supervision. He is also the lead
author of the VLFeat and MatConvNet computer vision and
deep learning libraries.
Vladimir (Vova) Kim is a Senior Research Scientist at
Adobe Research, where his work focuses on computer vision,
machine learning, and 3D geometry processing. His research
spans a wide range of topics, including 3D reconstruction,
shape analysis, and generative models for 3D content
creation. Vova is well-known for his contributions to
developing advanced algorithms that bridge the gap between
visual understanding and 3D modeling, helping to shape the
future of creative technologies. He holds a Ph.D. from
Princeton University, and his work has been widely
published in top-tier conferences such as CVPR and
SIGGRAPH.
Prof. Birdal is an Assistant Professor (Lecturer) in the
Department of Computing at Imperial College London.
Previously, he was a senior Postdoctoral Research Fellow
at Stanford University within the Geometric Computing
Group of Prof. Leonidas Guibas. His current research interests
include geometric machine learning and 3D computer vision.
His more theoretical work investigates the limits of
geometric computing and non-Euclidean inference, as well as
the principles of deep learning.
Prof. Su is an Associate Professor in the Department of
Computer Science and Engineering at the University of
California, San Diego. His research focuses on artificial
intelligence, 3D vision, and robotics, with a particular
emphasis on deep learning for 3D understanding, 3D
reconstruction, and robot learning. He has made
significant contributions to the development of neural
representations for 3D data, advancing fields such as 3D
shape analysis and scene understanding.
Prof. Chang is an Associate Professor in the School of
Computing Science at Simon Fraser University. Prior to
this, she was a visiting research scientist at Facebook AI
Research and a research scientist at Eloquent Labs. Her
research focuses on bridging the gap between language and
3D representations of shapes and scenes, grounding
language for embodied agents, and synthesizing 3D
environments from natural language.
Prof. Fang is an Associate Professor of Electrical and Computer Engineering at NYU Abu Dhabi and
NYU Tandon.
He directs the NYU Multimedia and Visual Computing Lab.
His research focuses on 3D Computer Vision and Machine Learning with applications to robotics and
autonomous driving. He is currently working on the development of 3D deep learning technologies in
large-scale visual computing, cross-domain and cross-modality models, and their various industrial
applications.
GCR is a 3D vision task for recognizing material-part compositions on 3D objects using the 3DCoMPaT dataset.
We offer two variants, GCR-Coarse and GCR-Fine, with different segmentation granularity, plus a
Language-Based Part Grounding challenge where models segment parts from text prompts.
Evaluation uses metrics including Shape Accuracy and Grounded-value-all. Both challenges run from March
through May 2025, with results announced in June.
We encourage participation in both tracks.
📊 Dataset
The 3DCoMPaT dataset for both challenge tracks is available through our download page.
Submission Limit: Each participant is allowed to submit
their solution a maximum of three times per day.
Data Usage: Participants are not permitted to use any
data other than the 3DCoMPaT data for training their models.
Technical Report: Each participant must submit a
technical report detailing their methods, which will be made public, in order to be
eligible for any prizes or rewards.
🏆 Awards
Total prize pool: $1,500. Teams are encouraged to participate in both challenge tracks.
Fine track:
1st: $500
2nd: $250
Coarse track:
1st: $500
2nd: $250
These prizes are designed to motivate participants to put their best effort into the challenge
and to reward those who perform exceptionally well. The challenge organizers hope that these
prizes will encourage a high level of participation and help to drive innovation in the field of
3D computer vision.
It should be noted that eligibility for these prizes is contingent on participants adhering to
the rules of the challenge. Therefore, participants must submit their solutions in accordance
with the rules and provide a technical report detailing their methods to be considered for any
prizes or rewards.
💬 Q&A
If you encounter any technical issue related to the challenge, or if you're missing critical
information, please open a ticket on our GitHub
repository.
🎉 2023 Winning Solution
Below, we share the previous year's challenge winner and her winning solution repository:
Challenges
3DCoMPaT-200 Challenges
3DCoMPaT-200 Grounded CoMPaT Recognition (GCR) Challenge
Figure: Grounded CoMPaT Recognition (GCR). Given an input shape (here, a chair), the task
consists of (a) recognizing the shape category and (b) segmenting the part-material pairs composing it.
Grounded CoMPaT Recognition (GCR) is a compositional 3D vision task that aims to jointly
recognize and ground compositions of materials on parts of 3D objects. We will organize two variations of
this task and adapt state-of-the-art multi-view 2D and 3D deep learning methods to solve the problem.
Documentation describing the 3DCoMPaT-200 dataset and the GCR task can be found here.
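As a purely illustrative sketch of what such a grounded prediction could look like, assuming a per-point labeling of an N-point cloud (the official data and submission formats are specified in the linked documentation):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GCRPrediction:
    """Hypothetical container for a single-shape GCR prediction (illustrative only)."""
    shape_category: int          # predicted shape class, e.g. the index of "chair"
    point_parts: np.ndarray      # (N,) predicted part label for every point
    point_materials: np.ndarray  # (N,) predicted material label for every point
```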
Evaluation
Inspired by the metrics proposed in [Yatskar2016, Pratt2016] for compositional situation recognition of
activities in images, we define the compositional metrics of the 2D/3D Grounded CoMPaT Recognition (GCR)
task as follows:
(a) Shape Accuracy: accuracy of the predicted shape category.
(b) Value: accuracy of predicting both part category and the material
of a given part correctly.
(c) Value-all: accuracy of predicting all the (part, material) pairs of
a shape correctly.
(d) Grounded-value: accuracy of predicting both part category and the
material of a given part as well as correctly grounding it.
(e) Grounded-value-all: accuracy of predicting all the (part, material)
pairs of a given shape correctly and grounding all of them correctly.
All of these metrics are computed per shape and then averaged across shapes to avoid bias toward shapes
with more parts. Given the shape dependence of these metrics, we define three settings:
(a) Ground Truth Shape: the ground truth shape is assumed to be
correct.
(b) Top-1 Shape: Shape category is predicted correctly.
(c) Top-5 Shape: Shape category is in the top-5 predictions.
For (b) and (c), part-material pairs and their groundings are considered incorrect if the shape is not in
top-1 or top-5 predictions, respectively.
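To make the definitions above concrete, here is a minimal per-shape sketch in Python. It is our own illustration under assumed per-point label formats (as in the prediction sketch above) and an assumed IoU threshold for grounding; it is not the official evaluation code.

```python
import numpy as np

def gcr_metrics_single_shape(gt_parts, gt_mats, pred_parts, pred_mats,
                             shape_correct, iou_thresh=0.5):
    """Sketch of the per-shape GCR metrics (not the official evaluation code).

    gt_parts/gt_mats and pred_parts/pred_mats are (N,) per-point part and
    material labels; shape_correct says whether the shape category is correct
    under the chosen setting (GT / Top-1 / Top-5); iou_thresh is an assumed
    grounding threshold.
    """
    if not shape_correct:
        # Under the Top-1/Top-5 settings, all pairs count as incorrect
        # when the shape category itself is wrong.
        return dict(value=0.0, value_all=0.0,
                    grounded_value=0.0, grounded_value_all=0.0)

    correct, grounded = [], []
    for part in np.unique(gt_parts):
        gt_mask = gt_parts == part
        pred_mask = pred_parts == part
        # Pair correctness: the part is predicted with the right material.
        gt_material = gt_mats[gt_mask][0]
        pair_ok = bool(pred_mask.any()) and bool((pred_mats[pred_mask] == gt_material).all())
        # Grounding: the predicted point set for this part overlaps the
        # ground-truth points with IoU above the (assumed) threshold.
        union = np.logical_or(gt_mask, pred_mask).sum()
        iou = np.logical_and(gt_mask, pred_mask).sum() / union if union else 0.0
        correct.append(pair_ok)
        grounded.append(pair_ok and iou >= iou_thresh)

    return dict(
        value=float(np.mean(correct)),
        value_all=float(all(correct)),
        grounded_value=float(np.mean(grounded)),
        grounded_value_all=float(all(grounded)),
    )
```

These per-shape values would then be averaged across all shapes, matching the shape-level averaging described above.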
Challenge timeline
We propose the following tentative timeline for the 3DCoMPaT-200 challenge:
Start: March 1, 2025
Submission Deadline: May 30, 2025
Decision: June 12, 2025
3DCoMPaT-200 Language-Based Part Grounding
Challenge description
Task (Part Grounding): Given text prompts referring to one or more parts of a shape, participants
design a model that segments the mentioned parts in the shape's point cloud. The challenge offers
various levels of difficulty, with grounding prompts referring to different numbers of parts per shape.
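As an illustration of how such predictions might be scored (a common choice for segmentation-style grounding tasks, not necessarily the official challenge metric), one can compute the IoU between the predicted and ground-truth point masks for each prompt and average over prompts:

```python
import numpy as np

def prompt_iou(pred_mask, gt_mask):
    """IoU between predicted and ground-truth boolean point masks of shape (N,)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: count as a perfect match
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)

# A prompt such as "the legs and the seat" corresponds to one ground-truth mask;
# the final score would average prompt_iou over all prompts in the evaluation set.
```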
Challenge timeline
We propose the following tentative timeline for the Language-Based Part Grounding challenge:
Start: March 1, 2025
Submission Deadline: May 30, 2025
Decision: June 12, 2025
Paper submission
Call for Papers
🎯 Paper Submissions
We invite researchers to submit their work on compositional 3D vision for the C3DV workshop.
Selected papers will be presented during the workshop in poster and oral sessions.
More information on paper submission and presentation will be announced soon.
🦜 Topics
Besides the 3DCoMPaT and 3DCoMPaT-200 challenges, the C3DV workshop also accepts papers related to
compositional 3D vision.
The workshop will include a poster and an oral session for related works.
Topics of this workshop include but are not limited to:
Deep learning methods for compositional 3D vision
Self-supervised learning for compositional 3D vision
Visual relationship detection in 3D scenes
Zero-shot recognition/detection of compositional 3D visual
concepts
Novel problems in 3D vision and compositionality
Text/composition to 3D generation
Text/composition-based editing of 3D scenes/objects
Language-guided 3D visual understanding (objects, relationships,
...)
Transfer learning for compositional 3D Vision
Multimodal pre-training for 3D understanding
Composition-based 3D object/scene search/retrieval
Compositional 3D vision aiding language problems
...
Submitted 4-page abstracts, in CVPR format, will be peer-reviewed. Accepted abstracts will be presented
in the workshop poster session,
and a portion of them will also be presented orally.
📨 Submission
Paper submissions will be handled with CMT through the following link (available soon).
Please select the appropriate track (archival or non-archival) and check for the relevant timelines in the dates section.