Prof. Vedaldi is a Professor of Computer Vision and Machine
Learning and a co-lead of the VGG group in the Department of
Engineering Science at the University of Oxford. His
research mainly covers using computer vision and machine
learning methods to automatically understand the content of
images and videos, in terms of semantics and 3D geometry,
with little to no manual supervision. He is also the lead
author of the VLFeat and MatConvNet computer vision and
deep learning libraries.
Vladimir (Vova) Kim is a Senior Research Scientist at
Adobe Research, where his work focuses on computer vision,
machine learning, and 3D geometry processing. His research
spans a wide range of topics, including 3D reconstruction,
shape analysis, and generative models for 3D content
creation. Vova is well-known for his contributions to
developing advanced algorithms that bridge the gap between
visual understanding and 3D modeling, helping to shape the
future of creative technologies. He holds a Ph.D. from
Princeton University, and his work has been widely
published in top-tier conferences such as CVPR and
SIGGRAPH.
Prof. Birdal is an Assistant Professor (Lecturer) in the
Department of Computing at Imperial College London.
Previously, he was a senior Postdoctoral Research Fellow
at Stanford University within the Geometric Computing
Group of Prof. Leonidas Guibas. His current research interests
include geometric machine learning and 3D computer vision.
His more theoretical work investigates the limits of
geometric computing and non-Euclidean inference, as well as
the principles of deep learning.
Prof. Su is an Associate Professor in the Department of
Computer Science and Engineering at the University of
California, San Diego. His research focuses on artificial
intelligence, 3D vision, and robotics, with a particular
emphasis on deep learning for 3D understanding, 3D
reconstruction, and robot learning. He has made
significant contributions to the development of neural
representations for 3D data, advancing fields such as 3D
shape analysis and scene understanding.
Prof. Chang is an Associate Professor in the School of
Computing Science at Simon Fraser University. Prior to
this, she was a visiting research scientist at Facebook AI
Research and a research scientist at Eloquent Labs. Her
research focuses on bridging the gap between language and
3D representations of shapes and scenes, grounding
language for embodied agents, and synthesizing 3D
environments from natural language.
Prof. Fang is an Associate Professor of Electrical and Computer Engineering at NYU Abu Dhabi and
NYU Tandon.
He directs the NYU Multimedia and Visual Computing Lab.
His research focuses on 3D Computer Vision and Machine Learning with applications to robotics and
autonomous driving. He is currently working on the development of 3D deep learning technologies in
large-scale visual computing, cross-domain and cross-modality models, and their various industrial
applications.
GCR is a 3D vision task for recognizing material-part compositions on 3D objects using the 3DCoMPaT dataset.
We offer two variants, GCR-Coarse and GCR-Fine, with different segmentation granularity, plus a
Language-Based Part Grounding challenge where models segment parts from text prompts.
Evaluation uses metrics including Shape Accuracy and Grounded-value-all. Both challenges run from March
through May 2025, with results announced in June.
We encourage participation in both tracks.
📊 Dataset
The 3DCoMPaT dataset for both challenge tracks is available through our download page.
Submission Limit: Each participant is allowed to submit
their solution a maximum of three times per day.
Data Usage: Participants are not permitted to use any
data other than the 3DCoMPaT data for training their models.
Technical Report: Each participant must submit a
technical report detailing their methods, which will be made public, in order to be
eligible for any prizes or rewards.
🏆 Awards
Total prize pool: $1,500. Teams are encouraged to participate in both challenge tracks.
Fine track:
1st: $500
2nd: $250
Coarse track:
1st: $500
2nd: $250
These prizes are designed to motivate participants to put their best effort into the challenge
and to reward those who perform exceptionally well. The challenge organizers hope that these
prizes will encourage a high level of participation and help to drive innovation in the field of
3D computer vision.
It should be noted that eligibility for these prizes is contingent on participants adhering to
the rules of the challenge. Therefore, participants must submit their solutions in accordance
with the rules and provide a technical report detailing their methods to be considered for any
prizes or rewards.
💬 Q&A
If you encounter any technical issue related to the challenge, or if you're missing critical
information, please open a ticket on our GitHub
repository.
🎉 2023 Winning Solution
Below, we share the previous year's challenge winner and her winning solution repository:
Challenges
3DCoMPaT-200 Challenges
3DCoMPaT-200 Grounded CoMPaT Recognition (GCR) Challenge
Figure: Grounded CoMPaT Recognition (GCR). Given an input shape (here, a chair), the task
consists of (a) recognizing the shape category and (b) segmenting the part-material pairs composing it.
Grounded CoMPaT Recognition (GCR) is a compositional 3D vision task that aims to jointly
recognize and ground compositions of materials on parts of 3D objects. We will organize two variations of
this task and adapt state-of-the-art multi-view 2D and 3D deep learning methods to solve the problem.
Documentation describing the 3DCoMPaT-200 dataset and the GCR task can be found here.
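As a purely illustrative sketch of what such a grounded prediction could look like, assuming a per-point labeling of an N-point cloud (the official data and submission formats are specified in the linked documentation):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GCRPrediction:
    """Hypothetical container for a single-shape GCR prediction (illustrative only)."""
    shape_category: int          # predicted shape class, e.g. the index of "chair"
    point_parts: np.ndarray      # (N,) predicted part label for every point
    point_materials: np.ndarray  # (N,) predicted material label for every point
```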
Evaluation
Inspired by the metrics proposed in [Yatskar2016, Pratt2016] for compositional situation recognition of
activities in images, we define the compositional metrics of the 2D/3D Grounded CoMPaT Recognition (GCR)
task as follows:
(a) Shape Accuracy: accuracy of the predicted shape category.
(b) Value: accuracy of predicting both part category and the material
of a given part correctly.
(c) Value-all: accuracy of predicting all the (part, material) pairs of
a shape correctly.
(d) Grounded-value: accuracy of predicting both part category and the
material of a given part as well as correctly grounding it.
(e) Grounded-value-all: accuracy of predicting all the (part, material)
pairs of a given shape correctly and grounding all of them correctly.
All of these metrics are computed per shape and then averaged across shapes to avoid bias toward shapes
with more parts. Given the shape dependence of these metrics, we define three settings:
(a) Ground Truth Shape: the ground truth shape is assumed to be
correct.
(b) Top-1 Shape: Shape category is predicted correctly.
(c) Top-5 Shape: Shape category is in the top-5 predictions.
For (b) and (c), part-material pairs and their groundings are considered incorrect if the shape is not in
top-1 or top-5 predictions, respectively.
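To make the definitions above concrete, here is a minimal per-shape sketch in Python. It is our own illustration under assumed per-point label formats (as in the prediction sketch above) and an assumed IoU threshold for grounding; it is not the official evaluation code.

```python
import numpy as np

def gcr_metrics_single_shape(gt_parts, gt_mats, pred_parts, pred_mats,
                             shape_correct, iou_thresh=0.5):
    """Sketch of the per-shape GCR metrics (not the official evaluation code).

    gt_parts/gt_mats and pred_parts/pred_mats are (N,) per-point part and
    material labels; shape_correct says whether the shape category is correct
    under the chosen setting (GT / Top-1 / Top-5); iou_thresh is an assumed
    grounding threshold.
    """
    if not shape_correct:
        # Under the Top-1/Top-5 settings, all pairs count as incorrect
        # when the shape category itself is wrong.
        return dict(value=0.0, value_all=0.0,
                    grounded_value=0.0, grounded_value_all=0.0)

    correct, grounded = [], []
    for part in np.unique(gt_parts):
        gt_mask = gt_parts == part
        pred_mask = pred_parts == part
        # Pair correctness: the part is predicted with the right material.
        gt_material = gt_mats[gt_mask][0]
        pair_ok = bool(pred_mask.any()) and bool((pred_mats[pred_mask] == gt_material).all())
        # Grounding: the predicted point set for this part overlaps the
        # ground-truth points with IoU above the (assumed) threshold.
        union = np.logical_or(gt_mask, pred_mask).sum()
        iou = np.logical_and(gt_mask, pred_mask).sum() / union if union else 0.0
        correct.append(pair_ok)
        grounded.append(pair_ok and iou >= iou_thresh)

    return dict(
        value=float(np.mean(correct)),
        value_all=float(all(correct)),
        grounded_value=float(np.mean(grounded)),
        grounded_value_all=float(all(grounded)),
    )
```

These per-shape values would then be averaged across all shapes, matching the shape-level averaging described above.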
Challenge timeline
We propose the following tentative timeline for the 3DCoMPaT-200 challenge:
Start: March 1, 2025
Submission Deadline: May 30, 2025
Decision: June 12, 2025
3DCoMPaT-200 Language-Based Part Grounding
Challenge description
Task (Part Grounding): Given text prompts referring to one or more parts of a shape, participants
design a model that segments the mentioned parts in the shape's point cloud. The challenge offers
various levels of difficulty, with grounding prompts referring to different numbers of parts per shape.
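As an illustration of how such predictions might be scored (a common choice for segmentation-style grounding tasks, not necessarily the official challenge metric), one can compute the IoU between the predicted and ground-truth point masks for each prompt and average over prompts:

```python
import numpy as np

def prompt_iou(pred_mask, gt_mask):
    """IoU between predicted and ground-truth boolean point masks of shape (N,)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: count as a perfect match
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)

# A prompt such as "the legs and the seat" corresponds to one ground-truth mask;
# the final score would average prompt_iou over all prompts in the evaluation set.
```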
Challenge timeline
We propose the following tentative timeline for the Language-Based Part Grounding challenge:
Start: March 1, 2025
Submission Deadline: May 30, 2025
Decision: June 12, 2025
Paper submission
Call for Papers
🎯 Paper Submissions
We invite researchers to submit their work on compositional 3D vision for the C3DV workshop.
Selected papers will be presented during the workshop in poster and oral sessions.
More information on paper submission and presentation will be announced soon.
🦜 Topics
Besides the 3DCoMPaT and 3DCoMPaT-200 challenges, the C3DV workshop also accepts papers related to
compositional 3D vision.
The workshop will include a poster and an oral session for related works.
Topics of this workshop include but are not limited to:
Deep learning methods for compositional 3D vision
Self-supervised learning for compositional 3D vision
Visual relationship detection in 3D scenes
Zero-shot recognition/detection of compositional 3D visual
concepts
Novel problems in 3D vision and compositionality
Text/composition to 3D generation
Text/composition-based editing of 3D scenes/objects
Language-guided 3D visual understanding (objects, relationships,
...)
Transfer learning for compositional 3D Vision
Multimodal pre-training for 3D understanding
Composition-based 3D object/scene search/retrieval
Compositional 3D vision aiding language problems
...
Submitted 4-page abstracts, in CVPR format, will be peer-reviewed. Accepted abstracts will be presented
in the workshop poster session,
and a portion of them will also be presented orally.
📨 Submission
Paper submissions will be handled with CMT through the following link (available soon).
Please select the appropriate track (archival or non-archival) and check for the relevant timelines in the dates section.