🏆 Challenge#
As part of the C3DV CVPR 2023 workshop, we are organizing a modelling challenge based on 3DCoMPaT++.
The dataset for this challenge consists of the first 10 compositions of the 3DCoMPaT++ dataset, in both 2D and 3D modalities.
The goal of the challenge is to recognize the part-material pairs appearing in a shape and to ground them, while also recognizing the shape category, using a single jointly trained model.
Participants are evaluated using the Grounded Compositional Recognition (GCR) metrics in 3D space.
The test data released for the challenge consists of an HDF5 pointcloud test set and 2D WebDataset shards covering the 10 compositions, both stripped of all ground-truth information.
Metrics#
- Shape accuracy: Accuracy of predicting the shape category. We consider top-1 predictions.
- Value: Accuracy of predicting both the part category and the material of a given part correctly.
- Value-all: Accuracy of predicting all the <part, material> pairs of a shape correctly.
- Grounded-value: Accuracy of predicting both the part category and the material of a given part, as well as correctly grounding it.
- Grounded-value-all: Accuracy of predicting all the <part, material> pairs of a given shape correctly and grounding all of them correctly.
Note
Value and Grounded-value are evaluated at shape level: we divide the number of correctly identified (resp. grounded) part-material pairs by the total number of parts appearing in each shape, and then average across all samples. Value is thus upper bounded by Value-all, and Grounded-value by Grounded-value-all.
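To make the per-shape averaging concrete, here is a minimal, unofficial sketch of how Value and Value-all could be computed for a single shape, assuming the ground-truth and predicted part-material pairs are available as aligned lists of (part, material) tuples (the function name and data layout are illustrative and not the evaluation server's code):
from typing import List, Tuple

Pair = Tuple[int, int]  # (part_label, material_label)

def shape_value_metrics(gt_pairs: List[Pair], pred_pairs: List[Pair]):
    """Per-shape Value / Value-all, assuming pred_pairs[i] is the prediction
    for the same part instance as gt_pairs[i]."""
    n_parts = len(gt_pairs)
    n_correct = sum(p == g for p, g in zip(pred_pairs, gt_pairs))
    value = n_correct / n_parts              # fraction of correct pairs for this shape
    value_all = float(n_correct == n_parts)  # 1.0 only if every pair is correct
    return value, value_all
Dataset-level scores are then obtained by averaging these per-shape values over all samples; the grounded variants additionally require the predicted point grouping to match the ground-truth part instances.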
Submission#
Format#
To participate in the challenge, you must submit a file in HDF5 format containing the following information:
- Classification prediction for each 3D shape, with size \(\text{n-shapes}\), as a uint8.
- Part predictions for each point, with dimensions \(\{\text{n-shapes} \times \text{n-points}\}\), in int16 precision.
- Material predictions for each point, with dimensions \(\{\text{n-shapes} \times \text{n-points}\}\), in uint8 precision.
- Grouping of predicted part-material pairs. The index \(k\) of this array provides the pair of part-material labels for group \(k\). The maximum possible number of groups (part-material pairs predicted) is 30, making the expected dimensions \(\{\text{n-shapes} \times \text{max-groups} \times 2\}\), in int16 precision.
Hint
Only group indices that appear in the point_grouping array are accessed in the part_mat_pairs array at evaluation.
If you have fewer than max-groups predicted pairs, you can simply pad the remaining entries with \(-1\).
If point_grouping and part_mat_pairs are provided as arrays filled with \(-1\), the part-material predictions will be evaluated automatically by maximum voting over group indices (a sketch of a similar grouping scheme is given after this list).
- Grouping labels for each point. Each point is associated with a group index. Expected dimensions: \(\{\text{n-shapes} \times \text{n-points}\}\), in int16 precision.
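If you prefer to build the grouping arrays yourself rather than relying on the automatic fallback described in the hint above, here is a minimal sketch, assuming one group per predicted part label with its material chosen by majority vote over the group's points (an illustration of the idea, not the evaluation server's exact procedure):
import numpy as np

MAX_GROUPS = 30

def build_groups(point_parts, point_mats):
    """Illustrative grouping: one group per predicted part label, with its
    material chosen by majority vote over that group's points.
    point_parts, point_mats: (n_points,) arrays of per-point predictions.
    Assumes at most MAX_GROUPS distinct part labels are predicted."""
    part_mat_pairs = np.full((MAX_GROUPS, 2), -1, dtype=np.int16)
    point_grouping = np.zeros(point_parts.shape, dtype=np.int16)
    for k, part in enumerate(np.unique(point_parts)):
        mask = point_parts == part
        mats, counts = np.unique(point_mats[mask], return_counts=True)
        part_mat_pairs[k] = (part, mats[np.argmax(counts)])  # majority material
        point_grouping[mask] = k
    return part_mat_pairs, point_grouping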
The HDF5 submission file can be schematized as follows:
{
    "shape_preds":    array(shape=[n_shapes],                type=uint8),
    "part_labels":    array(shape=[n_shapes, n_points],      type=int16),
    "mat_labels":     array(shape=[n_shapes, n_points],      type=uint8),
    "part_mat_pairs": array(shape=[n_shapes, max_groups, 2], type=int16),
    "point_grouping": array(shape=[n_shapes, n_points],      type=uint8)
}
Writing an HDF5 submission file#
We show here how to write an HDF5 file following our submission format.
First, import the h5py package and the HDF5 dataloader to be used on the pre-sampled test set points.
"""
Writing an HDF5 submission following the challenge format.
"""
import h5py
import tqdm
from my_cool_model import MyModel
N_POINTS = 2048
MAX_GROUPS = 30
Instantiate the dataloader and initialize the HDF5 schema for a submission.
# Instantiating the dataloader.
# You must use pre-extracted 3D pointclouds for evaluation on the
# test or validation sets. Samples and predictions must be read and
# written in sequence, without any shuffling.
data_loader = ...
n_shapes = len(data_loader)

# Output directory for the submission file
hdf5_dir = ...

# If the file already exists, it will be opened instead
hdf5_name = get_hdf5_name(hdf5_dir, hdf5_name="my_submission")

# Creating the submission file and its datasets
train_hdf5 = open_hdf5(hdf5_name, mode='w')
train_hdf5.create_dataset('shape_preds',
                          shape=(n_shapes,),
                          dtype='uint8')
train_hdf5.create_dataset('part_labels',
                          shape=(n_shapes, N_POINTS),
                          dtype='int16')
train_hdf5.create_dataset('mat_labels',
                          shape=(n_shapes, N_POINTS),
                          dtype='uint8')
train_hdf5.create_dataset('part_mat_pairs',
                          shape=(n_shapes, MAX_GROUPS, 2),
                          dtype='int16')
train_hdf5.create_dataset('point_grouping',
                          shape=(n_shapes, N_POINTS),
                          dtype='uint8')
Iterate in sequence and extract predictions using the method you’ve designed.
Warning
The predictions file needs to be generated in sequence, following the order of samples in the unshuffled HDF5 test loader, to match the server's ground-truth data.
# Iterating over the test set
for k in tqdm.tqdm(range(n_shapes)):
    shape_id, style_id, pointcloud = data_loader[k]

    # Forward pass through your model
    shape_preds, point_part_labels, point_mat_labels, part_mat_pairs, point_grouping = \
        MyModel(shape_id, style_id, pointcloud)

    # If you don't want to predict point groupings/part-material pairs yourself,
    # you can simply fill both matrices with -1

    # Write the entries
    train_hdf5['shape_preds'][k] = shape_preds
    train_hdf5['part_labels'][k] = point_part_labels
    train_hdf5['mat_labels'][k] = point_mat_labels
    train_hdf5['part_mat_pairs'][k] = part_mat_pairs
    train_hdf5['point_grouping'][k] = point_grouping

# Close the HDF5 file
train_hdf5.close()
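Before submitting, it may be worth checking that the file you produced matches the expected schema. Below is a minimal sanity-check sketch; the expected shapes and dtypes follow the schema above, and the file name is only an example:
import h5py

N_POINTS, MAX_GROUPS = 2048, 30
EXPECTED = {
    "shape_preds":    ((),              "uint8"),
    "part_labels":    ((N_POINTS,),     "int16"),
    "mat_labels":     ((N_POINTS,),     "uint8"),
    "part_mat_pairs": ((MAX_GROUPS, 2), "int16"),
    "point_grouping": ((N_POINTS,),     "uint8"),
}

with h5py.File("my_submission.hdf5", "r") as sub_file:
    n_shapes = sub_file["shape_preds"].shape[0]
    for key, (trailing_dims, dtype) in EXPECTED.items():
        dataset = sub_file[key]
        assert dataset.shape == (n_shapes,) + trailing_dims, f"{key}: bad shape {dataset.shape}"
        assert dataset.dtype == dtype, f"{key}: bad dtype {dataset.dtype}"
print("Submission file matches the expected schema.")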
Submitting your predictions#
To submit your predictions, please use our eval.ai challenge platform.
Baselines#
We share here some indicative baseline numbers for the GCR task, with two methods:
- PointNeXT + GT Material: We use two pre-trained PointNeXT models, one for shape classification and one for part segmentation, and use the ground-truth material for evaluation.
- 🦖 "Godzilla" model: This baseline employs separate models and only combines their predictions at evaluation. We use:
  - PointNeXT for 3D shape classification
  - SegFormer for 2D material segmentation and 2D part segmentation
2D dense predictions are then projected to the 3D space using the depth maps and camera parameters (if you’re curious about how to do that, check out “Linking 2D and 3D”).
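If you are wondering what this projection looks like in practice, here is a minimal sketch of the back-projection step, assuming a standard pinhole camera with intrinsics K and a per-pixel depth map (the exact camera convention and metadata used by 3DCoMPaT++ are described in "Linking 2D and 3D"; the function below is only illustrative):
import numpy as np

def unproject_depth(depth, K):
    """Back-project a depth map into camera-space 3D points (pinhole model).
    depth: (H, W) array of depths, K: (3, 3) intrinsics matrix.
    Returns an (H, W, 3) array of 3D points in the camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)

# Each pixel's 2D part/material prediction can then be attached to its 3D point
# (after applying the camera-to-world extrinsics) and transferred to the sampled
# pointcloud, e.g. by nearest-neighbor matching.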
Results are provided below for coarse and fine segmentation levels.
Coarse-grained results.
| Model | Accuracy | Value | Value-all | Grounded-value | Grounded-value-all |
|---|---|---|---|---|---|
| PointNeXT + GT Material | 84.27 | 73.49 | 61.08 | 62.93 | 40.84 |
| 🦖 Godzilla | 84.27 | 65.69 | 44.82 | 52.82 | 29.74 |
Fine-grained results.
| Model | Accuracy | Value | Value-all | Grounded-value | Grounded-value-all |
|---|---|---|---|---|---|
| PointNeXT + GT Material | 84.18 | 65.25 | 32.56 | 49.30 | 13.58 |
| 🦖 Godzilla | 84.18 | 42.37 | 9.09 | 26.68 | 3.83 |
While the brute-force "Godzilla" approach yields decent results, it is not in the spirit of the challenge: leveraging the links between modalities should enable significantly better GCR performance.