🏆 Challenge#

As part of the C3DV CVPR 2023 workshop, we are organizing a modelling challenge based on 3DCoMPaT++.

The dataset for this challenge consists of the first 10 compositions of the 3DCoMPaT++ dataset, in both 2D and 3D. The goal of the challenge is to recognize the part-material pairs appearing in a shape and ground them, while also recognizing the shape category, all with a single jointly trained model. Participants are evaluated using the Grounded Compositional Recognition (GCR) metrics in 3D space. The test data consists of an HDF5 pointcloud set and 2D WebDataset shards covering the 10 compositions, both stripped of all ground-truth information.

Hint

More information on the rules and format of the challenge can be found at:

Challenge

Metrics#

  • Shape accuracy: Accuracy of predicting the shape category. We consider top-1 predictions.

  • Value: Accuracy of predicting both the part category and the material of a given part correctly.

  • Value-all: Accuracy of predicting all the <part, material> pairs of a shape correctly.

  • Grounded-value: Accuracy of predicting both the part category and the material of a given part, as well as correctly grounding it.

  • Grounded-value-all: Accuracy of predicting all the <part, material> pairs of a given shape correctly and grounding all of them correctly.

Note

Value and Grounded-value are evaluated at shape level: we divide the number of correctly identified (resp. grounded) part-material pairs by the total number of parts appearing in each shape, and then average across all samples. Value-all is thus upper bounded by Value, and Grounded-value-all by Grounded-value.
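To make the shape-level averaging concrete, here is a minimal Python sketch of the per-shape Value and Value-all scores, assuming predictions and ground truth are available as sets of (part_id, material_id) tuples (the function name and data representation are illustrative, not the official evaluation code):

def shape_value_scores(pred_pairs, gt_pairs):
    """Illustrative per-shape Value / Value-all (not the official scorer).

    pred_pairs, gt_pairs: sets of (part_id, material_id) tuples,
    assuming each part appears at most once per shape.
    """
    n_correct = len(pred_pairs & gt_pairs)
    value = n_correct / len(gt_pairs)              # fraction of GT pairs recovered
    value_all = float(n_correct == len(gt_pairs))  # 1 only if every pair is correct
    return value, value_all

# Dataset-level Value / Value-all average these per-shape scores over all
# samples, which is why Value-all can never exceed Value.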

Submission#

Format#

To participate in the challenge, you must submit a file in HDF5 format containing the following information:

  • Classification prediction for each 3D shape, with dimensions:

\(\ \ \ \ \ \ \Rightarrow \{\text{n-shapes}\}\), in uint8 precision.

  • Part predictions for each point with dimensions:

\(\ \ \ \ \ \ \Rightarrow \{\text{n-shapes} \times \text{n-points}\}\), in int16 precision.

  • Material predictions for each point with dimensions:

\(\ \ \ \ \ \ \Rightarrow \{\text{n-shapes} \times \text{n-points}\}\), in uint8 precision.

  • Grouping of predicted part-material pairs. The index \(k\) of this array provides the pair of part-material labels for group \(k\). The maximum possible number of groups (i.e., of predicted part-material pairs) is 30, making the expected dimensions:

\(\ \ \ \ \ \ \Rightarrow \{\text{n-shapes} \times \text{max-groups} \times 2\}\), in int16 precision.

Hint

Only group indices that appear in the point_grouping array are accessed in the part_mat_pairs array at evaluation. If you have fewer than max-groups predicted pairs, you can simply pad the remaining entries with \(-1\).

If point_grouping and part_mat_pairs are provided as arrays filled with \(-1\), the part-material predictions will be derived automatically by majority voting over group indices.

  • Grouping labels for each point: each point is associated with a group index.

\(\ \ \ \ \ \ \Rightarrow \{\text{n-shapes} \times \text{n-points} \}\), in int16 precision.

The HDF5 submission schema can be summarized as follows:

{
    "shape_preds":    array(shape=[n_shapes],                  type=uint8)
    "part_labels":    array(shape=[n_shapes * n_points],       type=int16)
    "mat_labels":     array(shape=[n_shapes * n_points],       type=uint8)
    "part_mat_pairs": array(shape=[n_shapes * max_groups * 2], type=int16)
    "point_grouping": array(shape=[n_shapes * n_points],       type=uint8)
}
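As an illustration of the padding described in the hint above, the short sketch below fills the unused group slots of a part_mat_pairs entry with \(-1\) (the helper name is ours, not part of the challenge toolkit):

import numpy as np

MAX_GROUPS = 30

def pad_part_mat_pairs(pairs):
    """Pad an (n_predicted, 2) array of part-material pairs to (MAX_GROUPS, 2).

    Unused slots are filled with -1, which is why part_mat_pairs
    must use a signed dtype (int16).
    """
    padded = np.full((MAX_GROUPS, 2), -1, dtype=np.int16)
    padded[:len(pairs)] = pairs
    return padded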

Writing an HDF5 submission file#

We show here how to write an HDF5 file following our submission format. First, import the h5py package and set up the HDF5 dataloader used to read the sampled test-set pointclouds.

"""
Writing an HDF5 submission following the challenge format.
"""
import h5py
import tqdm

from my_cool_model import MyModel

N_POINTS = 2048
MAX_GROUPS = 30

Instantiate the dataloader and initialize the HDF5 schema for a submission.

# Instantiating the dataloader
data_loader = ... # You must use pre-extracted 3D pointclouds
                  # for evaluation on the test or validation sets.
                  # Samples and predictions must be read and written in sequence,
                  # without any shuffling.
n_shapes = len(data_loader)

# Create the submission file (overwrites any existing file with the same name)
train_hdf5 = h5py.File("my_submission.hdf5", mode='w')

# Initialize one dataset per field of the submission schema
train_hdf5.create_dataset('shape_preds',
                            shape=(n_shapes,),
                            dtype='uint8')
train_hdf5.create_dataset('part_labels',
                            shape=(n_shapes, N_POINTS),
                            dtype='int16')
train_hdf5.create_dataset('mat_labels',
                            shape=(n_shapes, N_POINTS),
                            dtype='uint8')
train_hdf5.create_dataset('part_mat_pairs',
                            shape=(n_shapes, MAX_GROUPS, 2),
                            dtype='int16')
train_hdf5.create_dataset('point_grouping',
                            shape=(n_shapes, N_POINTS),
                            dtype='int16')

Iterate in sequence and extract predictions using the method you’ve designed.

Warning

The predictions file must be generated in sequence, following the order of samples in the unshuffled HDF5 test loader, to match the server's ground-truth data.

# Instantiate your model once, outside of the loop
model = MyModel()

# Iterating over the test set, in order
for k in tqdm.tqdm(range(n_shapes)):
    shape_id, style_id, pointcloud = data_loader[k]

    # Forward through your model
    shape_preds, point_part_labels, point_mat_labels, part_mat_pairs, point_grouping = \
        model(shape_id, style_id, pointcloud)

    # If you don't want to predict point groupings / part-material pairs yourself,
    # you can simply fill both arrays with -1

    # Write the entries
    train_hdf5['shape_preds'][k]    = shape_preds
    train_hdf5['part_labels'][k]    = point_part_labels
    train_hdf5['mat_labels'][k]     = point_mat_labels
    train_hdf5['part_mat_pairs'][k] = part_mat_pairs
    train_hdf5['point_grouping'][k] = point_grouping

# Close the HDF5 file
train_hdf5.close()
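Before uploading, it can be worth sanity-checking the file against the expected schema. A minimal sketch, reusing the n_shapes, N_POINTS and MAX_GROUPS values from the script above:

# Re-open the submission and verify shapes against the schema
with h5py.File("my_submission.hdf5", mode='r') as sub:
    assert sub['shape_preds'].shape    == (n_shapes,)
    assert sub['part_labels'].shape    == (n_shapes, N_POINTS)
    assert sub['mat_labels'].shape     == (n_shapes, N_POINTS)
    assert sub['part_mat_pairs'].shape == (n_shapes, MAX_GROUPS, 2)
    assert sub['point_grouping'].shape == (n_shapes, N_POINTS)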

Submitting your predictions#

To submit your predictions, please use our eval.ai challenge platform:

Challenge

Baselines#

We share here some indicative baseline numbers for the GCR task, with two methods:

  • PointNeXT + GT Material: We use two pre-trained PointNeXT models, one for shape classification and one for part segmentation, and use the ground-truth material labels for evaluation.

  • 🦖 “Godzilla” model: This baseline combines several independently trained models and only merges their predictions at evaluation time. We use:

    • PointNeXT for 3D shape classification

    • SegFormer for 2D material segmentation and 2D part segmentation

    2D dense predictions are then projected into 3D space using the depth maps and camera parameters; a rough sketch of this step is shown below (for more details, check out “Linking 2D and 3D”).
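As an illustration of that projection step, here is a minimal sketch of lifting a 2D label map to labeled 3D points with a pinhole camera model (the function and its conventions are our own illustration; the dataset's actual camera parametrization may differ):

import numpy as np

def lift_labels_to_3d(depth, label_map, K, cam_to_world):
    """Back-project a 2D label map into labeled 3D points (pinhole model).

    depth:        (H, W) metric depth map
    label_map:    (H, W) per-pixel predictions (e.g. part or material IDs)
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) camera-to-world extrinsics
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    valid = z > 0  # skip background pixels with no depth

    # Pinhole back-projection: pixel coordinates + depth -> camera-space point
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)])  # (4, H*W), homogeneous

    # Camera space -> world space
    pts_world = (cam_to_world @ pts_cam)[:3].T      # (H*W, 3)
    return pts_world[valid], label_map.ravel()[valid]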

Results are provided below for coarse and fine segmentation levels.

Coarse-grained results.

Model                   | Accuracy | Value | Value-all | Grounded-value | Grounded-value-all
------------------------|----------|-------|-----------|----------------|-------------------
PointNeXT + GT Material | 84.27    | 73.49 | 61.08     | 62.93          | 40.84
🦖 Godzilla             | 84.27    | 65.69 | 44.82     | 52.82          | 29.74

Fine-grained results.

Model                   | Accuracy | Value | Value-all | Grounded-value | Grounded-value-all
------------------------|----------|-------|-----------|----------------|-------------------
PointNeXT + GT Material | 84.18    | 65.25 | 32.56     | 49.30          | 13.58
🦖 Godzilla             | 84.18    | 42.37 | 9.09      | 26.68          | 3.83

While the brute-force “Godzilla” approach yields decent results, note that it is not in the spirit of the challenge: leveraging the links between modalities should enable much better GCR performance.