Our task is unpaired shape-to-shape translation across domains for 3D point clouds. Inspired by the design of LOGAN [61], we built an autoencoder and multiple translators that encode 3D point clouds and process latent codes in a common latent space, respectively. However, we found that shape characteristics and details are not always preserved after translation. The shape characteristics of an object include its height or width, the number of branches at a joint, and the contour curvature or thickness of a certain component; shape details include bulges or holes on the surface of a component, thin structures (such as bars, rungs, and slats) between main components, and so on. This compelled us to consider how to retain the shape characteristics and model details in the transferred results. Taking the first row of Fig. 1 as an example, we expect source characteristics, such as the trestle feet (inverted T-shaped legs) of the left input chair and the side and cross stretchers (horizontal bars) between the legs of the right input table, to be transferred to their counterparts in the target domains.
To accomplish this challenging goal, we formulated shape characteristics in latent space and proposed a novel characteristic-preserving loss (cp loss) to enforce the invariance of the characteristic features transferred across domains. In addition, a center loss is applied to pull the transferred latent codes toward the target domain center. With these two cross-domain losses incorporated into training, our transferred results keep the shape characteristics of the source while exhibiting the typical features of the target domain. Fig. 2 illustrates the concept of the proposed framework, which transfers sources (chairs in this example) to their corresponding targets (tables in this example) through our latent space.
Fig. 2: Conceptual diagram of the proposed framework, which transfers input chairs to their corresponding tables with similar characteristics through the shape-aware latent space formulated with our novel loss functions.
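The precise loss formulations are given in the paper; purely as a rough illustration, the PyTorch-style sketch below shows one way a characteristic-preserving term and a domain-center term over latent codes could be combined. All tensor names, the 0/1 mask of "characteristic" dimensions, and the loss weighting are hypothetical and are not the paper's actual definitions.

```python
import torch
import torch.nn.functional as F

def cp_loss(z_src, z_trans, char_mask):
    # Hypothetical characteristic-preserving term: keep the latent dimensions
    # assumed to carry shape characteristics unchanged after translation
    # (char_mask is an illustrative 0/1 vector over latent dimensions).
    return F.mse_loss(z_trans * char_mask, z_src * char_mask)

def center_loss(z_trans, target_center):
    # Hypothetical center term: pull translated latent codes toward the
    # mean latent code (domain center) of the target domain.
    return ((z_trans - target_center) ** 2).sum(dim=1).mean()

# Illustrative usage with random tensors standing in for real latent codes.
B, D = 8, 256                                # assumed batch size and latent dimension
z_src = torch.randn(B, D)                    # latent codes of source shapes
z_trans = torch.randn(B, D)                  # latent codes after translation
char_mask = (torch.rand(D) > 0.5).float()    # which dims count as "characteristic" (assumed)
target_center = torch.randn(D)               # mean latent code of the target domain
total = cp_loss(z_src, z_trans, char_mask) + 0.1 * center_loss(z_trans, target_center)
```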
As a pioneering work on general-purpose cross-domain transformation of point clouds, LOGAN [61] generated impressive results. UNIST [4] improves the translation by applying neural implicit functions as the latent representation, and it samples point clouds from its implicit outputs for comparison. We reproduced these two state-of-the-art systems using their official code. Fig. 13 compares our results with those of LOGAN and UNIST.
Fig. 13: Comparison of our transfer results with results generated by LOGAN [61] and UNIST [4]. (a) Input chairs. (b) Our chair-to-table transferred results from (a). (c) The chair-to-table transferred results from (a) by LOGAN. (d) The chair-to-table transferred results from (a) by UNIST. (e) Input tables. (f) Our table-to-chair transferred results from (e). (g) The table-to-chair transferred results from (e) by LOGAN. (h) The table-to-chair transferred results from (e) by UNIST.
To quantitatively assess the translation performance, we conducted a comparison on the Paired Arm-and-armless Chairs test dataset. Ours (proposed trans.), UNIST, and LOGAN were trained on the original armchair and armless chair data from ShapeNet in an unpaired fashion, using the source code and training parameters provided by the authors of the two compared methods. Since the output of UNIST is not on the original scale and pose of the ShapeNet models, we normalized its translated results so that they can be compared with the results of LOGAN and ours. As shown in Table 4, the Arm→Armless shapes predicted by our proposed method are more accurate than those of LOGAN and UNIST in terms of both CD and EMD. For Armless→Arm translation, our results also outperform those of LOGAN, but we could not run UNIST on the Armless→Arm data. This is because UNIST samples voxel models at three levels of resolution and stores the information in specific files. In the Arm→Armless translation, the volumetric armchairs used as test inputs are provided by UNIST, and its output can be converted into point clouds. In the Armless→Arm translation, however, the input armless chairs are newly crafted in our Paired Arm-and-armless Chairs test dataset; they exist only as point clouds and are not contained in the volumetric files provided by UNIST. We attribute the advantages of our method to two factors: first, our encoder and decoder can capture delicate structures, and second, the proposed loss functions guide the framework to keep characteristics beyond the mean (the typical shape of a domain) during translation.
TABLE 4: Comparison of the proposed translation and shifting framework with LOGAN [61] and UNIST [4]. LOGAN, UNIST, and Ours (proposed trans.) were trained on arm-and-armless chair unpaired data from ShapeNet and conducted translations. Ours (MVS) was trained on chair and table unpaired data and shifted shapes between arm and armless properties. All these models were tested on the Paired Arm-and-armless Chairs dataset. The reported CD (chamfer distance) scores are multiplied by 10³ and the EMD (earth mover's distance) scores are multiplied by 10². (Armless→Arm of UNIST cannot be conducted due to the lack of armless-chair volumetric input.)
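For reference, the sketch below shows a minimal, commonly used form of the Chamfer distance between two point clouds. The exact CD/EMD variants used for Table 4 (and the EMD computation, which requires an optimal-transport matching) are not reproduced here, and the tensors are random stand-ins.

```python
import torch

def chamfer_distance(p1, p2):
    # Symmetric Chamfer distance between point clouds p1 (N, 3) and p2 (M, 3),
    # using squared Euclidean nearest-neighbor distances in both directions.
    d = torch.cdist(p1, p2) ** 2
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Illustrative evaluation against a pseudo ground-truth shape (random stand-ins).
pred = torch.randn(2048, 3)               # translated shape, 2048 points
gt = torch.randn(2048, 3)                 # manually crafted pseudo ground truth
print(chamfer_distance(pred, gt) * 1e3)   # CD is reported multiplied by 10^3 in Table 4
```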
Paired Arm-and-armless Chairs Dataset
Based on the armchair and armless chair data picked by [61] from ShapeNet Core [3], we extracted thirty armchair models in various shapes with distinct armrests and manually removed their armrest parts with a 3D point editing tool. These manually trimmed chairs are unseen in the original armless chair data, and they were upsampled to 2048 points. We call this test dataset Paired Arm-and-armless Chairs.
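The exact upsampling procedure is not detailed here; as one illustrative way to bring each trimmed chair to a fixed count of 2048 points, a simple random resampling could be used. The helper below is hypothetical and is not necessarily the tool or strategy we used.

```python
import numpy as np

def resample_to_n(points, n=2048, seed=0):
    # Resample a point cloud of shape (K, 3) to exactly n points:
    # sample with replacement when K < n, otherwise take a random subset.
    # Only one possible strategy, shown for illustration.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n, replace=len(points) < n)
    return points[idx]

# Example: a trimmed chair with 1,500 points brought up to 2,048 points.
trimmed_chair = np.random.rand(1500, 3)
print(resample_to_n(trimmed_chair).shape)   # (2048, 3)
```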
As shown in the figure below, these manually crafted armless versions serve as pseudo ground truth.
Fig.: Examples of armchairs and their manually crafted armless counterparts. Currently, there are 30 pairs of arm and armless chairs in the test dataset.
Download: Paired Arm-and-armless Chairs Dataset
Please visit this GitHub page for more thorough implementation details of our work: Github link
Jia-Wen Zheng, Jhen-Yung Hsu, Chih-Chia Li, I-Chen Lin*, "Characteristic-preserving Latent Space for Unpaired Cross-domain Translation of 3D Point Clouds," IEEE Transactions on Visualization and Computer Graphics, 30(8): 5212-5226, Aug. 2024. (SCI, EI)
Paper:
preprint_version (about 25.3MB), published version (link to the IEEE digital library)
Supplementary file:
Supplementary file (pdf, about 18.1MB)
@ARTICLE{ZhengTVCG24,
author={Zheng, Jia-Wen and Hsu, Jhen-Yung and Li, Chih-Chia and Lin, I-Chen},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={Characteristic-Preserving Latent Space for Unpaired Cross-Domain Translation of 3D Point Clouds},
year={2024},
volume={30},
number={8},
pages={5212-5226},
doi={10.1109/TVCG.2023.3287923}
}