3D ShapeNets: A Deep Representation for Volumetric Shapes


3D shape is a crucial but heavily underutilized cue in object recognition, mostly due to the lack of a good generic shape representation. With the recent proliferation of inexpensive 2.5D depth sensors (e.g., Microsoft Kinect), it is even more urgent to have a useful 3D shape model in an object recognition pipeline. Furthermore, when recognition has low confidence, it is important for an object recognition system to have a fail-safe mode that intelligently chooses the best next viewpoint from which to obtain an extra observation, so as to reduce the uncertainty as much as possible. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model naturally supports object recognition from 2.5D depth maps, as well as view planning for object recognition. We construct a large-scale 3D computer graphics dataset to train our model, and conduct extensive experiments to study this new representation.
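As a minimal sketch of the input side of this representation, the code below back-projects a 2.5D depth map into 3D points with a pinhole camera model and marks the voxels those points fall into on a binary occupancy grid. The intrinsics `fx`, `fy`, `cx`, `cy`, the default 30-voxel resolution, and the bounding-box normalization are illustrative assumptions, not the paper's preprocessing. The sketch records only surface occupancy and does not model free or occluded space, and the Convolutional Deep Belief Network that defines a probability distribution over such grids is not shown.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map (0 = missing reading) into an Nx3 array
    of camera-frame points using a pinhole model (intrinsics are assumed)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]        # v: pixel row index, u: pixel column index
    valid = depth > 0                # skip pixels with no depth reading
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x[valid], y[valid], depth[valid]], axis=1)

def voxelize(points, grid_size=30):
    """Scale the points' bounding box (aspect ratio preserved) into a
    grid_size^3 grid and mark every voxel containing a point as occupied.
    grid_size=30 is an assumed resolution for illustration."""
    grid = np.zeros((grid_size,) * 3, dtype=bool)
    if len(points) == 0:
        return grid
    lo = points.min(axis=0)
    extent = (points.max(axis=0) - lo).max()       # longest bounding-box side
    scale = (grid_size - 1) / extent if extent > 0 else 0.0
    idx = np.floor((points - lo) * scale).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)           # guard against rounding
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Usage: a flat wall 1 m in front of a tiny 4x4 "camera".
depth = np.ones((4, 4))
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
grid = voxelize(pts, grid_size=4)
print(pts.shape, int(grid.sum()))  # (16, 3) 16
```

Normalizing by the longest side keeps the object's proportions, so a flat surface stays a single voxel slice thick instead of being stretched to fill the cube.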
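The view-planning idea above (pick the viewpoint whose next observation would most reduce recognition uncertainty) can be made concrete as minimizing the expected entropy of the class posterior, which is equivalent to maximizing the expected information gain about the class. The sketch below assumes, hypothetically, that each candidate view comes with a small table P(o | y) over a discrete set of possible observations o; in the paper the learned shape model plays this role, and that model is not reproduced here. The helper names are illustrative only.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_posterior_entropy(prior, likelihood):
    """prior: (K,) current class posterior p(y).
    likelihood: (K, M) hypothetical table P(o | y) over M possible
    observations from one candidate view. Returns E_o[H(y | o)]."""
    joint = prior[:, None] * likelihood      # P(y, o), shape (K, M)
    p_o = joint.sum(axis=0)                  # marginal P(o)
    total = 0.0
    for m in range(likelihood.shape[1]):
        if p_o[m] > 0:
            # weight the entropy of the updated posterior p(y | o) by P(o)
            total += p_o[m] * entropy(joint[:, m] / p_o[m])
    return total

def choose_next_view(prior, likelihoods):
    """likelihoods: one (K, M) table per candidate view.
    Returns the index of the view with the lowest expected posterior
    entropy, plus the score of every view."""
    scores = [expected_posterior_entropy(prior, L) for L in likelihoods]
    return int(np.argmin(scores)), scores

# Usage: two equally likely classes and two candidate views.
prior = np.array([0.5, 0.5])
uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])  # same observation odds for both classes
decisive = np.array([[1.0, 0.0], [0.0, 1.0]])       # each class yields a distinct observation
best, scores = choose_next_view(prior, [uninformative, decisive])
print(best)  # 1
```

The uninformative view leaves the posterior at (0.5, 0.5), so its expected entropy stays at ln 2, while the decisive view drives it to zero; the planner therefore picks the decisive view.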


Supplementary Materials


ModelNet Benchmark Leaderboard

Please email Shuran Song to add or update your results.

SO-Net [34] 93.4% 95.7%
Minto et al. [33] 89.3% 93.6%
RotationNet [32] 97.37% 98.46%
LonchaNet [31] 94.37%
Achlioptas et al. [30] 84.5% 95.4%
PANORAMA-ENN [29] 95.56% 86.34% 96.85% 93.28%
3D-A-Nets [28] 90.5% 80.1%
Soltani et al. [27] 82.10%
Arvind et al. [26] 86.50%
LonchaNet [25] 94.37%
3DmFV-Net [24] 91.6% 95.2%
Zanuttigh and Minto [23] 87.8% 91.5%
Wang et al. [22] 93.8%
ECC [21] 83.2% 90.0%
PANORAMA-NN [20] 90.7% 83.5% 91.1% 87.4%
MVCNN-MultiRes [19] 91.4%
FPNN [18] 88.4%
Klokov and Lempitsky [16] 91.8% 94.0%
LightNet [15] 88.93% 93.94%
Xu and Todorovic [14] 81.26% 88.00%
Geometry Image [13] 83.9% 51.3% 88.4% 74.9%
Set-convolution [11] 90%
PointNet [12] 77.6%
3D-GAN [10] 83.3% 91.0%
VRN Ensemble [9] 95.54% 97.14%
ORION [8] 93.8%
FusionNet [7] 90.8% 93.11%
Pairwise [6] 90.7% 92.8%
MVCNN [3] 90.1% 79.5%
GIFT [5] 83.10% 81.94% 92.35% 91.12%
VoxNet [2] 83% 92%
DeepPano [4] 77.63% 76.81% 85.45% 84.18%
3DShapeNets [1] 77% 49.2% 83.5% 68.3%

[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao. 3D ShapeNets: A Deep Representation for Volumetric Shapes. CVPR 2015.
[2] D. Maturana and S. Scherer. VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition. IROS 2015.
[3] H. Su, S. Maji, E. Kalogerakis, E. Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition. ICCV 2015.
[4] B. Shi, S. Bai, Z. Zhou, X. Bai. DeepPano: Deep Panoramic Representation for 3-D Shape Recognition. IEEE Signal Processing Letters, 2015.
[5] Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, Longin Jan Latecki. GIFT: A Real-time and Scalable 3D Shape Search Engine. CVPR 2016.
[6] Edward Johns, Stefan Leutenegger and Andrew J. Davison. Pairwise Decomposition of Image Sequences for Active Multi-View Recognition. CVPR 2016.
[7] Vishakh Hegde, Reza Zadeh. 3D Object Classification Using Multiple Data Representations.
[8] Nima Sedaghat, Mohammadreza Zolfaghari, Thomas Brox. Orientation-boosted Voxel Nets for 3D Object Recognition. BMVC.
[9] Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks.
[10] Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. NIPS 2016.
[11] Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos. Deep Learning with Sets and Point Clouds.
[12] A. Garcia-Garcia, F. Gomez-Donoso, J. Garcia-Rodriguez, S. Orts-Escolano, M. Cazorla, J. Azorin-Lopez. PointNet: A 3D Convolutional Neural Network for Real-Time Object Class Recognition.
[13] Ayan Sinha, Jing Bai, Karthik Ramani. Deep Learning 3D Shape Surfaces Using Geometry Images. ECCV 2016.
[14] Xu Xu and Sinisa Todorovic. Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes.
[15] Shuaifeng Zhi, Yongxiang Liu, Xiang Li, Yulan Guo. Towards real-time 3D object recognition: A lightweight volumetric CNN framework using multitask learning. Computers and Graphics (Elsevier).
[16] Roman Klokov, Victor Lempitsky. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models.
[17] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CVPR 2017.
[18] Yangyan Li, Soeren Pirk, Hao Su, Charles R. Qi, and Leonidas J. Guibas. FPNN: Field Probing Neural Networks for 3D Data. NIPS 2016.
[19] Charles R. Qi, Hao Su, Matthias Niessner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. Volumetric and Multi-View CNNs for Object Classification on 3D Data. CVPR 2016.
[20] K. Sfikas, T. Theoharis and I. Pratikakis. Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval. 3DOR 2017.
[21] Martin Simonovsky, Nikos Komodakis. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs.
[22] Chu Wang, Marcello Pelillo, Kaleem Siddiqi. Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition. BMVC 2017.
[23] Pietro Zanuttigh and Ludovico Minto. Deep Learning for 3D Shape Classification from Multiple Depth Maps. ICIP 2017.
[24] Yizhak Ben-Shabat, Michael Lindenbaum, Anath Fischer. 3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks. arXiv 2017.
[25] F. Gomez-Donoso, A. Garcia-Garcia, J. Garcia-Rodriguez, S. Orts-Escolano, M. Cazorla. LonchaNet: A sliced-based CNN architecture for real-time 3D object recognition. International Joint Conference on Neural Networks (IJCNN), 2017.
[26] Varun Arvind, Anthony Costa, Marcus Badgeley, Samuel Cho, Eric Oermann. Wide and deep volumetric residual networks for volumetric image classification. arXiv 2017.
[27] Amir Arsalan Soltani, Haibin Huang, Jiajun Wu, Tejas D. Kulkarni, Joshua B. Tenenbaum. Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks. CVPR 2017.
[28] Mengwei Ren, Liang Niu, Yi Fang. 3D-A-Nets: 3D Deep Dense Descriptor for Volumetric Shapes with Adversarial Networks.
[29] K. Sfikas, I. Pratikakis and T. Theoharis. Ensemble of PANORAMA-based Convolutional Neural Networks for 3D Model Classification and Retrieval. Computers and Graphics.
[30] Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, Leonidas Guibas. Learning Representations and Generative Models for 3D Point Clouds. arXiv 2017.
[31] F. Gomez-Donoso, A. Garcia-Garcia, J. Garcia-Rodriguez, S. Orts-Escolano, M. Cazorla. LonchaNet: A sliced-based CNN architecture for real-time 3D object recognition.
[32] Asako Kanezaki, Yasuyuki Matsushita and Yoshifumi Nishida. RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints. CVPR 2018.
[33] L. Minto, P. Zanuttigh, G. Pagnutti. Deep Learning for 3D Shape Classification Based on Volumetric Density and Surface Approximation Clues. International Conference on Computer Vision Theory and Applications (VISAPP), 2018.
[34] J. Li, B. M. Chen, G. H. Lee. SO-Net: Self-Organizing Network for Point Cloud Analysis. CVPR 2018.

Source code



This work is supported by gift funds from Intel Corporation, a Project X grant to the Princeton Vision Group, and a hardware donation from NVIDIA Corporation. Z.W. is also partially supported by a Hong Kong RGC Fellowship. We thank Thomas Funkhouser, Derek Hoiem, Alexei A. Efros, Andrew Owens, Antonio Torralba, Siddhartha Chaudhuri, and Szymon Rusinkiewicz for valuable discussions.