Localizing 3D Cuboids in Single-view Images

Given a single-view input image, our goal is to detect the 2D corner locations of the cuboids depicted in the image. From the detected corner locations we can then recover information about the camera and the 3D shape via camera resectioning.
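As a rough illustration of the resectioning step, the sketch below estimates a 3×4 camera matrix P from the eight 2D corners of one cuboid using the standard Direct Linear Transform (DLT). It assumes that the cuboid's 3D dimensions (and hence its aspect ratio) are already known and that the 2D corners are listed in the same order as the 3D corners. In practice the aspect ratio is unknown, and the paper's own recovery procedure may differ. The function names `cuboid_corners`, `resection_dlt` and `project` are hypothetical, and Hartley's coordinate normalization is omitted for brevity.

```python
import numpy as np


def cuboid_corners(w: float, h: float, d: float) -> np.ndarray:
    """Return the 8 corners (8 x 3) of an axis-aligned w x h x d cuboid centred at the origin."""
    # Every sign combination of the half-extents gives one corner.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                     dtype=float)
    return 0.5 * signs * np.array([w, h, d])


def resection_dlt(X: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Estimate a 3 x 4 camera matrix P (up to scale) with x ~ P [X; 1] via the DLT.

    X: (n, 3) 3D points, x: (n, 2) image points, n >= 6 in general position.
    """
    rows = []
    for Xi, (u, v) in zip(X, x):
        Xh = np.append(Xi, 1.0)  # homogeneous 3D point
        zero = np.zeros(4)
        # Each correspondence gives two linear equations in the 12 entries of P.
        rows.append(np.concatenate([Xh, zero, -u * Xh]))
        rows.append(np.concatenate([zero, Xh, -v * Xh]))
    A = np.asarray(rows)
    # Solution with ||p|| = 1: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project (n, 3) points with a 3 x 4 camera matrix and dehomogenize to (n, 2)."""
    xh = np.hstack([X, np.ones((len(X), 1))]) @ P.T
    return xh[:, :2] / xh[:, 2:3]


# Synthetic check: project a 1 x 0.5 x 2 cuboid with a known camera, then recover the camera.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a, b = np.deg2rad(30.0), np.deg2rad(20.0)
Ry = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
P_true = K @ np.hstack([Rx @ Ry, np.array([[0.0], [0.0], [5.0]])])

X = cuboid_corners(1.0, 0.5, 2.0)
x = project(P_true, X)          # stand-in for the 8 detected 2D corners
P_est = resection_dlt(X, x)
print(np.abs(project(P_est, X) - x).max())  # reprojection error in pixels
```

Eight corners give 16 linear equations for the 11 degrees of freedom of P, so the system is overdetermined. With noisy detections, normalizing the points before solving and then refining P by minimizing reprojection error would be advisable.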


In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency with a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes, and we show qualitative and quantitative results on this database. Our model outperforms baseline detectors that use 2D constraints alone on the task of localizing cuboid corners.
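To make "localizing cuboid corners" concrete, one plausible scoring rule (an assumption, not necessarily the criterion used in the paper) counts a predicted corner as correct when it falls within a fraction `alpha` of the ground-truth cuboid's 2D size, measured here as the diagonal of the bounding box of the ground-truth corners. The sketch assumes that predicted and ground-truth corners come in the same canonical order and that detections have already been matched to annotations. The value `alpha = 0.1` and the function names are illustrative only.

```python
import numpy as np


def corners_correct(pred, gt, alpha=0.1):
    """Per-corner hit test: corner i is a hit if ||pred_i - gt_i|| <= alpha * scale.

    pred, gt: (8, 2) image coordinates in the same (assumed) corner order.
    scale: diagonal of the ground-truth corners' 2D bounding box (assumed normalizer).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    scale = np.linalg.norm(gt.max(axis=0) - gt.min(axis=0))
    dist = np.linalg.norm(pred - gt, axis=1)
    return dist <= alpha * scale


def cuboid_correct(pred, gt, alpha=0.1):
    """Count a detection as correct only if all eight corners are hits."""
    return bool(np.all(corners_correct(pred, gt, alpha)))


def corner_accuracy(preds, gts, alpha=0.1):
    """Fraction of all corners, over all matched detections, that are hits."""
    hits = [corners_correct(p, g, alpha) for p, g in zip(preds, gts)]
    return float(np.mean(np.concatenate(hits)))


gt = np.array([[0, 0], [100, 0], [100, 100], [0, 100],
               [30, -30], [130, -30], [130, 70], [30, 70]], dtype=float)
pred_good = gt + 5.0            # every corner off by about 7 px
pred_bad = gt.copy()
pred_bad[0] += [40.0, 0.0]      # one corner off by 40 px
print(cuboid_correct(pred_good, gt))                      # True
print(cuboid_correct(pred_bad, gt))                       # False
print(corner_accuracy([pred_good, pred_bad], [gt, gt]))   # 0.9375
```

Normalizing by the ground-truth size keeps the tolerance comparable between small, distant cuboids and large, nearby ones.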


SUN Primitive Database

The dataset contains annotations of four primitive shapes on RGB images, as well as cuboid annotations on RGB-D images.

  • cuboid: cuboid annotations on single-view RGB images, as well as a negative training set of images that contain no cuboids.
  • pyramid: pyramid annotations on single-view RGB images.
  • cylinder: cylinder annotations on single-view RGB images.
  • sphere: sphere annotations on single-view RGB images.
  • RGBDcuboid: cuboid annotations on single-view RGB-D images, as well as a negative training set of images that contain no cuboids.

Source Code

Source code is available on GitHub: https://github.com/brussell123/SUNprimitive


  1. git clone https://github.com/brussell123/SUNprimitive.git
  2. cd SUNprimitive
  3. In MATLAB, run "compile".
  4. In MATLAB, run "demo"; this produces the cuboid detections for the example image.

The zip file is a snapshot of the latest source code on GitHub.



Jianxiong Xiao was supported by a Google U.S./Canada Ph.D. Fellowship in Computer Vision. Bryan Russell was funded by the Intel Science and Technology Center for Pervasive Computing (ISTC-PC). This work was funded by ONR MURI N000141010933 and NSF CAREER Award No. 0747120 to Antonio Torralba.