Princeton Vision & Robotics Toolkit (PVRT)

Princeton Vision & Robotics Toolkit (PVRT) is an open-source software library including a diverse set of functions that are useful and non-trivial to implement for fast-prototyping in vision and robotics research.


Marvin: A minimalist GPU-only N-dimensional ConvNet framework

Never before has it been so easy to learn so deeply. Marvin was born to be hacked, relying on few dependencies and basic C++. All code lives in two files (including marvin.hpp), and all numbers take up two bytes (FP16).

Uniform Grid on 3D sphere by Subdividing Icosahedron

In many applications, we need to put a uniform grid on a 3D sphere, or draw samples uniformly distributed on a unit sphere. For example, we may want to approximate the 3D rotation space by uniformly discretizing the sphere of 3D rotation axes, to be used as the label space in a histogram of orientation, a Markov Random Field, a classifier, etc.
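The subdivision idea can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's own code: the function name `icosphere` and its `levels` parameter are hypothetical. Each triangle is split into four, and every new midpoint is pushed back onto the unit sphere, which gives an approximately (not exactly) uniform grid.

```python
import math

def _normalize(v):
    """Scale a 3-vector to unit length so it lies on the unit sphere."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)

def icosphere(levels):
    """Return (vertices, faces) of an icosahedron subdivided `levels` times."""
    t = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    verts = [_normalize(v) for v in [
        (-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
        (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
        (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(levels):
        midpoint = {}  # edge (i, j) with i < j -> index of its midpoint vertex

        def mid(i, j):
            key = (min(i, j), max(i, j))
            if key not in midpoint:
                a, b = verts[i], verts[j]
                verts.append(_normalize((a[0] + b[0], a[1] + b[1], a[2] + b[2])))
                midpoint[key] = len(verts) - 1
            return midpoint[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
            # One triangle becomes four smaller ones.
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return verts, faces
```

Sharing midpoints through the edge dictionary keeps the vertex list free of duplicates, so the returned vertices can be used directly as the discrete label set.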


SUN3Dsfm: Structure From Motion for RGB-D videos using Generalized Bundle Adjustment

This is a Matlab, MEX, and C++ program for 3D reconstruction of a scene using a Kinect RGB-D video as input. The algorithm is described in our SUN3D paper. Basically, it matches SIFT keypoints across neighboring frames and computes the relative pose between two consecutive frames using RANSAC with a 3-point algorithm. It also detects loop closures using Bag of Words on SIFT keypoints, and then runs a bundle adjustment algorithm to optimize the final poses. If object annotation is available, it can also use that annotation to improve the camera poses with our Generalized Bundle Adjustment algorithm. The difference between this program and the one below (SiftFu) is that it does not model the space as a TSDF, which consumes a lot of memory. This program is therefore suitable for large-scale reconstruction of very big spaces.

SiftFu: Kinect depth map improvement using multiple frames with moving cameras

This is a Matlab and MEX program for 3D reconstruction of a scene using a Kinect video as input. Unlike KinectFusion, it uses both image and depth at the same time for reconstruction. The algorithm starts from sparse SIFT keypoint matching between two frames, and uses the 3D coordinates of these SIFT keypoints from the Kinect depth map to estimate the relative camera poses (RANSAC with a 3-point algorithm in the inner RANSAC loop). Then a volumetric voxel grid is maintained in the space, and the depth from each frame is accumulated on the voxel grid (as in KinectFusion). After that, a ray-casting algorithm produces a better depth map for the first frame. This program can be used for KinectFusion-style reconstruction. More specifically, it is tailored to ray-casting from the first frame, which improves the raw Kinect depth map using the depth from other frames. The major difference between our approach and KinectFusion is that we use SIFT keypoints for matching between frames, whereas KinectFusion uses ICP to align the two point clouds directly. Our approach therefore aligns flat regions better, at the cost of requiring a good image as input. The program also contains our implementation of angle-axis rotation manipulation.

Everything is implemented in Matlab with C++ MEX running on a standard CPU. No GPU is required to run the program. The speed is not optimized and the program does not run in real time.

Align two RGB-D images by SIFT + ICP

Taking two RGB-D images as input, the algorithm first detects SIFT keypoints and uses a 3-point algorithm under RANSAC to estimate the alignment. Then, initialized by this alignment, it tries to improve the alignment by running ICP on the dense point clouds. If the ICP result drifts too much (indicated by very large distances between the originally matched SIFT keypoints), the ICP result is discarded. Otherwise, the ICP result is used as the final alignment. In this way, we combine the strengths of reliable SIFT keypoints and the dense point cloud to obtain a reliable and accurate estimate.
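The accept-or-reject step can be sketched as follows. This is only an illustration in Python under stated assumptions: the names `choose_alignment` and `max_drift`, the 3x4 nested-list pose format, and the use of the mean match distance as the drift measure are all hypothetical, and the SIFT and ICP estimates themselves are taken as given.

```python
import math

def transform(Rt, p):
    """Apply a 3x4 [R|t] matrix (nested lists) to a 3D point."""
    return tuple(Rt[r][0] * p[0] + Rt[r][1] * p[1] + Rt[r][2] * p[2] + Rt[r][3]
                 for r in range(3))

def mean_match_distance(Rt, matches):
    """Mean distance between transformed source keypoints and their matches."""
    total = 0.0
    for src, dst in matches:
        total += math.dist(transform(Rt, src), dst)
    return total / len(matches)

def choose_alignment(Rt_sift, Rt_icp, matches, max_drift=0.05):
    """Keep the ICP pose unless it pulls the SIFT matches too far apart.

    `max_drift` is an assumed threshold in the same unit as the points.
    """
    if mean_match_distance(Rt_icp, matches) > max_drift:
        return Rt_sift  # ICP drifted: fall back to the SIFT + RANSAC pose
    return Rt_icp
```

The SIFT matches act as an independent check on ICP, which can otherwise slide along flat or repetitive geometry without increasing its own point-to-point error.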

Warp a depth map by rendering a mesh

This code demonstrates how to warp a depth map (together with a label map) from one camera to another view. It creates a mesh from the camera's depth map and renders the mesh in OpenGL. The warped result is therefore free of the many gaps that point-wise warping would produce.

Depth Map Improvement for Structure IO file format

SFMedu: Structure From Motion for Education Purpose

This code demonstrates how a traditional structure from motion pipeline works and how to compute a dense point cloud by match propagation in the simplest way. It is mainly designed for teaching COS429: Computer Vision at Princeton. Please check the slides together with this code release.

GoogLeNet GPU implementation

We implemented GoogLeNet using a single GPU. Our main contributions are an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations. The training/testing code and pre-trained models for ImageNet and Places are available for download.

DrawMe: A light-weight Javascript library for line drawing on a picture

DrawMe is a light-weight Javascript library that enables client-side line drawing on a picture in a web browser. It is intended as a basis for self-defined labeling tasks for computer vision researchers. It differs from LabelMe, which provides full support but a fixed labeling interface. DrawMe is a Javascript library only, and users must write their own code to adapt it to their specific labeling needs. DrawMe does not provide any server or server-side code for labeling, but gives users greater flexibility. It also comes with a simple example of an Amazon Mechanical Turk interface that serializes Javascript DOM objects into text for HTML form submission. Users can easily build their own labeling interface on top of this MTurk example, using either paid workers or the researchers themselves in the MTurk sandbox.

DrawMe supports all major browsers. It is currently implemented using HTML5 canvas for non-IE browsers (Firefox, Chrome, Safari, etc.) and VML (Google excanvas) for IE.

Read CSV in Matlab for Amazon Mechanical Turk Result

Matlab has a built-in CSV reader, csvread, but it cannot load the CSV results from Amazon Mechanical Turk. We therefore implemented a more robust CSV reader designed specifically for CSV files from Turk. It also handles the "" used when JSON is embedded in a Turk CSV file. (JSON normally uses a single ", but each one must be doubled to "" when embedded in CSV for Turk. Because " is also used in CSV to delimit fields, this confuses most readers. Our implementation takes care of it.)
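To make the quoting convention concrete, here is a small Python sketch (not the toolkit's Matlab reader). The column names `HITId` and `Answer.result` and the sample row are made up for illustration; Python's standard `csv` module undoes the doubled quotes, after which the field is ordinary JSON.

```python
import csv
import io
import json

def read_turk_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # newline="" leaves line endings to the csv module, which also turns
    # each doubled quote ("") inside a quoted field back into a single ".
    return list(csv.DictReader(io.StringIO(text, newline="")))

# A hypothetical result file: the JSON answer is one quoted CSV field,
# so every " that belongs to the JSON is written as "".
sample = (
    'HITId,Answer.result\r\n'
    '"H1","{""label"": ""cat"", ""box"": [1, 2]}"\r\n'
)

rows = read_turk_csv(sample)
answer = json.loads(rows[0]["Answer.result"])
```

After parsing, `answer` is a regular dictionary, and the commas inside the JSON have not been mistaken for field separators.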

Constructing a new Amazon Mechanical Turk CSV file for the remaining HITs

Sometimes only a subset of HITs is finished, and we need to republish the remaining HITs on Mechanical Turk. This function takes a Turk input.csv file and a result.csv file and outputs a new file, remaining_input.csv, containing the HITs that have not been done, matched on a specific column.
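The filtering step might look like the following Python sketch. It works on CSV text rather than files so that it is easy to try; the function name `remaining_input` and the example column names are hypothetical, and the actual Matlab function may match rows differently.

```python
import csv
import io

def remaining_input(input_text, result_text, input_key, result_key):
    """Return CSV text with the input rows whose key is absent from the results."""
    results = csv.DictReader(io.StringIO(result_text, newline=""))
    done = {row[result_key] for row in results}

    reader = csv.DictReader(io.StringIO(input_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[input_key] not in done:  # this HIT has no result yet
            writer.writerow(row)
    return out.getvalue()
```

For example, with an input column `image_url` and a result column `Input.image_url`, the call `remaining_input(input_text, result_text, "image_url", "Input.image_url")` returns the rows to write to remaining_input.csv.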

Matlab Amazon Mechanical Turk API

Amazon Mechanical Turk provides a command line tool and an API. This is a Matlab interface for calling the API functions provided by Amazon. It is a Matlab alternative to PHP Turk50.

A simple example of using Javascript as a template in Amazon Mechanical Turk.


This is a well-engineered and well-designed interface for using the Amazon Mechanical Turk service to select the subset of images from a list that satisfy certain rules. The user keeps the right arrow [→] key (or [d] key) pressed to move continuously from one image to the next, and releases the key on an image for which the answer should be YES. The [space] key toggles the answer to YES; the default answer is NO. The user can also use the left arrow [←] key (or [a] key) to go back to previously seen images if an image was skipped accidentally.

Internet Image Cleaner

If you download images from Internet search engines, in most cases many bad photos will be returned. This program automatically recognizes whether an image is too dark (e.g. pure black), too white, too small (e.g. an icon), or empty (e.g. the Flickr placeholder for a file that no longer exists), and it also automatically crops the margin if there is any padding. The result is not always perfect, but we find it very useful most of the time.

Duplicate Images Detection and Removal

This script finds all duplicates among the images under a folder (recursively) and DELETES all the duplicated images. The algorithm first computes the GIST descriptor of each image, and then uses PCA to find the first principal component. With the images sorted by their first principal component, the algorithm checks neighboring images with similar values by computing the actual distance between their full GIST descriptors. Then a connected component algorithm finds all members of each group of duplicates. Finally, for each group of duplicates found, the algorithm keeps the image with the highest resolution and deletes all the others.
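The grouping stage can be sketched in Python as below. This is an illustration under assumptions, not the script itself: descriptors and their one-dimensional sort keys are passed in directly (standing in for GIST and its first principal component, which are not computed here), and `threshold` and `window` are hypothetical parameters. Connected components are found with a small union-find.

```python
import math

def group_duplicates(descriptors, keys, threshold, window=1):
    """Group items whose full descriptors lie within `threshold` of each other.

    Only items at most `window` positions apart after sorting by `keys`
    are compared, which avoids testing every pair.
    """
    parent = list(range(len(descriptors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(len(keys)), key=lambda i: keys[i])
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + 1 + window]:
            if math.dist(descriptors[i], descriptors[j]) <= threshold:
                parent[find(i)] = find(j)  # same connected component

    groups = {}
    for i in range(len(descriptors)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]

def images_to_delete(groups, resolutions):
    """In each group keep the highest-resolution image and list the rest."""
    doomed = []
    for g in groups:
        keep = max(g, key=lambda i: resolutions[i])
        doomed += [i for i in g if i != keep]
    return sorted(doomed)
```

Sorting by a single projected value means true duplicates end up next to each other, so a small window is enough, while the full-descriptor distance guards against items that only happen to share a similar projection.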

Texture Map 2D Bin Packing

This is a C++ program for packing rectangular textures into one big image. It tackles the NP-hard 2D bin packing problem for rectangles with a simple heuristic.
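The text does not say which heuristic the program uses. As one example of a simple heuristic for this problem, the Python sketch below implements shelf packing with rectangles sorted by decreasing height; it is an assumed stand-in, and the name `shelf_pack` is hypothetical.

```python
def shelf_pack(sizes, bin_width):
    """Place (w, h) rectangles on left-to-right shelves, tallest first.

    Returns ({index: (x, y)}, total_height), where (x, y) is the corner
    of each rectangle closest to the origin.
    """
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i][1])
    positions = {}
    x = y = shelf_height = 0
    for i in order:
        w, h = sizes[i]
        if w > bin_width:
            raise ValueError("rectangle %d is wider than the bin" % i)
        if x + w > bin_width:
            # The current shelf is full: open a new one above it.
            y += shelf_height
            x = shelf_height = 0
        positions[i] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
    return positions, y + shelf_height
```

Shelf packing wastes the space above shorter rectangles on each shelf, but it is easy to implement and never produces overlaps, which is often enough for a texture atlas.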



accumarray is a Matlab function that constructs an array with accumulation, but it only works for one dimension. Therefore, we implemented a fast multi-dimensional accumarray in MEX with C++. For example, it is useful for aggregating features based on a segmentation map.

Input: a (d*n) single matrix FeatureIn, a (1*n) uint32 matrix Index, and a uint32 integer MaxIndex giving the maximum index value in Index. Output: a (d*MaxIndex) single matrix FeatureOut.
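The input and output shapes above can be illustrated with a short Python sketch. It assumes that the accumulation is a sum and that indices are one-based as in Matlab; both are assumptions, and `accum_features` is a hypothetical name rather than the MEX function.

```python
def accum_features(feature_in, index, max_index):
    """Sum the columns of a d x n matrix into a d x max_index matrix.

    feature_in: list of d rows, each with n values.
    index: n one-based labels; column j is added to output column index[j].
    """
    d = len(feature_in)
    out = [[0.0] * max_index for _ in range(d)]
    for j, k in enumerate(index):
        for r in range(d):
            out[r][k - 1] += feature_in[r][j]
    return out
```

With a segmentation map flattened into `index`, each output column then holds the accumulated feature vector of one segment; labels that never occur leave a zero column.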


This code shows how to generate a crop from a full-view panorama as a normal image, and how to warp it back.

Panoramic Image Processing Toolbox


Two simple functions to plot a curve or a histogram in polar coordinate.

3D cuboid reconstruction from 2D corner locations


This program takes as input the image coordinates of the 7 corners of a cuboid, fits a 3D cuboid, and outputs the new image coordinates of the corners. If changeID is 1-7, it ignores that corner and uses the remaining corners to estimate the result. If changeID is 8, it uses all 7 corners to fit the cuboid.


This code does off-screen rendering using Mesa3D to render a mesh, given a 3x4 camera matrix and an image resolution of width x height. The rendering result is an ID map for facets, edges and vertices, or a depth map read from the z-buffer. It is typically used for occlusion testing when texture mapping a model from an image, as in image-based 3D modeling. It is written in C++, and a Matlab version using MEX is also available.


The usual "dir" command can list only local folders. We implemented dirSmart, which can also list files on a web server. The Matlab version lists both local files and files on a web server, automatically deciding which method to use. For example, together with Matlab's imread, which can read images both locally and from the web, you can write a program that lists all images and visualizes them with the same code, regardless of whether the files sit locally or on a web server. The Javascript version lists all files from a web server and decides which files to show based on the criteria that you enter. You can adapt it into a dynamic image browser.

HOG-based Template Matching

This piece of code demonstrates simple HOG sliding-window template matching. You can build your own sliding-window object detector on top of it, or use it for other purposes such as finding a certain pattern, as demonstrated in our FrameBreak project.

Online SVM

This is a very simple implementation of an online Support Vector Machine classifier: after the model is initialized, you can add more training data and refine it. The structural SVM implements the cutting plane algorithm.

Generate Nice Color Bar Plot

This piece of code demonstrates how to draw a nice color bar plot, such as Figure 6 and Figure 7 (a pie plot with the colors generated here) in the SUN3D paper.


This is a replacement for imread in Matlab that handles auto-rotation of JPEG images. Matlab's imread does not seem to rotate images automatically (at least up to version R2012a). Therefore, we implemented this function to rotate the image into the correct orientation based on the EXIF orientation tag. We have tested this function with images from an iPhone 5 running iOS 6.



This is a very simple function to read a text file and just put it as a long string in Matlab.


A simple script to send a customized email to a list of people using Matlab.


This function computes the intersection volume divided by the union volume between two groups of cuboids. It is typically useful for evaluating 3D object detection. The 3D cuboids must be aligned with the z axis. Each column is a 3D cuboid with [x1 y1 x2 y2 x3 y3 x4 y4 zMin zMax]'.
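For a single pair of cuboids, the computation can be sketched in Python as follows. Because both cuboids are aligned with the z axis, the intersection volume is the overlap area of the two footprints times the overlap in height. The sketch assumes that the four (x, y) corners are listed in order around a convex footprint, and it uses Sutherland-Hodgman polygon clipping, which may differ from what the toolkit does; all function names are hypothetical.

```python
def _signed_area(poly):
    """Shoelace formula; positive for counter-clockwise polygons."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return s / 2.0

def _clip(subject, clipper):
    """Sutherland-Hodgman: clip `subject` by a convex counter-clockwise `clipper`."""
    output = list(subject)
    n = len(clipper)
    for i in range(n):
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % n]

        def side(p):  # >= 0 when p is on the inner side of edge a->b
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        current, output = output, []
        for k in range(len(current)):
            s, e = current[k - 1], current[k]
            ss, se = side(s), side(e)
            if (ss >= 0) != (se >= 0):  # the segment crosses the clip line
                t = ss / (ss - se)
                output.append((s[0] + t * (e[0] - s[0]), s[1] + t * (e[1] - s[1])))
            if se >= 0:
                output.append(e)
    return output

def cuboid_iou(a, b):
    """IoU of two cuboids given as [x1, y1, x2, y2, x3, y3, x4, y4, zMin, zMax]."""
    def footprint(c):
        poly = [(c[0], c[1]), (c[2], c[3]), (c[4], c[5]), (c[6], c[7])]
        return poly if _signed_area(poly) > 0 else poly[::-1]

    pa, pb = footprint(a), footprint(b)
    overlap = _clip(pa, pb)
    area = abs(_signed_area(overlap)) if len(overlap) >= 3 else 0.0
    height = max(0.0, min(a[9], b[9]) - max(a[8], b[8]))
    inter = area * height
    vol_a = abs(_signed_area(pa)) * (a[9] - a[8])
    vol_b = abs(_signed_area(pb)) * (b[9] - b[8])
    return inter / (vol_a + vol_b - inter)
```

Applying `cuboid_iou` to every pair of columns from the two groups would give the full matrix of scores.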


A simple function that returns the indefinite article "a" or "an" for a given word or phrase. This is a port of the Javascript implementation listed below.


A list of functions for handling [Rotation Translation] matrices, including conversion between the angle-axis representation and rotation matrices, and concatenation and inversion of Rt matrices.
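These operations follow standard formulas, sketched here in Python with 3x4 nested lists. The function names and the convention that `rt_concat(A, B)` applies B first are assumptions made for this illustration and may not match the toolkit's Matlab functions.

```python
import math

def rt_inverse(Rt):
    """Inverse of a 3x4 [R|t], which is [R^T | -R^T t]."""
    R_T = [[Rt[c][r] for c in range(3)] for r in range(3)]
    t = [Rt[r][3] for r in range(3)]
    return [R_T[r] + [-sum(R_T[r][c] * t[c] for c in range(3))] for r in range(3)]

def rt_concat(A, B):
    """Return the [R|t] that applies B first and then A."""
    out = []
    for r in range(3):
        row = [sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
        row.append(sum(A[r][k] * B[k][3] for k in range(3)) + A[r][3])
        out.append(row)
    return out

def angle_axis_to_rotation(axis, angle):
    """Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2.

    K is the cross-product matrix of the normalized, non-zero axis.
    """
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    K = [[0, -z, y], [z, 0, -x], [-y, x, 0]]
    K2 = [[sum(K[r][k] * K[k][c] for k in range(3)) for c in range(3)]
          for r in range(3)]
    s, c1 = math.sin(angle), 1 - math.cos(angle)
    return [[(1.0 if r == c else 0.0) + s * K[r][c] + c1 * K2[r][c]
             for c in range(3)] for r in range(3)]
```

Concatenating a pose with its own inverse should give the identity rotation and a zero translation, which makes a convenient sanity check.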


A simple function to write a set of 3D points as a ply file, which can be opened in Meshlab.
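For reference, a minimal ASCII PLY point cloud needs only a short header followed by one line per point. The Python sketch below builds that text; it is an illustration with a hypothetical function name, and the toolkit's Matlab function may write additional properties such as color.

```python
def points_to_ply(points):
    """Return ASCII PLY text for a list of (x, y, z) points."""
    header = [
        "ply",
        "format ascii 1.0",
        "element vertex %d" % len(points),
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = ["%g %g %g" % (x, y, z) for x, y, z in points]
    return "\n".join(header + body) + "\n"
```

Writing the returned string to a file, for example with `open("cloud.ply", "w").write(points_to_ply(points))`, gives a file in the ASCII PLY layout.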


A simple function to write a triangulated mesh as a ply file, which can be opened in Meshlab. For example, one can generate a mesh from isosurface in Matlab and use this function to save the mesh as a ply file.


A simple function to estimate a normal vector based on nearby k points.


A simple function to render a point cloud to an image. Note that the coordinates are assumed to be camera coordinates. For example, if you have a camera matrix P, you should input P*X as your coordinates.


A simple function that draws a 3D CG model together with the normals of its faces. Each normal is computed by the left-hand rule, i.e. the vertices of a face are ordered clockwise with respect to its outward normal.


A simple demo to convert a polygon mesh into a voxel representation.


A simple function that saves a mesh (indexed vertices for faces) as an OFF file. (Edges are not supported.)


A simple function that saves a mesh (indexed vertices for faces) as a VRML2 file with a texture map.


Some meshes have duplicate vertices that break many algorithms. This function removes the redundant mesh vertices.
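One straightforward way to do this is to keep the first copy of each vertex and remap the face indices, as in the Python sketch below. It merges only exactly equal coordinates and uses zero-based indices; whether the toolkit's function applies a tolerance is not stated, so treat this as an assumption-laden illustration with a hypothetical name.

```python
def remove_duplicate_vertices(vertices, faces):
    """Merge exactly equal vertices and remap face indices (zero-based)."""
    first_index = {}  # vertex coordinates -> index in the de-duplicated list
    remap = []        # old vertex index -> new vertex index
    unique = []
    for v in vertices:
        key = tuple(v)
        if key not in first_index:
            first_index[key] = len(unique)
            unique.append(key)
        remap.append(first_index[key])
    new_faces = [tuple(remap[i] for i in face) for face in faces]
    return unique, new_faces
```

After merging, faces that used different copies of the same point share one index, so adjacency-based algorithms see the mesh as connected.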


A simple function to automatically remove white space around an image, assuming the top left pixel has the background color that you don't want.


This script parses a latex file and copies all figures into a new location. It is typically used to get a clean version when submitting latex files to venues like IJCV or ECCV that require the latex source for the final submission.


This is a Matlab reader to demonstrate how to load Google CityBlock R5 data. The data is copyrighted by Google, and we are not allowed to distribute the data. But if you can get the data from Google directly, you can use this code to load the data.

TurtleBot 2 Matlab Controller

This code provides a way to use Matlab to control TurtleBot 2 movement.

base64 decoding in Matlab

Using a Java call.

extract focal length from EXIF in Matlab

Compute the focal length in pixels using the focal length value in EXIF and the sensor size.
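The conversion is a ratio: the focal length in millimeters is scaled by the number of pixels per millimeter of sensor. A minimal Python sketch follows; it assumes that the sensor width corresponds to the full image width (no cropping or rescaling), and the function and parameter names are hypothetical.

```python
def focal_length_pixels(focal_mm, sensor_width_mm, image_width_px):
    """Focal length in pixels = focal_mm * (image_width_px / sensor_width_mm)."""
    return focal_mm * image_width_px / sensor_width_mm
```

For instance, with made-up values of a 5 mm lens, a 10 mm wide sensor, and a 4000 pixel wide image, the call `focal_length_pixels(5.0, 10.0, 4000)` gives a focal length of 2000 pixels.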

Convert the .oni file captured by OpenNI into SUN3D RGB-D video sequence format.

Need OpenCV.

Draw a lot of cameras on a half sphere.

Query Google, Bing and Flickr to search for images based on a keyword.

Generate a VRML2 model file for a 3D model with texture.


Recursively look for files under a path whose names fit a certain pattern.


Vectorized version of fileparts.


Offscreen OpenGL rendering code to render a point cloud and cameras.


A demo of how to pop up a window, make the window fullscreen, send back the result, and submit the form in an Amazon Mechanical Turk template.


Mex-version of the graph-based segmentation based on Pedro's code for general graph.


Compute the normal values based on a depth map.


Output a graph into a ply file.


Given camera poses for an RGB-D image collection, generate the TSDF and compute a mesh using Marching Cubes.


Several less useful or less well implemented functions, including autocalibration, camera matrix decomposition, GPU-based Loopy Belief Propagation, high-order Loopy Belief Propagation, a graph-cut segmentation-based image_annotator, and an RVM classifier in C++.

How to cite

J. Xiao, 2013. Princeton Vision and Robotics Toolkit. Available from: <>.

BibTeX entry

@misc {PVT,
	author = "Jianxiong Xiao",
	title = "Princeton Vision and Robotics Toolkit",
	note = "Available from: \url{}",
	year = "2013",
}


To use most of these functions you will need MATLAB and the MATLAB Image Processing Toolbox. You may also want to refer to the MATLAB documentation and the Image Processing Toolbox documentation. Alternatively, you may be able to use Octave, an open source alternative to MATLAB, but none of the functions on this page have been tested on Octave. See Peter Kovesi's Notes on using Octave.

C++, Python and Javascript

Some of these functions use C++ for speed. They are tested with gcc or Matlab MEX. Some of these functions use Python, mostly because we want to run them easily as a CGI service on a web server. Some of these functions use Javascript, because we want to run them in a web browser.


These functions are NOT rigorously tested and may contain bugs. Please do NOT use them in any life-critical application without rigorous testing!


It is released under MIT license:

Copyright (c) Princeton Vision Group

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.