3D reconstruction project
Required functionalities
Each group should implement a system with the following functionality:
- Gold standard F-matrix estimation from known correspondences. (Many of the required functions exist in LAB3, and in OpenCV).
- Computation of an E-matrix from F and known K, followed by extraction of R and t from E using the visible-points constraint (a sketch follows after this list).
- Robust Perspective-n-Point (PnP) estimation for adding a new view and detecting outliers (see the PnP sketch below).
- Bundle adjustment for N cameras (N≥2) using known correspondences, intrinsic camera parameters (K-matrix), and an initial guess for the camera poses.
- Use a sparsity mask for the Jacobian in the bundle adjustment step (this speedup is absolutely essential if you use scipy.optimize.least_squares; a sparsity-mask sketch follows below). Alternatively, in C++ you can use Ceres Solver, which handles the sparsity in a different way. Yet another option is to use PyTorch (ask your guide for details).
- Evaluate the robustness to noise of the implemented system by measuring the camera pose errors of your reconstruction (a pose-error sketch follows below). Details can be found here.
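As a rough illustration of the E-matrix step, the Python/OpenCV sketch below forms E = KᵀFK from an estimated F and recovers the (R, t) pair that satisfies the visible-points constraint. The names pts1, pts2 and K are placeholders, and the RANSAC-based cv2.findFundamentalMat call is only used to keep the example short; it is not the gold-standard estimator required above.

```python
# Minimal sketch (not the gold-standard pipeline): F from correspondences,
# E = K^T F K, then R and t chosen by the visible-points (cheirality) constraint.
# pts1, pts2 are (N, 2) arrays of corresponding points; K is the 3x3 intrinsic matrix.
import numpy as np
import cv2

def relative_pose(pts1, pts2, K):
    # F via RANSAC here only for brevity; replace with your gold-standard estimate.
    F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)

    # E-matrix from F and known K.
    E = K.T @ F @ K

    # Of the four (R, t) decompositions of E, recoverPose keeps the one that
    # places the triangulated points in front of both cameras.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t, inliers
```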
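For the PnP step, one possible starting point is OpenCV's RANSAC-based solver. The sketch below is only an illustration; object_points, image_points and the thresholds are assumptions that you will have to adapt.

```python
# Sketch of robust PnP for registering a new view and detecting outliers.
# object_points: (N, 3) triangulated 3D points; image_points: (N, 2) detections
# of the same points in the new image; K: 3x3 intrinsic matrix.
import numpy as np
import cv2

def add_view_pnp(object_points, image_points, K):
    ok, rvec, tvec, inlier_idx = cv2.solvePnPRansac(
        object_points.astype(np.float64),
        image_points.astype(np.float64),
        K, None,
        reprojectionError=2.0, confidence=0.999)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    outliers = np.setdiff1d(np.arange(len(object_points)), inlier_idx.ravel())
    return R, tvec, outliers
```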
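If you use scipy.optimize.least_squares for the bundle adjustment, the sparsity mask is supplied via the jac_sparsity argument. The sketch below assumes a parameter layout of 9 values per camera and 3 per 3D point, and the names camera_indices / point_indices (the camera and point index of each observation); adapt both to your own residual function.

```python
# Sketch of a Jacobian sparsity mask for bundle adjustment with scipy.
# Each observation contributes two residuals (x and y reprojection error),
# which depend only on one camera's parameters and one 3D point.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.optimize import least_squares

def build_sparsity(n_cameras, n_points, camera_indices, point_indices):
    m = 2 * camera_indices.size        # two residuals per observation
    n = 9 * n_cameras + 3 * n_points   # length of the parameter vector
    A = lil_matrix((m, n), dtype=int)
    obs = np.arange(camera_indices.size)
    for k in range(9):                 # camera parameter columns
        A[2 * obs, 9 * camera_indices + k] = 1
        A[2 * obs + 1, 9 * camera_indices + k] = 1
    for k in range(3):                 # 3D point columns
        A[2 * obs, 9 * n_cameras + 3 * point_indices + k] = 1
        A[2 * obs + 1, 9 * n_cameras + 3 * point_indices + k] = 1
    return A

# Usage, where residual_fun and x0 are your own residuals and initial guess:
# A = build_sparsity(n_cameras, n_points, camera_indices, point_indices)
# res = least_squares(residual_fun, x0, jac_sparsity=A, method='trf', x_scale='jac')
```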
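For the robustness evaluation, one common way to express camera pose errors is the rotation angle of R_est R_gtᵀ and the distance between camera centres (the exact protocol is given in the linked instructions). A small sketch, assuming the estimated and ground-truth poses have already been aligned to the same coordinate frame:

```python
# Sketch of simple pose-error measures against ground truth. Remember that an
# SfM reconstruction is only defined up to a similarity transform, so align it
# to the ground truth (rotation, translation, scale) before comparing.
import numpy as np

def rotation_error_deg(R_est, R_gt):
    # Angle of the relative rotation R_est * R_gt^T.
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def camera_centre_error(C_est, C_gt):
    # Euclidean distance between camera centres (after alignment).
    return np.linalg.norm(np.asarray(C_est).ravel() - np.asarray(C_gt).ravel())
```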
In addition to the above, groups with four students should implement at least one of the following functionalities:
- Visualisation of the 3D model. Pick one of the following:
  - Use Poisson surface reconstruction (or screened PSR) to obtain a volumetric representation. Try e.g. the version in Meshlab, or Kazhdan's own implementation (which has more options). Both require you to estimate colour and surface normals for the 3D points, and to save these in PLY format. Use visibility in the cameras to determine the normal signs. See Kazhdan and Hoppe for a theoretical description.
  - Use space carving (as in Fitzgibbon, Cross and Zisserman) to generate a volumetric representation of your object.
  - Use texture patches (e.g. PMVS) or billboards to represent the texture of the estimated 3D model.
- Find correspondences between pairs of views, and remove false matches using epipolar geometry and cross-checking (a sketch is given after this list). This allows the implemented system to use any image set with known K.
- Use your own camera to take images of an object, and reconstruct it. This requires estimation of the K-matrix, and possibly lens distortion, followed by image rectification. Note that this also requires finding your own correspondences (functionality #2), and that you need to carefully plan how to take the images.
- Densify your initial sparse SfM model using dense stereo between selected frame pairs, e.g. PatchMatch, or one of the approaches in the Multiview Stereo tutorial.
- Use next-best-view selection (e.g. as in Schönberger et al., or some simplified form thereof) to select the order in which cameras are added to SfM.
- Replace Incremental SfM with Global SfM. Use global estimation of all rotation matrices as an initialisation of Bundle Adjustment (as in Martinec and Pajdla).
- Write your own non-linear solver, e.g. Levenberg-Marquardt (see the report by Madsen et al.), and use the Schur complement trick to speed up the computation of the update step (see the IREG compendium).
Groups with five students are required to implement two of the above, and groups with six members should implement three.
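For the correspondence functionality above, a possible Python/OpenCV starting point is mutual nearest-neighbour matching followed by RANSAC on the fundamental matrix. The feature choice (SIFT) and the thresholds below are assumptions, not requirements.

```python
# Sketch: find correspondences between two views and remove false matches
# using cross-checking (mutual nearest neighbours) and epipolar geometry.
import numpy as np
import cv2

def matched_points(img1, img2):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Cross-checking: keep a match only if it is the nearest neighbour both ways.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Epipolar constraint: reject matches inconsistent with a fundamental matrix.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    keep = mask.ravel().astype(bool)
    return pts1[keep], pts2[keep]
```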
Datasets
The Visual Geometry Group at Oxford University has a number of datasets available on their web site, e.g. the dinosaur sequence.
These data sets contain coordinates of image points that are tracked between images. This means that there is some amount of noise on the image coordinates, and that the points are not visible over the entire sequence, since they are occluded in some or even most of the images. The data sets also contain some form of ground truth in terms of estimated camera matrices for each view.
Notice that the dinosaur sequence is a turn-table sequence, generated by rotating the object around a fixed axis. The acquisition geometry of this dataset makes it sensitive to the quality of the point correspondences used.
For debugging it is useful to work with a noise-free dataset. For this purpose we have "cleaned" the dinosaur data set from any noise (up to the numerical accuracy of Matlab) and produced a dataset that contains 2D image points, 3D points, and the camera matrices. These are found in the BAdino2.mat file on the local ISY file system:
- /courses/TSBB15/sequences/BAdino/BAdino2.mat
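A quick way to get started with this file in Python is to load it with scipy.io.loadmat and list its variables (the variable names are not documented here, so inspect them first):

```python
# Load the cleaned dinosaur dataset and list the stored variables.
from scipy.io import loadmat

data = loadmat('/courses/TSBB15/sequences/BAdino/BAdino2.mat')
print([key for key in data if not key.startswith('__')])
```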
The EPFL multi-view stereo datasets are also highly recommended. They all have undistorted high-resolution images, with ground-truth camera poses and known intrinsic camera parameters. However, here you have to find the correspondences yourself.
Python code
Please look at the utility code for CE3. You can find it at /courses/TSBB15/python/ in the computer labs. Also look at the Extra exercises in the lab sheet; they are meant to help you get started with the projects.
Matlab code
If you use Matlab for solving the tasks in this project, there are some software packages that will spare you from implementing all of the functionality yourself:
- The Visual Geometry Group at Oxford University
- Peter Kovesi at the University of Western Australia
- The LSQNONLIN optimiser in Matlab. Note that LSQNONLIN is very slow, unless a sparsity mask is used.
- P3P by Laurent Kneip. Note: the documentation is unclear, so you need to verify the input and output behaviour yourself on synthetic data (e.g. for the coordinate axes). See this example for how to do this in Matlab.
C code
You are absolutely NOT allowed to use complete SfM systems such as COLMAP, SBA, Bundler, VisualSFM, SSBA, and OpenMVG. On the other hand, you are encouraged to use a non-linear least-squares package, and if you intend to make a more advanced representation of your 3D model, you may use a package for this. Some useful (and allowed) packages are listed below.
- Ceres Solver, a library for efficient non-linear optimization.
- levmar, and lmfit, Levenberg-Marquardt optimisation packages.
- sparseLM, a sparse Levenberg-Marquardt optimisation package (much faster, but also more complex to set up).
- PMVS, a package to compute 3D models from images and camera poses.
- OpenCV, implements several local invariant features, and some geometry functions.
- Lambda twist P3P by Mikael Persson and Klas Nordberg.
- P3P by Laurent Kneip.
- OpenGV, a collection of geometric solvers for calibrated epipolar geometry.
Deliverables
In order to pass, each project group should do the following:
- Make a design plan, and get it approved by the guide. The design plan should contain the following:
  - A list of the tasks/functionalities that will be implemented
  - A group member list, with responsibilities for each person (e.g. what task)
  - A flow-chart of the system components
  - A brief description of the components
- Deliver a good presentation at the seminar
- Hand in a written report to the project guide. The report should contain:
  - A group member list, with responsibilities for each person
  - A description of the problem that is solved
  - How the problem is solved
  - What the result is (i.e. performance relative to ground truth)
  - Why the result is what it is
  - References to used methods
We recommend that you use the CVPR LaTeX template when writing the report.
References
- C. Barnes et al. PatchMatch: A randomized correspondence algorithm for structural image editing, ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2009.
- A. Fitzgibbon, G. Cross and A. Zisserman, Automatic 3D Model Construction for Turn-Table Sequences, in 3D Structure from Multiple Images of Large-Scale Environments, Editors Koch & Van Gool, Springer Verlag 1998, pages 155-170.
- Y. Furukawa and C. Hernández, Multiview Stereo: A Tutorial. In Foundations and trends in Computer Graphics and Vision Vol. 9, No. 1-2, 2013.
- Michael Kazhdan and Hugues Hoppe, Screened Poisson Surface Reconstruction, ACM Transactions on Graphics 2013.
- K. Madsen et al., Methods for Non-Linear Least Squares Problems, 2nd ed., DTU technical report 2004.
- Daniel Martinec and Tomáš Pajdla, Robust Rotation and Translation Estimation in Multiview Reconstruction, CVPR 2007.
- Klas Nordberg, Introduction to Representations and Estimation in Geometry (IREG).
- Johannes Schönberger and Jan-Michael Frahm, Structure-from-Motion Revisited, CVPR 2016.
- Triggs, McLauchlan, Hartley and Fitzgibbon, Bundle Adjustment - A Modern Synthesis, in Vision Algorithms - Theory and Practice, Springer Verlag, Lecture Notes in Computer Science No 1883, 2000, pages 298-372.
- Daniel Scharstein and Richard Szeliski, A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 47(1-3), 2002.
- Brian Curless and Marc Levoy, A Volumetric Method for Building Complex Models from Range Images, SIGGRAPH'96, 1996.
- William E. Lorensen and Harvey E. Cline, Marching Cubes: A High Resolution 3D Surface Construction Algorithm, SIGGRAPH'87, 1987.
- R. Newcombe et al., KinectFusion: Real-time Dense Surface Mapping and Tracking, ISMAR'11, 2011.
- C. Loop and Z. Zhang, Computing Rectifying Homographies for Stereo Vision, ICCV 1999.
Last updated: 2021-12-22