Research Group on Visual Computation

Simultaneous Multi-View Relative Pose Estimation and 3D Reconstruction from Planar Regions


Description

Abstract

In this project, we propose a novel solution for multi-view reconstruction, relative pose and homography estimation using planar regions. The proposed method does not require point matches: it directly uses a pair of planar image regions and simultaneously reconstructs the normal and distance of the corresponding 3D planar surface patch, the relative pose of the cameras, and the aligning homography between the image regions. When more than two cameras are available, a special region-based bundle adjustment is proposed, which provides robust estimates in a multi-view camera system by constructing and solving a non-linear system of equations. The method is quantitatively evaluated on a large synthetic dataset as well as on the KITTI Vision Benchmark dataset.

Problem statement

As can be observed in the image below, we consider a camera system of at least 2 cameras, with C_0 chosen as the reference camera and the relative pose (R,t) of every other camera defined with respect to it, as shown in the image for cameras k and l. Since the 3D plane π is projected into each camera, planar homographies can be defined between the corresponding segmented image regions.
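For reference, the standard plane-induced homography behind this construction can be written as below. The conventions are our assumption, not taken from the paper: the plane π is written as n^T X = d in the frame of C_0 (unit normal n, distance d), camera k maps reference-frame points as X_k = R_k X + t_k, and K_k denotes the intrinsic matrix of camera k. The paper may use an equivalent form with a different sign. Applied to homogeneous image points, all homography equalities hold up to scale.

```latex
\begin{aligned}
&\pi:\ \mathbf{n}^\top\mathbf{X} = d, \qquad \mathbf{X}_k = \mathbf{R}_k\mathbf{X} + \mathbf{t}_k,\\
&\mathbf{X}\in\pi \;\Rightarrow\; \mathbf{X}_k = \mathbf{R}_k\mathbf{X} + \mathbf{t}_k\,\frac{\mathbf{n}^\top\mathbf{X}}{d}
  = \Big(\mathbf{R}_k + \frac{1}{d}\,\mathbf{t}_k\mathbf{n}^\top\Big)\mathbf{X} = \mathbf{H}_{0k}\,\mathbf{X},\\
&\mathbf{H}_{kl} = \mathbf{H}_{0l}\,\mathbf{H}_{0k}^{-1}, \qquad
  \tilde{\mathbf{H}}_{kl} = \mathbf{K}_l\,\mathbf{H}_{kl}\,\mathbf{K}_k^{-1}\ \ \text{(pixel coordinates)}.
\end{aligned}
```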

Using the standard homography composition shown below, these homographies can be parametrized so that estimating them directly yields the relative poses of the cameras (R,t) as well as the parameters of the 3D plane (n,d):
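A quick numerical sanity check of the composition, as a sketch: under an assumed convention (plane n^T X = d in the frame of C_0, camera poses X_k = R_k X + t_k), the homography between cameras k and l obtained through the reference camera, H_kl = H_0l H_0k^{-1}, should coincide with the homography built directly from the relative pose k to l and the plane expressed in camera k's frame. The helper names (`rot`, `plane_homography`) and all numeric values are hypothetical and only for illustration.

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix from an axis-angle pair (Rodrigues' formula)."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

def plane_homography(R, t, n, d):
    """Homography induced by the plane n^T X = d for the pose X' = R X + t (assumed convention)."""
    return R + np.outer(t, n) / d

# Plane and camera poses, all relative to the reference camera C_0 (made-up values).
n = np.array([0.1, -0.2, 1.0])
n /= np.linalg.norm(n)
d = 5.0
R_k, t_k = rot([0, 1, 0], 0.10), np.array([0.5, 0.0, 0.1])
R_l, t_l = rot([1, 0.2, 0], -0.15), np.array([-0.3, 0.2, 0.0])

# Composition through the reference camera: H_kl = H_0l H_0k^{-1}.
H_kl = plane_homography(R_l, t_l, n, d) @ np.linalg.inv(plane_homography(R_k, t_k, n, d))

# Direct construction: relative pose k -> l and the plane expressed in frame k.
R_kl = R_l @ R_k.T
t_kl = t_l - R_kl @ t_k
n_k = R_k @ n
d_k = d + n_k @ t_k
H_direct = plane_homography(R_kl, t_kl, n_k, d_k)
print(np.allclose(H_kl, H_direct))  # True

# A 3D point on the plane: H_kl maps its image in camera k onto its image in camera l.
X = d * n + np.cross(n, [1.0, 0.0, 0.0])  # satisfies n^T X = d
x_k = R_k @ X + t_k
x_l = R_l @ X + t_l
y = H_kl @ (x_k / x_k[2])
print(np.allclose(y / y[2], x_l / x_l[2]))  # True
```

Because every pairwise homography is expressed through (R, t, n, d) rather than as 8 free entries, fitting it to the data estimates the pose and plane parameters directly.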

With more regions and/or more cameras available, a bundle adjustment can be constructed that optimizes all parameters simultaneously. More regions provide more constraints on the relative poses, while more camera positions allow multiple homographies involving the same plane to be written, which provides more constraints on the planar reconstruction.
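The effect of adding regions and cameras can be illustrated with a simple degrees-of-freedom count. This is a sketch under assumptions that are ours, not the paper's: a 6-DoF pose per non-reference camera, 3 DoF per plane (unit normal plus distance), one unrecoverable global scale, every region visible in all cameras, and 8 constraints per independent homography from the reference camera (the other pairwise homographies being compositions of these). The function name `dof_balance` is hypothetical.

```python
def dof_balance(n_cameras: int, n_regions: int) -> tuple[int, int]:
    """Return (unknowns, constraints) for a calibrated multi-view planar setup.

    Illustrative assumptions:
      * each non-reference camera has a 6-DoF pose (3 rotation + 3 translation),
      * each plane has 3 DoF (unit normal: 2, distance: 1),
      * one global scale cannot be recovered from images alone,
      * each region is seen by all cameras, and each of the n_cameras - 1
        homographies from the reference camera gives 8 constraints.
    """
    unknowns = 6 * (n_cameras - 1) + 3 * n_regions - 1
    constraints = 8 * (n_cameras - 1) * n_regions
    return unknowns, constraints

print(dof_balance(2, 1))  # (8, 8): the minimal case is exactly determined
print(dof_balance(5, 3))  # (32, 96)
```

Under these assumptions, with two cameras each extra region adds only 3 unknowns but 8 constraints, so additional regions mostly tighten the shared relative pose, while each extra camera adds 8 constraints per plane for a fixed plane parametrization.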

Evaluation

The method was tested on a large synthetic dataset, evaluating the performance of the minimal case with 1 region and 2 cameras, as well as the cases with 5 cameras and 3 regions. Performance was compared to standard homography factorization, and robustness against segmentation errors was also evaluated.

Real data results

Besides synthetic datasets, the proposed method was also tested on real images. The sole input of the algorithm is the binary segmentation mask of each image containing the same planar region (plus, of course, the camera calibration parameters). Since in the first case we had a precise 3D lidar point cloud of the scene, which also included precisely measured markers, we could evaluate both the camera poses and the reconstruction results against high-precision ground truth. The image below shows results on 5 image frames captured by a flying drone: the first and last frames of the sequence are shown on the right, with the segmented regions marked in red. The results are shown on the left as relative poses illustrated with camera objects (green: reference camera poses; red: estimated relative poses with respect to the middle camera), together with the reconstructed planar shapes, visualized in red in the 3D scene.
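Since the only input is a pair of binary masks, there are no point correspondences to write equations from. Region-based (correspondence-less) registration methods obtain equations anyway from the change-of-variables identity ∫_{D_l} ω(y) dy = ∫_{D_k} ω(h(x)) |J_h(x)| dx, which holds for any integrable ω when the homography h maps region D_k onto D_l. The sketch below only checks this identity numerically for ω ∈ {1, x, y}; the functions, the square region and the homography are illustrative assumptions, and we do not claim this is the exact equation system used in the paper. All helper names are hypothetical.

```python
import numpy as np

def apply_h(H, pts):
    """Map an (N, 2) array of points through the homography H (with projective division)."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]

def jacobian_det(H, pts):
    """|det J_h(x)| of a planar homography: |det H| / |w|^3 with w = h_3 . [x, 1]."""
    w = np.c_[pts, np.ones(len(pts))] @ H[2]
    return np.abs(np.linalg.det(H)) / np.abs(w) ** 3

def polygon_moments(poly):
    """Exact integrals of 1, x and y over a simple polygon (shoelace-type formulas)."""
    x, y = poly[:, 0], poly[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    m = np.array([cross.sum() / 2, ((x + xn) * cross).sum() / 6, ((y + yn) * cross).sum() / 6])
    return m * np.sign(m[0])  # independent of the vertex orientation

def pulled_back_moments(H, n=400):
    """Integrals of omega(h(x)) * |det J_h(x)| over the unit square, omega in {1, x, y} (midpoint rule)."""
    c = (np.arange(n) + 0.5) / n
    gx, gy = np.meshgrid(c, c)
    pts = np.c_[gx.ravel(), gy.ravel()]
    weights = jacobian_det(H, pts) / n**2  # each grid cell has area 1/n^2
    hp = apply_h(H, pts)
    return np.array([weights.sum(), (hp[:, 0] * weights).sum(), (hp[:, 1] * weights).sum()])

# Hypothetical homography and region: D_k is the unit square, D_l = h(D_k) a quadrilateral.
H = np.array([[1.10, 0.05, 0.20],
              [0.02, 0.95, -0.10],
              [0.05, 0.08, 1.00]])
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
lhs = polygon_moments(apply_h(H, square))  # integrals over D_l
rhs = pulled_back_moments(H)               # the same integrals, computed on D_k
print(np.allclose(lhs, rhs, rtol=1e-4))    # True
```

Each choice of ω yields one equation in the homography parameters; with H written in terms of (R, t, n, d), a system built this way could be solved by non-linear least squares, using only the masks (and the calibration) as input.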

KITTI results

The method was also evaluated on the KITTI Visual Odometry dataset, where we selected frames containing traffic signs, which were segmented both in 2D and in 3D (for validation). Using 5 consecutive frames, the camera poses were estimated with a median error of 0.2 degrees and 9 cm, while the reconstruction had a median error of less than 10 degrees in the plane normal and 50 cm in the plane distance. The results are comparable to state-of-the-art methods: in a comparison with COLMAP, a general point-based reconstruction method, our proposed solution reconstructed planar regions more precisely and robustly.
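The reported errors are medians over per-frame errors. The paper's exact metric definitions are not given here, so the functions below are common choices that we assume for illustration: the angle of the residual rotation, the Euclidean distance between translations, the angle between plane normals and the absolute difference of plane distances. Function names and the example values are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angle of the residual rotation R_est^T R_gt, in degrees."""
    c = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def translation_error(t_est, t_gt):
    """Euclidean distance between translation vectors (same unit as the input, e.g. metres)."""
    return np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))

def normal_error_deg(n_est, n_gt):
    """Angle between two plane normals, in degrees."""
    c = n_est @ n_gt / (np.linalg.norm(n_est) * np.linalg.norm(n_gt))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def distance_error(d_est, d_gt):
    """Absolute difference of plane distances."""
    return abs(d_est - d_gt)

def rot_z(deg):
    """Rotation about the z axis by the given angle in degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])

# Hypothetical per-frame rotation estimates versus an identity ground truth (made-up values).
errors = [rotation_error_deg(rot_z(e), np.eye(3)) for e in (0.1, 0.2, 0.3)]
print(np.median(errors))  # approximately 0.2
```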

Input segmentation example:

Example relative pose and reconstruction result:

Green is the reference, red is the proposed method and blue is COLMAP.

Publications to cite:
  1. Robert Frohlich, Zoltan Kato, Simultaneous Multi-view Relative Pose Estimation and 3D Reconstruction from Planar Regions, In Proceedings of ACCV Workshop on Advanced Machine Vision for Real-life and Industrially Relevant Applications (Gustavo Carneiro, Shaodi You, eds.), Springer, vol. 11367, Perth, Australia, pp. 467-483, 2018.
