Research Group on Visual Computation

Estimating Low-Rank Region Likelihood Maps

Description

Detecting low-rank regions differs from classical detection problems because it cannot be cast as a binary decision problem. Low-rank regions capture geometrically meaningful structures in an image. These include typical local features such as edges and corners, as well as all kinds of regular, symmetric, often repetitive patterns that are commonly found in man-made environments.

We propose a method to generate a low-rank likelihood map for a given image. Such a map is practically impossible to compute directly, so we estimate a "low-rankness" score by running TILT on a sliding window at multiple scales and with predefined strides. At each window position, we place a local Gaussian weighted by the window's low-rankness score, with a bandwidth that depends on the window size. This can be seen as a weighted kernel density estimation (wKDE) of the target probability density function. Obtaining the probability map this way is extremely costly, because two nested iterative optimizations (the TILT iterations and, inside them, the ALM algorithm) must be run for every sliding window and at several scales.

Therefore, we propose to train a deep neural network that learns to predict such maps directly from an image, using likelihood maps generated with the method above as training targets. Since we want a pixel-level output, we considered network architectures used for image segmentation, which include both down- and upscaling so that the output feature map has the same resolution as the input image. We experimented with two architectures: SegNet and Full-Resolution Residual Networks (FRRN).

In both cases, we modified the network as follows. Since color is not relevant for detecting low-rank regions, we first convert the input images to single-channel grayscale. Similarly, since we only need a single output feature map, we smooth that map with an average pooling layer and finally normalize its values to lie between 0 and 1. We then train the network with a loss based on the Kullback-Leibler (KL) divergence.
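The TILT+wKDE map generation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: low_rank_proxy is a hypothetical stand-in for the TILT score (it only measures how concentrated the singular values of the raw, unrectified window are, and skips TILT's rectifying transform and sparse-error term), and the window sizes, stride ratio, and bandwidth ratio are illustrative values, not the ones used in the paper.

```python
import numpy as np


def low_rank_proxy(patch):
    """Hypothetical stand-in for the TILT low-rankness score (NOT TILT).

    Returns ~1.0 for a rank-1 window and approaches 0.0 as the singular
    values become evenly spread. Unlike TILT, no rectifying transform is
    estimated and no sparse error term is separated out.
    Assumes both patch dimensions are at least 2.
    """
    s = np.linalg.svd(patch.astype(np.float64), compute_uv=False)
    fro = np.sqrt(np.sum(s ** 2))
    if fro == 0.0:
        return 0.0  # all-zero window: treat as uninformative
    r = min(patch.shape)
    ratio = s.sum() / fro  # nuclear / Frobenius norm, lies in [1, sqrt(r)]
    score = (np.sqrt(r) - ratio) / (np.sqrt(r) - 1.0)
    return float(np.clip(score, 0.0, 1.0))


def wkde_map(image, sizes=(16, 32), stride_ratio=0.5, bw_ratio=0.5,
             score_fn=low_rank_proxy):
    """Weighted KDE over sliding windows at several scales.

    Each window contributes a normalized 2D Gaussian centred on the window,
    weighted by its score, with a standard deviation proportional to the
    window size (the "bandwidth depending on the window size").
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w), dtype=np.float64)
    total_weight = 0.0
    for size in sizes:
        step = max(1, int(size * stride_ratio))   # stride for this scale
        sigma = bw_ratio * size                   # bandwidth tied to window size
        norm = 1.0 / (2.0 * np.pi * sigma ** 2)   # 2D Gaussian normalizer
        for y0 in range(0, h - size + 1, step):
            for x0 in range(0, w - size + 1, step):
                weight = score_fn(image[y0:y0 + size, x0:x0 + size])
                cy = y0 + (size - 1) / 2.0
                cx = x0 + (size - 1) / 2.0
                d2 = (ys - cy) ** 2 + (xs - cx) ** 2
                density += weight * norm * np.exp(-d2 / (2.0 * sigma ** 2))
                total_weight += weight
    if total_weight > 0.0:
        density /= total_weight  # sum_i w_i K_i(x) / sum_i w_i
    return density
```

For display or as a training target, the density can be rescaled, e.g. `m = d / d.max()`. Even this toy version evaluates one SVD and one full-image Gaussian per window; replacing the proxy with TILT, which runs its own nested optimizations per window, is what makes the real procedure so expensive.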

Fig. 1. Modified SegNet (left) and Full-Resolution Residual Network (right).
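The network modifications listed above (grayscale input, smoothing with average pooling, normalization to [0, 1], KL-based loss) can be illustrated with a small NumPy sketch. It is an assumption-laden illustration rather than the authors' code: the BT.601 luma weights for the grayscale conversion, the 3x3 stride-1 pooling window with edge padding, min-max scaling as the normalization, and the direction KL(target || prediction) with both maps rescaled to sum to 1 are choices made here. In an actual network these steps would be framework layers so that gradients flow through them.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero and log(0)


def to_grayscale(rgb):
    """HxWx3 RGB -> HxW single channel (ITU-R BT.601 luma weights, assumed)."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114


def avg_pool_same(x, k=3):
    """k x k average pooling, stride 1, edge padding; keeps the HxW size (k odd)."""
    pad = k // 2
    xp = np.pad(x.astype(np.float64), pad, mode="edge")
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)


def output_head(raw):
    """Smooth the single-channel output, then min-max scale it into [0, 1]."""
    smoothed = avg_pool_same(raw)
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo + EPS)


def kl_loss(target, pred):
    """KL(target || pred), treating each non-negative map as a distribution."""
    p = target / (target.sum() + EPS)
    q = pred / (pred.sum() + EPS)
    return float(np.sum(p * np.log((p + EPS) / (q + EPS))))
```

With min-max scaling, the minimum of every predicted map is exactly 0, which is why the small `EPS` inside the logarithm matters.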

Results

To train our model, we used the Aachen Day-Night dataset. We randomly split the Aachen training set into three groups: 3328 images for training, 500 for validation, and 500 for testing. In addition to these test images, we used the Day (milestone) and Night (nexus5x) images from the official Aachen test set for testing. To keep the network's memory consumption and training time reasonable, we reduced the images to 800x640 by randomly alternating between rescaling with zero-padding and random cropping. Although this strategy already introduces variability into the training set, we additionally applied data augmentation such as flipping, rotation, and changes in gamma, brightness, contrast, and saturation.
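The size reduction and part of the augmentation can be sketched as below. The point of the sketch is that every geometric operation (rescale, pad, crop, flip) must be applied identically to the image and to its target likelihood map, while photometric changes (here only gamma) touch the image alone. Assumptions: 800x640 is read as width x height, nearest-neighbour resizing is used to avoid extra dependencies, padding goes to the bottom/right, both strategies are picked with equal probability, images are float arrays in [0, 1], and fit_pair and augment_pair are hypothetical names. Rotation and the other photometric changes are omitted.

```python
import numpy as np


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of an HxW or HxWxC array."""
    h, w = x.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return x[rows[:, None], cols[None, :]]


def pad_to(x, out_h, out_w):
    """Zero-pad at the bottom/right up to out_h x out_w (x must not be larger)."""
    h, w = x.shape[:2]
    pad = [(0, out_h - h), (0, out_w - w)] + [(0, 0)] * (x.ndim - 2)
    return np.pad(x, pad, mode="constant")


def fit_pair(image, target, rng, out_h=640, out_w=800):
    """Randomly choose rescale+zero-pad or random crop; same geometry for both."""
    h, w = image.shape[:2]
    if rng.random() < 0.5:
        # Rescale (keeping the aspect ratio) so the image fits, then zero-pad.
        s = min(out_h / h, out_w / w)
        nh = min(out_h, max(1, round(h * s)))
        nw = min(out_w, max(1, round(w * s)))
        image = resize_nearest(image, nh, nw)
        target = resize_nearest(target, nh, nw)
    else:
        # Random crop; pad first if the image is smaller than the target size.
        H, W = max(h, out_h), max(w, out_w)
        image, target = pad_to(image, H, W), pad_to(target, H, W)
        y0 = int(rng.integers(0, H - out_h + 1))
        x0 = int(rng.integers(0, W - out_w + 1))
        image = image[y0:y0 + out_h, x0:x0 + out_w]
        target = target[y0:y0 + out_h, x0:x0 + out_w]
    return pad_to(image, out_h, out_w), pad_to(target, out_h, out_w)


def augment_pair(image, target, rng):
    """Horizontal flip on both arrays; gamma change on the image only."""
    if rng.random() < 0.5:
        image, target = image[:, ::-1], target[:, ::-1]
    gamma = rng.uniform(0.8, 1.2)  # assumed range
    return np.clip(image, 0.0, 1.0) ** gamma, target
```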
Fig. 2. Average KL divergence on our training, validation, and test splits of the Aachen Day-Night dataset.
Fig. 3. Average KL divergence on the Day (milestone) and Night (nexus5x) sets from the official Aachen Day-Night dataset.
Fig. 4. Examples of predicted low-rank likelihood maps for test images, obtained with TILT+wKDE (second column), the SegNet-based deep network (third column), and the FRRN-based deep network (fourth column).
Fig. 5. Examples of predicted low-rank likelihood maps for images from Day (milestone) and Night (nexus5x), obtained with TILT+wKDE (second column), the SegNet-based deep network (third column), and the FRRN-based deep network (fourth column).
Fig. 6. Examples of predicted low-rank likelihood maps for images from GreatCourt (seq2) (top two rows) and OldHospital (seq1) (bottom two rows), obtained with TILT+wKDE (second column), the SegNet-based deep network (third column), and the FRRN-based deep network (fourth column).

Potential applications

While TILT by itself is unable to detect low-rank regions, it can efficiently estimate a rectifying homography for the bounding boxes around the local maxima of our predicted likelihood maps. Such homographies have important applications, e.g. camera pose estimation, matching, and 3D reconstruction. An example of camera pose estimation w.r.t. a 3D plane is shown in Video 1.
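The step from a predicted map to the boxes handed to TILT can be sketched as local-maximum detection followed by a box around each peak. The neighbourhood radius, the score threshold, and the fixed box half-size are hypothetical parameters chosen for illustration; the page does not say how the boxes are sized. The TILT call itself is not shown, and `sliding_window_view` requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_maxima(lmap, radius=5, threshold=0.5):
    """Pixels that equal the max of their (2r+1)x(2r+1) neighbourhood and pass the threshold.

    Returned as (row, col) tuples, strongest peak first. Flat plateaus can
    yield several adjacent maxima.
    """
    k = 2 * radius + 1
    padded = np.pad(lmap.astype(np.float64), radius,
                    mode="constant", constant_values=-np.inf)
    neigh_max = sliding_window_view(padded, (k, k)).max(axis=(-2, -1))
    ys, xs = np.nonzero((lmap >= neigh_max) & (lmap >= threshold))
    order = np.argsort(-lmap[ys, xs])  # strongest peaks first
    return list(zip(ys[order].tolist(), xs[order].tolist()))


def boxes_around(peaks, shape, half_size=32):
    """Fixed-size (x0, y0, x1, y1) boxes centred on each peak, clipped to the image."""
    h, w = shape
    boxes = []
    for y, x in peaks:
        boxes.append((max(0, x - half_size), max(0, y - half_size),
                      min(w, x + half_size + 1), min(h, y + half_size + 1)))
    return boxes
```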

Video 1. Detected bounding box (light blue) used for relative pose estimation w.r.t. the 3D plane of the low-rank region. Green: GT camera. Blue: camera factorized from the rectifying homography. Rotation error: 2.4°, translation error: 1.5° (measured as the angle w.r.t. the GT translation, because the absolute length of the translation cannot be obtained from a homography).
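Video 1 reports a rotation error and an angular translation error. The usual definitions, which we assume here because the page does not spell them out, are the angle of the relative rotation R_gt^T R_est and the angle between the two translation vectors; the latter is invariant to the unknown translation scale. One possible route to candidate (R, t) pairs is OpenCV's `cv2.decomposeHomographyMat` with the camera intrinsics, which returns up to four solutions that still have to be disambiguated; that step is not shown.

```python
import numpy as np


def rotation_error_deg(R_gt, R_est):
    """Geodesic angle (degrees) of the relative rotation R_gt^T R_est."""
    R_rel = R_gt.T @ R_est
    c = (np.trace(R_rel) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


def translation_angle_deg(t_gt, t_est):
    """Angle (degrees) between translation directions; ignores the unknown scale."""
    c = np.dot(t_gt, t_est) / (np.linalg.norm(t_gt) * np.linalg.norm(t_est))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
```

Because only the direction of t enters the second function, scaling the estimated translation by any positive factor leaves the reported error unchanged.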
Publications to cite:
  1. Gabriela Csurka, Zoltan Kato, Andor Juhasz, Martin Humenberger, Estimating Low-Rank Region Likelihood Maps, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, USA, pp. 1-10, 2020.
