Visual Homing Project: A video panorama stitching method

I’ve been working on the Visual Homing project at IITB for more than two weeks now, and have made significant progress. I haven’t been documenting things, which is rather unusual for me, although I have been regularly uploading the code to GitHub. Right now it is a private repository, but I’ll be making it public fairly soon.

This post corresponds to an earlier portion of my work, in which I stitched a 360-degree panoramic image and then built a mapping from pixel x-coordinate to rotation angle. It works reasonably well, assuming a constant angular velocity while taking the video. For the purposes of the project, though, this approach is no longer relevant.
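As a rough illustration of that mapping: if the stitched panorama spans a full 360 degrees and the camera rotated at constant angular velocity, the rotation angle is simply a linear function of the pixel’s x-coordinate. A minimal sketch (the function name is mine, not part of the project code):

// Minimal sketch: map a pixel x-coordinate in the stitched panorama to a
// rotation angle, assuming the panorama covers a full 360 degrees and the
// camera rotated at constant angular velocity.
double angleFromX(int x, int panoramaWidth)
{
    return 360.0 * x / panoramaWidth;   // angle in degrees
}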

For the stitching, the simplistic approach was:

  • Detect features using the SURF algorithm and compute their descriptors.
  • Match features between consecutive images to find the regions common to both.
  • Compensate for shifts in the vertical direction.
  • Keep the left image up to the matched feature, then append the part of the right image to the right of the corresponding feature, forming the stitched image.

The algorithm does not do any colour-tone matching, nor does it compensate for changes in lighting conditions.

The entire code, written in C++ with OpenCV 2.4, is:

#include "opencv2/opencv_modules.hpp"
#include "stdio.h"
#include "iostream"
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/nonfree/features2d.hpp"
 
#define Y_THRESHOLD 5     // max vertical offset (pixels) allowed between matched features
#define DIV_FACTOR 2.7    // the cut is made once more than good_matches.size()/DIV_FACTOR matches agree
 
using namespace cv;
using namespace std;
 
int distx,disty;
Mat prevframe,frame;    // running panorama and current video frame
Mat img_1,img_2;        // left image (panorama so far) and right image (new frame)
 
int main( int argc, char** argv )
{
  int iterator=0;
  VideoCapture cap("source.mp4");                // source video to stitch
  std::cout<<cap.get(CV_CAP_PROP_FRAME_COUNT);
  for(iterator=0;iterator<cap.get(CV_CAP_PROP_FRAME_COUNT);iterator++)
  {
    cap.read(frame);
    img_2=frame;                                 // current frame
    if(img_1.empty())                            // first frame: panorama starts as the frame itself
    {
      prevframe=img_2;
    }
    img_1 = prevframe;                           // panorama stitched so far
 
    // Detect SURF keypoints in both images
    int minHessian = 400;
    SurfFeatureDetector detector( minHessian );
    std::vector<KeyPoint> keypoints_1, keypoints_2;
 
    detector.detect( img_1, keypoints_1 );
    detector.detect( img_2, keypoints_2 );
 
 
    // Compute SURF descriptors for the detected keypoints
    SurfDescriptorExtractor extractor;
    Mat descriptors_1, descriptors_2;
    extractor.compute( img_1, keypoints_1, descriptors_1 );
    extractor.compute( img_2, keypoints_2, descriptors_2 );
 
 
    // Match descriptors between the two images using FLANN
    FlannBasedMatcher matcher;
    std::vector< DMatch > matches;
    matcher.match( descriptors_1, descriptors_2, matches );
 
    // Find the minimum and maximum descriptor distances among the matches
    double max_dist = 0; double min_dist = 100;

    for( int i = 0; i < descriptors_1.rows; i++ )
    {
      double dist = matches[i].distance;
      if( dist < min_dist ) min_dist = dist;
      if( dist > max_dist ) max_dist = dist;
    }
 
 
    // Keep only matches with a sufficiently small descriptor distance
    std::vector< DMatch > good_matches;

    for( int i = 0; i < descriptors_1.rows; i++ )
    {
      if( matches[i].distance <= max(2*min_dist, 0.05) )
      { good_matches.push_back( matches[i]); }
    }
 
 
 
    // Walk through the good matches; once enough of them agree vertically
    // (within Y_THRESHOLD pixels), cut both images at that match point.
    int less_dist=0;
    Mat img1cut,img2cut;
    for( int i = 0; i < (int)good_matches.size(); i++ )
    {
      Point2f apt=keypoints_1[(int)good_matches[i].queryIdx].pt;
      Point2f bpt=keypoints_2[(int)good_matches[i].trainIdx].pt;

      distx=abs(apt.x-bpt.x);
      disty=abs(apt.y-bpt.y);

      if(disty<Y_THRESHOLD) { less_dist++; }
      if(less_dist>(int)good_matches.size()/DIV_FACTOR)
      {
        // Keep the left image up to the matched point and the right image from it onwards
        Rect img1roi(0,0,apt.x,img_1.rows);
        img1cut=img_1(img1roi);
        Rect img2roi(bpt.x,0,img_2.cols-bpt.x,img_1.rows);
        img2cut=img_2(img2roi);
        break;
      }
    }
 
    Size sz1 = img1cut.size();
    Size sz2 = img2cut.size();

    // Place the two cut images side by side (video frames are 3-channel)
    Mat result(sz1.height, sz1.width+sz2.width, CV_8UC3);
    Mat left(result, Rect(0, 0, sz1.width, sz1.height));
    img1cut.copyTo(left);
    Mat right(result, Rect(sz1.width, 0, sz2.width, sz1.height));
    img2cut.copyTo(right);
    imshow("left", left);
    imshow("right", right);

    // The concatenated image becomes the panorama for the next iteration
    Mat concat;
    hconcat(left,right,concat);
    imshow("result", concat);
    cout<<concat.cols<<endl;
    prevframe=concat;
 
    if(waitKey(30) == 27)   // stop early when Esc is pressed
    {
      break;
    }
 
  }
 
}

The constants Y_THRESHOLD and DIV_FACTOR need to be tuned according to how shaky the video feed is. Very low values can cause segmentation faults; I haven’t done anything to guard against that because, as mentioned earlier, I am no longer using this approach in my project. Values that are too high give unconvincing results. The minHessian value can also be changed.
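If one did want to guard against those crashes, a simple validity check before cropping would probably be enough. A rough sketch, meant to sit in the same file and reusing the variable names from the code above (the helper function itself is not part of the original code):

// Sketch of a guard against bad frames: only cut and stitch when both ROIs
// are non-empty and lie fully inside their source images.
bool roisValid(const Rect& r1, const Mat& m1, const Rect& r2, const Mat& m2)
{
    return r1.width > 0 && r1.height > 0 && r2.width > 0 && r2.height > 0
        && (r1 & Rect(0, 0, m1.cols, m1.rows)) == r1
        && (r2 & Rect(0, 0, m2.cols, m2.rows)) == r2;
}

In the main loop one would call roisValid(img1roi, img_1, img2roi, img_2) before taking the sub-images and skip the frame when it returns false. One would also need to skip the concatenation step entirely when no good match triggered the cut, since img1cut and img2cut are then empty.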

The result is stored in the Mat concat, which may be used for further processing or saved using imwrite.
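For example, since prevframe always holds the latest stitched result, the final panorama could be written out after the frame loop finishes (the filename here is just an example):

// Write the final panorama to disk once the frame loop has finished.
imwrite("panorama.jpg", prevframe);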

Here are a few results.

Input video 1: (I trimmed the video as it was too long; also, the few black frames at the beginning are a pain.)

Resultant stitched image 1:

[stitched image: winterday]

Input video 2:

Resultant stitched image 2:

[stitched image: labstitch]

Input video 3:

Resultant stitched image 3:

[stitched image: roomstitched]

 

The occurrence of segmentation faults on shaky videos is still an issue. The blurriness of the images is due to the rotation in the video being too fast for the camera, but the lines that appear at the seams are the algorithm’s fault.

I hope this code is of some use to someone.
