Visual Homing Project: A video panorama stitching method

I’ve been working on the Visual Homing project at IITB for more than two weeks now, and have made significant progress. I haven’t been documenting things, which is rather unusual for me, although I have been regularly uploading the code to GitHub. Right now it is a private repository, but I’ll be making it public fairly soon.

This post corresponds to an earlier portion of my work, in which I stitched a 360-degree panoramic image and then built a mapping from pixel x-coordinate to rotation angle. It works reasonably well, assuming a constant angular velocity while taking the video. For the purposes of the project, though, this approach is no longer relevant.
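As a rough illustration of that mapping: if the stitched panorama spans a full 360 degrees and the camera rotated at constant angular velocity, the rotation angle is simply a linear function of the pixel’s x-coordinate. A minimal sketch (the function name is mine, not part of the project code):

// Minimal sketch: map a pixel x-coordinate in the stitched panorama to a
// rotation angle, assuming the panorama covers a full 360 degrees and the
// camera rotated at constant angular velocity.
double angleFromX(int x, int panoramaWidth)
{
    return 360.0 * x / panoramaWidth;   // angle in degrees
}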

For the stitching, the simplistic approach was:

  • Detect features using the SURF algorithm and compute their descriptors.
  • Match features between consecutive images to find the regions common to both.
  • Compensate for shifts in the vertical direction.
  • Keep the left image up to the matched feature, then append the part of the right image to the right of the corresponding feature, forming the stitched image.

The algorithm does not do any colour-tone matching, nor does it compensate for changes in lighting conditions.

The entire code, written in C++ with OpenCV 2.4, is:

#include "opencv2/opencv_modules.hpp"
#include "stdio.h"
#include "iostream"
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/nonfree/features2d.hpp"
 
#define Y_THRESHOLD 5     // max vertical offset (pixels) allowed between matched features
#define DIV_FACTOR 2.7    // the cut is made once more than good_matches.size()/DIV_FACTOR matches agree
 
using namespace cv;
using namespace std;
 
int distx,disty;
Mat prevframe,frame;    // running panorama and current video frame
Mat img_1,img_2;        // left image (panorama so far) and right image (new frame)
 
int main( int argc, char** argv )
{
  int iterator=0;
  VideoCapture cap("source.mp4");                // source video to stitch
  std::cout<<cap.get(CV_CAP_PROP_FRAME_COUNT);
  for(iterator=0;iterator<cap.get(CV_CAP_PROP_FRAME_COUNT);iterator++)
  {
    cap.read(frame);
    img_2=frame;                                 // current frame
    if(img_1.empty())                            // first frame: panorama starts as the frame itself
    {
      prevframe=img_2;
    }
    img_1 = prevframe;                           // panorama stitched so far
 
    // Detect SURF keypoints in both images
    int minHessian = 400;
    SurfFeatureDetector detector( minHessian );
    std::vector<KeyPoint> keypoints_1, keypoints_2;
 
    detector.detect( img_1, keypoints_1 );
    detector.detect( img_2, keypoints_2 );
 
 
    // Compute SURF descriptors for the detected keypoints
    SurfDescriptorExtractor extractor;
    Mat descriptors_1, descriptors_2;
    extractor.compute( img_1, keypoints_1, descriptors_1 );
    extractor.compute( img_2, keypoints_2, descriptors_2 );
 
 
    // Match descriptors between the two images using FLANN
    FlannBasedMatcher matcher;
    std::vector< DMatch > matches;
    matcher.match( descriptors_1, descriptors_2, matches );
 
    // Find the minimum and maximum descriptor distances among the matches
    double max_dist = 0; double min_dist = 100;

    for( int i = 0; i < descriptors_1.rows; i++ )
    {
      double dist = matches[i].distance;
      if( dist < min_dist ) min_dist = dist;
      if( dist > max_dist ) max_dist = dist;
    }
 
 
    // Keep only matches with a sufficiently small descriptor distance
    std::vector< DMatch > good_matches;

    for( int i = 0; i < descriptors_1.rows; i++ )
    {
      if( matches[i].distance <= max(2*min_dist, 0.05) )
      { good_matches.push_back( matches[i]); }
    }
 
 
 
    // Walk through the good matches; once enough of them agree vertically
    // (within Y_THRESHOLD pixels), cut both images at that match point.
    int less_dist=0;
    Mat img1cut,img2cut;
    for( int i = 0; i < (int)good_matches.size(); i++ )
    {
      Point2f apt=keypoints_1[(int)good_matches[i].queryIdx].pt;
      Point2f bpt=keypoints_2[(int)good_matches[i].trainIdx].pt;

      distx=abs(apt.x-bpt.x);
      disty=abs(apt.y-bpt.y);

      if(disty<Y_THRESHOLD) { less_dist++; }
      if(less_dist>(int)good_matches.size()/DIV_FACTOR)
      {
        // Keep the left image up to the matched point and the right image from it onwards
        Rect img1roi(0,0,apt.x,img_1.rows);
        img1cut=img_1(img1roi);
        Rect img2roi(bpt.x,0,img_2.cols-bpt.x,img_1.rows);
        img2cut=img_2(img2roi);
        break;
      }
    }
 
    Size sz1 = img1cut.size();
    Size sz2 = img2cut.size();

    // Place the two cut images side by side (video frames are 3-channel)
    Mat result(sz1.height, sz1.width+sz2.width, CV_8UC3);
    Mat left(result, Rect(0, 0, sz1.width, sz1.height));
    img1cut.copyTo(left);
    Mat right(result, Rect(sz1.width, 0, sz2.width, sz1.height));
    img2cut.copyTo(right);
    imshow("left", left);
    imshow("right", right);

    // The concatenated image becomes the panorama for the next iteration
    Mat concat;
    hconcat(left,right,concat);
    imshow("result", concat);
    cout<<concat.cols<<endl;
    prevframe=concat;
 
    if(waitKey(30) == 27)   // stop early when Esc is pressed
    {
      break;
    }
 
  }
 
}

The constants Y_THRESHOLD and DIV_FACTOR need to be tuned according to how shaky the video feed is. Very low values can cause segmentation faults; I haven’t done anything to guard against that because, as mentioned earlier, I am no longer using this approach in my project. Values that are too high give unconvincing results. The minHessian value can also be changed.
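If one did want to guard against those crashes, a simple validity check before cropping would probably be enough. A rough sketch, meant to sit in the same file and reusing the variable names from the code above (the helper function itself is not part of the original code):

// Sketch of a guard against bad frames: only cut and stitch when both ROIs
// are non-empty and lie fully inside their source images.
bool roisValid(const Rect& r1, const Mat& m1, const Rect& r2, const Mat& m2)
{
    return r1.width > 0 && r1.height > 0 && r2.width > 0 && r2.height > 0
        && (r1 & Rect(0, 0, m1.cols, m1.rows)) == r1
        && (r2 & Rect(0, 0, m2.cols, m2.rows)) == r2;
}

In the main loop one would call roisValid(img1roi, img_1, img2roi, img_2) before taking the sub-images and skip the frame when it returns false. One would also need to skip the concatenation step entirely when no good match triggered the cut, since img1cut and img2cut are then empty.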

The result is stored in the Mat concat, which may be used for further processing or saved using imwrite.
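For example, since prevframe always holds the latest stitched result, the final panorama could be written out after the frame loop finishes (the filename here is just an example):

// Write the final panorama to disk once the frame loop has finished.
imwrite("panorama.jpg", prevframe);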

Here are a few results.

Input video 1: (I trimmed the video as it was too long; also, the few black frames at the beginning are a pain.)

Resultant stitched image 1:

[stitched image: winterday]

Input video 2:

Resultant stitched image 2:

[stitched image: labstitch]

Input video 3:

Resultant stitched image 3:

[stitched image: roomstitched]

 

The occurrence of segmentation faults on shaky videos is still an issue. The blurriness of the images is due to the rotation in the video being too fast for the camera, but the lines that appear at the seams are the algorithm’s fault.

I hope this code is of some use to someone.
