Fast sparse stereo-matching [Computer Vision]
- FingerFlinger
- Sharp Eyed Eagle!
- Posts: 429
- Joined: Tue Feb 21, 2012 11:57 pm
- Location: Irvine, CA
Fast sparse stereo-matching [Computer Vision]
As part of a larger project, I made a little bare-bones library for stereo matching of sparse features. It's very basic, and completely tailored to my own needs, but I thought it was nifty, so I spent a little time this evening cleaning it up and threw it on GitHub. Binaries here
Based on various papers and snippets, it keeps up with or exceeds the speed and performance of similar sparse matchers, despite running on 7-year-old hardware. For comparison, LibViso2 claims that it can match 1000 features in 35ms. The image above was matched in 7.7ms, yielding 253 matches.
Put in equivalent terms:
LibViso2: 28571 per second
My thing: 32857 per second
But that's tuned for rock-solid matches. If you relax the matching criteria slightly, you can still get over 90% matching accuracy with 469 matches in 5.6ms, i.e. ~83750/s.
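Working those throughput figures out explicitly (a quick sanity check in Python; the counts and timings are the ones quoted above):

```python
# Convert a match count plus elapsed milliseconds into matches per second,
# using the figures quoted in the post.
def matches_per_second(n_matches, time_ms):
    return n_matches / (time_ms / 1000.0)

print(round(matches_per_second(1000, 35)))   # LibViso2's claim -> 28571
print(round(matches_per_second(253, 7.7)))   # strict settings  -> 32857
print(round(matches_per_second(469, 5.6)))   # relaxed settings -> 83750
```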
There are a few major optimizations left, but I am saving them for later, since it suits my needs as-is. One funny thing is that the algorithm is almost completely memory-bound, so my netbook (w/ DDR3 memory) can actually run this about 20% faster than my "gaming PC" from 2008.
Last edited by FingerFlinger on Fri Jul 05, 2013 7:31 pm, edited 2 times in total.
- cybereality
- 3D Angel Eyes (Moderator)
- Posts: 11407
- Joined: Sat Apr 12, 2008 8:18 pm
-
- Golden Eyed Wiseman! (or woman!)
- Posts: 1498
- Joined: Fri Jul 08, 2011 11:47 pm
Re: Fast sparse stereo-matching [Computer Vision]
Very cool FingerFlinger.
-
- Certif-Eyed!
- Posts: 661
- Joined: Sun Mar 25, 2012 12:33 pm
Re: Fast sparse stereo-matching [Computer Vision]
What would a depth map composed of voronoi regions look like using these points?
Each region would be some grayscale color representing that point, and it might look interesting overlaid on the original image.
More points means better depth map!
By the way, is this used for your optical flow project? With my limited knowledge of optical flow algorithms, I imagine knowing depth would help immensely in figuring out parallax and improving the accuracy of the track.
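The Voronoi-region idea is easy to prototype: assign every pixel the depth of its nearest matched feature. A minimal NumPy sketch (the function name, brute-force search, and the point/depth inputs are my own illustration, not part of the library):

```python
import numpy as np

def voronoi_depth_map(points, depths, height, width):
    """Assign every pixel the depth of its nearest matched feature,
    giving a Voronoi-region depth map.
    points: (N, 2) array of (row, col) feature locations.
    depths: (N,) array of per-feature depth values (e.g. grayscale 0..255)."""
    rows, cols = np.mgrid[0:height, 0:width]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    # Brute-force nearest neighbour: O(H*W*N), fine for a few hundred points.
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return depths[nearest].reshape(height, width)
```

For full-resolution images a KD-tree (e.g. scipy.spatial.cKDTree) would replace the brute-force distance matrix, but the output is the same: each Voronoi cell filled with its feature's depth, ready to overlay on the original image.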
- FingerFlinger
- Sharp Eyed Eagle!
- Posts: 429
- Joined: Tue Feb 21, 2012 11:57 pm
- Location: Irvine, CA
Re: Fast sparse stereo-matching [Computer Vision]
Actually, when I get a little time, I am going to add a dense option to the library. I was perhaps a little preemptive when I named the repo SparseStereo.
There is no reason that you couldn't attempt to find a match for every pixel in the image, but the reason that I haven't done so is because I am targeting a real-time application. In my demo, the default settings generate about 10000 keypoints between the 2 images. That's simply many fewer points than a dense map would need to match (375*450 = 168750px), and therefore a lot faster to compute.
Yes, it is used for my optical flow/stereo visual odometry project. It doesn't necessarily improve the quality of the actual optical flow, but it helps you figure out how much weight to give each flow vector; i.e., vectors at greater depth have a smaller magnitude than nearer vectors, even though they represent the same amount of camera motion in real-world terms.
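That depth-weighting intuition can be made concrete: under a pinhole model with pure camera translation, a point at depth Z produces an image velocity proportional to 1/Z, so multiplying each flow vector by its depth puts near and far vectors on a common scale. A hedged sketch (the function name and the pure-translation assumption are mine, not from the library):

```python
import numpy as np

# Pinhole intuition: for camera translation t, a point at depth Z moves
# roughly f * t / Z pixels in the image. Scaling each flow vector by its
# depth therefore yields a quantity comparable across near and far points.
def depth_normalized_flow(flow_vectors, depths):
    """flow_vectors: (N, 2) pixel displacements; depths: (N,) depths."""
    return flow_vectors * depths[:, None]

# A far point (depth 4) moving 1px and a near point (depth 1) moving 4px
# imply the same camera translation, and normalize to the same vector.
```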
- Fredz
- Petrif-Eyed
- Posts: 2255
- Joined: Sat Jan 09, 2010 2:06 pm
- Location: Perpignan, France
- Contact:
Re: Fast sparse stereo-matching [Computer Vision]
FingerFlinger wrote:Actually, when I get a little time, I am going to add a dense option to the library. I was perhaps a little preemptive when I named the repo SparseStereo.
You may already know about it, but you can find a lot of good implementations of dense stereo correspondence here: http://vision.middlebury.edu/stereo/eval/
FingerFlinger wrote:There is no reason that you couldn't attempt to find a match for every pixel in the image, but the reason that I haven't done so is because I am targeting a real-time application.
The second entry in the evaluation (ADCensus) is near real-time; it takes 0.1s on a GeForce GTX 480 with CUDA. I guess something simpler could be implemented a bit faster.
- brantlew
- Petrif-Eyed
- Posts: 2221
- Joined: Sat Sep 17, 2011 9:23 pm
- Location: Menlo Park, CA
Re: Fast sparse stereo-matching [Computer Vision]
Impressive. Keep it comin...
- FingerFlinger
- Sharp Eyed Eagle!
- Posts: 429
- Joined: Tue Feb 21, 2012 11:57 pm
- Location: Irvine, CA
Re: Fast sparse stereo-matching [Computer Vision]
Yep, I am aware of the Middlebury evaluations; the whole website is great!
I'm doing sparse because there is a lot of stuff to do with each point of interest. I also need to do optical flow and outlier rejection in addition to stereo correspondence. All of those steps add up, and I don't think it is practical to go completely dense for what I am trying to do.
Given that, I anticipate that my current performance is already good enough for my needs, and optimization will come after I have gotten the rest of my algorithm implemented.
I'm definitely interested in working with CUDA, since SSE has been pretty fruitful for me.
And thanks for pointing me specifically to that second link! I haven't seen their paper before, but their solution utilizes the Census transform, which is also what my library uses (albeit in a radically different way), so I might be able to gain some insight from their CUDA implementation.
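For readers unfamiliar with it, the Census transform replaces each pixel with a bit string recording which neighbours are darker than the centre, and the matching cost between two pixels is the Hamming distance between their codes. A simplified 3x3 textbook version in NumPy (an illustration only, not FingerFlinger's SSE implementation):

```python
import numpy as np

def census_3x3(img):
    """3x3 Census transform: each pixel becomes an 8-bit code encoding
    whether each neighbour is darker than the centre pixel.
    Border pixels are left as 0 for simplicity."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    centre = img[1:h-1, 1:w-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        shifted = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        out[1:h-1, 1:w-1] |= ((shifted < centre).astype(np.uint8) << bit)
    return out

def hamming(a, b):
    """Matching cost between two census codes: number of differing bits."""
    return bin(int(a) ^ int(b)).count("1")
```

Because the transform only compares relative intensities, it is robust to the exposure and gain differences between the two cameras, which is part of why Census-based costs show up in both sparse matchers and dense methods like ADCensus.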
-
- Certif-Eyed!
- Posts: 661
- Joined: Sun Mar 25, 2012 12:33 pm
Re: Fast sparse stereo-matching [Computer Vision]
I was trawling some ancient leap hacking threads, and I found this image:
Supposedly it's raw data from the Leap's sensors (640x480 each). Each Leap camera has ~150 degrees FoV (with substantial distortion), but I'm curious how your stereo algorithm works on it.
The CTO of leap mentioned: "the lens distortions are nonlinear, non-monotonic, and fish-eye so they are very difficult to calibrate with existing vision packages. This is why we developed our own calibration method..." so it might be a lost cause in terms of creating dense stereo maps...
PS: If it does work, some guy made a raw data plug-in for Linux with some super sketchy code:
https://github.com/elinalijouvni/OpenLeap
- FingerFlinger
- Sharp Eyed Eagle!
- Posts: 429
- Joined: Tue Feb 21, 2012 11:57 pm
- Location: Irvine, CA
Re: Fast sparse stereo-matching [Computer Vision]
Not very well, I'd say!
EDIT: You can always do a manual calibration. OpenCV is great for that. I've got a half-finished tool that would automate it, actually. When life settles down a little bit, it's on the top of my list to finish...
- android78
- Certif-Eyable!
- Posts: 990
- Joined: Sat Dec 22, 2007 3:38 am
Re: Fast sparse stereo-matching [Computer Vision]
It looks like they probably use the brightness as a rough depth map. Using that, they can do very localized matching between the two images to get the final position.
Basically, for a pixel, you could estimate the distance from the brightness via the inverse-square falloff of the illumination (times a scaling factor). This can be validated against the second image. Then you can do matching within a small region to get an even more accurate position.
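That inverse-square idea can be sketched directly: if the scene is lit by the device's own IR emitters, returned brightness B falls off roughly as 1/d², so d ≈ k/√B for some unknown scale k. A hypothetical NumPy version (k and the illumination model are assumptions, not anything confirmed about the Leap):

```python
import numpy as np

def brightness_to_depth(brightness, k=1.0):
    """Rough per-pixel depth prior from an actively illuminated IR image,
    assuming brightness ~ 1/d**2, hence d ~ k / sqrt(brightness).
    k is an unknown scale factor that a calibration would have to fix."""
    b = np.maximum(brightness.astype(float), 1e-6)  # guard against zeros
    return k / np.sqrt(b)

# Quadrupling the brightness halves the estimated distance, as the
# inverse-square model predicts.
```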
-
- Golden Eyed Wiseman! (or woman!)
- Posts: 1329
- Joined: Fri Jun 08, 2012 8:18 pm
Re: Fast sparse stereo-matching [Computer Vision]
android78 wrote:It looks like they probably use the brightness as a rough depth map. Using that, they can do very localized matching between the two images to get the final position.
Basically, for a pixel, you could estimate the distance from the brightness via the inverse-square falloff of the illumination (times a scaling factor). This can be validated against the second image. Then you can do matching within a small region to get an even more accurate position.
I had figured it was going to be something along those lines, which makes it somewhat similar to the original Kinect (more densely illuminated objects are closer). You can probably further deduce depth information by simple comparison of overlapped stereo pairs; the fact that it's monochromatic probably makes it much easier.
-
- Certif-Eyed!
- Posts: 661
- Joined: Sun Mar 25, 2012 12:33 pm