Computer Vision Headtracking Prototype
- brantlew
- Petrif-Eyed
- Posts: 2221
- Joined: Sat Sep 17, 2011 9:23 pm
- Location: Menlo Park, CA
Computer Vision Headtracking Prototype
I have been researching positional tracking using computer vision techniques for the last couple of months and was finally able to get some preliminary results this weekend. In contrast to this excellent prototype http://www.mtbs3d.com/phpBB/viewtopic.php?f=138&t=16072 that employs an inward-looking camera and markers, my approach uses an outward-looking camera and no markers. So in theory, you could just mount a camera to your HMD and have head tracking in any location without preparation or boundaries. In practice, however, it doesn't seem to work out quite that well, and it requires a great deal of bandwidth to run the stereo cameras and a lot of computing power to process the images. So it's not a perfect solution, but it's still an interesting area of research and potentially a part of the overall solution to the VR tracking problem. There is a lot more discussion of this technique in this thread: http://www.mtbs3d.com/phpBB/viewtopic.php?f=138&t=15312
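To put a rough number on that bandwidth: at 640x480 and 30 fps (the settings I ended up using), a stereo pair already generates on the order of 18 MB/s of raw image data, which is a big chunk of what a single USB 2.0 bus can sustain in practice. A quick sketch of the arithmetic (the 1 byte/pixel raw Bayer figure is an assumption on my part):

```cpp
#include <cassert>

// Back-of-the-envelope raw bandwidth for a stereo pair.
// Assumptions: 640x480 @ 30 fps, 1 byte per pixel (raw Bayer), 2 cameras.
constexpr long long kWidth = 640, kHeight = 480, kFps = 30, kCameras = 2;

constexpr long long rawBytesPerSecond() {
    return kWidth * kHeight * kFps * kCameras;  // 18,432,000 bytes/s, ~18.4 MB/s
}
```

Compressed modes would reduce this, but then you pay for decompression on the CPU side.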
Here is a quick video demonstration.
[youtube-hd]http://www.youtube.com/watch?v=FmRu0dAw46Y[/youtube-hd]
Since there are no good consumer stereo cameras available for less than $1000, I had to build my own out of gutted PSEye cameras. I am using the CL SDK for accessing the cameras, OpenCV for image preparation, and the libViso algorithm for computing the camera pose. The results of this early prototype are very rough. There is a good deal of noise in the output and the coordinate path does not align with the camera path perfectly. Only x,y,z position are being tracked now because any camera rotations really mess it up. Also there is a good deal of drift in the calculations. Much of this is probably due to poor camera calibration on my part (I'm still learning) and also I am not using synchronized cameras. So hopefully I can improve the results over time. Also, I plan to experiment with several other algorithms to compare performance and speed.
Overall, my impression is that while interesting, this is not going to be the right solution for general head tracking for the average user. I think "inward" optical techniques will require less processing power and offer a more stable solution in a small controlled environment. However, for large-area tracking like laser tag arenas or "unlimited" outdoor areas these techniques are very promising.
- Fredz
- Petrif-Eyed
- Posts: 2255
- Joined: Sat Jan 09, 2010 2:06 pm
- Location: Perpignan, France
- Contact:
Re: Computer Vision Headtracking Prototype
Nice! Congrats on succeeding at implementing this, I'm sure it was not an easy task. I'd say the results are not bad; to be honest, I really expected something worse.
I guess things like jitter and orientation tracking shouldn't be that hard to correct, but the main problem seems to be speed at this time. Maybe using custom optimized algorithms could limit the latency, but I'm not sure it's feasible in realtime without taxing the resources too much. A GPU implementation could probably also help, but that may be too taxing when running a game.
Can you give some more details about your implementation, i.e. what kind of feature detector did you use, which algorithm for the feature correspondence, which algorithm for the camera parameter estimation, etc.?
- mahler
- Sharp Eyed Eagle!
- Posts: 401
- Joined: Tue Aug 21, 2012 6:51 am
Re: Computer Vision Headtracking Prototype
Great work!
Really good to see some results.
Far from being perfect, but a big step forward.
Several things I'm interested in:
- What framerate and resolution is the camera recording at? And how does this impact the bandwidth / performance?
- How the distance between cameras influences performance
- How different filters / lenses influence the performance
- And of course whether VSYNC will improve it
Can you overlay a 3D grid on the camera image to see how accurate it is?
And does the checker-pattern help?
- cybereality
- 3D Angel Eyes (Moderator)
- Posts: 11407
- Joined: Sat Apr 12, 2008 8:18 pm
Re: Computer Vision Headtracking Prototype
This is looking good man!
Seems to work decently, but obviously it can be improved. I think ultimately this is the kind of tracking that would be most useful for HMDs and VR. Marker-based tracking is always going to be unwieldy and subject to line-of-sight style issues. Tracking like this could be used in any environment, so it has a lot of potential.
I did some work with OpenCV years ago (using the facial recognition, check here if interested) and I found the processing to be very slow. I was having trouble just getting a smooth 60FPS, even with such a simple 3D environment. How do you feel about the latency on this project?
- PatimPatam
- Binocular Vision CONFIRMED!
- Posts: 214
- Joined: Thu Jun 28, 2012 1:31 pm
- Location: Barcelona
Re: Computer Vision Headtracking Prototype
Awesome job brantlew!! As Fredz mentioned, I think it works much better than expected!
One of my main concerns with this type of technique is actually related to line-of-sight problems. Is the algorithm you're using able to discriminate special cases where objects move inside the FOV of the camera? Like if, for example, someone else moves nearby? Or if you move your own arms while playing? Or if there's a TV on?
Anyway very happy we're starting to get some real results.. Keep up the good work!
-
- Golden Eyed Wiseman! (or woman!)
- Posts: 1498
- Joined: Fri Jul 08, 2011 11:47 pm
Re: Computer Vision Headtracking Prototype
Nice work as usual Brantlew! I wonder if a combination of techniques might be good for things such as VR lasertag. For example, using outward-facing cameras together with an emitter (projectors?) that projects a line or grid across the playfield. If you did this with, say, IR light, you could possibly use IR filters on your cameras and simplify the tracking algorithms, as there would be far fewer pixels to process.
One day, I could see the method you have posted becoming a dominant tracking tech, but it sounds as though it's a bit too intensive and unreliable currently. However, no markers is IMHO the way of the future.
- brantlew
- Petrif-Eyed
- Posts: 2221
- Joined: Sat Sep 17, 2011 9:23 pm
- Location: Menlo Park, CA
Re: Computer Vision Headtracking Prototype
Thanks for the encouragement guys.
@Fredz: What's amazing to me about this whole project is how little I actually understand it after implementing it. I did read a computer vision book to gather the vocabulary and the "big picture", and I read through the libViso paper just to understand the outline, but I did not work through all the math or retain all the details. The project had a steep learning curve, but in the end it was mostly a task of integrating a lot of free code and pre-existing modules. Luckily libViso operates more-or-less like a black-box engine. You just have to feed it stereo rectified images. So the project consisted of putting these three parts together:
- camera interface: http://codelaboratories.com/research/vi ... cpp-sample
- stereo calibration and rectification: http://www.cse.iitk.ac.in/users/vision/ ... OpenCV.pdf Chapter 12 code examples
- libViso example code
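For reference, the odometry loop itself is simple once the images are rectified: libViso hands back a frame-to-frame motion each time you feed it a stereo pair, and you chain the inverse of that motion onto the accumulated pose. Here is a self-contained sketch of just that pose-chaining step (the minimal 4x4 type below stands in for libViso's Matrix class, and the actual viso.process() / viso.getMotion() calls are omitted, so treat it as an illustration rather than working libViso code):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Minimal 4x4 homogeneous transform, standing in for libViso's Matrix class.
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Inverse of a rigid transform [R t; 0 1] is [R^T  -R^T t; 0 1].
Mat4 rigidInverse(const Mat4& m) {
    Mat4 r = identity();
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r[i][j] = m[j][i];
    for (int i = 0; i < 3; ++i) {
        r[i][3] = 0.0;
        for (int j = 0; j < 3; ++j)
            r[i][3] -= m[j][i] * m[j][3];
    }
    return r;
}

// Each frame, the odometry yields the motion of scene points from the
// previous camera frame to the current one; chaining the inverses
// accumulates the camera pose in world coordinates.
Mat4 accumulate(const Mat4& pose, const Mat4& frameMotion) {
    return mul(pose, rigidInverse(frameMotion));
}

// Toy check: camera steps forward 0.1 m along z on two successive frames.
double forwardAfterTwoSteps() {
    Mat4 step = identity();
    step[2][3] = -0.1;  // points appear to shift -0.1 in camera z
    Mat4 pose = identity();
    pose = accumulate(pose, step);
    pose = accumulate(pose, step);
    return pose[2][3];  // camera z in world coordinates
}
```

In the real loop (if I remember the libViso sample correctly) you call viso.process() on each rectified pair and feed viso.getMotion() into the accumulate step above.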
@mahler: I tried really hard to use 320x240 images but the results were just too noisy. The algorithm doesn't converge well at that resolution. So I used 640x480 @ 30fps, which works much better but is difficult to work with because the CL drivers or the cameras are a bit unstable at that resolution. I haven't had the time to adjust all the settings yet but I plan to test things like stereo separation, wide angle, and IR filters at some point. I don't know if the checkerboard helps, but the overall scene complexity is critical for the algorithm. If I move the camera down a few more inches so that it is mostly viewing underneath the table, where there is little contrast, the algorithm is unable to converge to a solution and the tracking just stops.
@cyber: OpenCV is a mixed bag. I think some of the newer stuff like optical flow is probably not super optimized. But some of the core features that were hand-optimized at Intel are incredibly fast! I know first-hand because I spent several weeks hand-crafting and optimizing a Gaussian filter in C at my job. And I was pretty proud of it because it beat the Matlab implementation by about 4x. But then I ran mine against OpenCV and got blown out of the water by about 5x!! So for basic image processing OpenCV is pretty awesome.
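For a sense of the kind of optimization involved: a 2D Gaussian kernel is separable, so an NxN convolution per pixel can be replaced by two N-tap 1D passes. This sketch just illustrates that trick; it is neither OpenCV's implementation nor the one I wrote at work:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// A 2D Gaussian factors into a horizontal and a vertical 1D pass, turning
// an NxN convolution per pixel into 2N taps. Illustration only.
std::vector<double> blurSeparable(const std::vector<double>& img,
                                  int w, int h,
                                  const std::vector<double>& kernel) {
    const int r = static_cast<int>(kernel.size()) / 2;
    std::vector<double> tmp(img.size(), 0.0), out(img.size(), 0.0);
    for (int y = 0; y < h; ++y)          // horizontal pass, edges clamped
        for (int x = 0; x < w; ++x) {
            double s = 0.0;
            for (int i = -r; i <= r; ++i)
                s += kernel[i + r] * img[y * w + std::clamp(x + i, 0, w - 1)];
            tmp[y * w + x] = s;
        }
    for (int y = 0; y < h; ++y)          // vertical pass, edges clamped
        for (int x = 0; x < w; ++x) {
            double s = 0.0;
            for (int i = -r; i <= r; ++i)
                s += kernel[i + r] * tmp[std::clamp(y + i, 0, h - 1) * w + x];
            out[y * w + x] = s;
        }
    return out;
}
```

With a normalized kernel, a constant image passes through unchanged, which makes a handy sanity check for this kind of filter code.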
@PatimPatam: From what I understand, this particular algorithm (libViso) is designed to filter out independently moving objects in the scene. The examples show it operating on streets with other cars and pedestrians moving around. So moving objects are not the problem - but too much occlusion would be a problem because it needs a minimum number of correlation points in the scene. So someone walking 10 feet in front of you is probably no big deal. But moving your arm in front of your face would be bad. I think every optical solution is going to require fusion with a back-up sensor that is not subject to occlusion.
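The fusion I have in mind is basically a complementary-style filter: dead-reckon with the IMU at high rate, and pull the estimate toward the optical fix whenever the camera solution is valid. A minimal one-axis sketch (the structure and the gain value are illustrative assumptions, not code from this prototype):

```cpp
#include <cassert>
#include <cmath>

// One-axis complementary-style fusion: the IMU provides smooth high-rate
// dead reckoning, and the optical fix (when not occluded) cancels the
// accumulated drift. Gain and structure are illustrative only.
struct FusedTracker {
    double pos = 0.0;  // meters
    double vel = 0.0;  // meters/second

    // Called at IMU rate with the measured acceleration.
    void predict(double accel, double dt) {
        vel += accel * dt;
        pos += vel * dt;
    }

    // Called whenever the camera solution converges (i.e., not occluded).
    void correct(double opticalPos, double gain = 0.2) {
        pos += gain * (opticalPos - pos);
    }
};
```

During an occlusion you simply stop calling correct() and coast on predict() until the camera reacquires the scene.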
@WiredEarp: I agree and I've been thinking about the same thing. Using a hybrid approach whereby you create a patterned environment to simplify the processing. It's similar to using markers, but it wouldn't require a carefully designed environment. So you could just paint the walls, floors, and ceiling with irregular checker patterns. Or maybe use laser emitters to paint designs on the walls. Anything to create visual richness, but simple or sparse enough that you could employ simple processing techniques. I'm very bullish about this idea.
-
- Golden Eyed Wiseman! (or woman!)
- Posts: 1329
- Joined: Fri Jun 08, 2012 8:18 pm
Re: Computer Vision Headtracking Prototype
Very cool!
How this all works, I will never know...
- FingerFlinger
- Sharp Eyed Eagle!
- Posts: 429
- Joined: Tue Feb 21, 2012 11:57 pm
- Location: Irvine, CA
Re: Computer Vision Headtracking Prototype
Nice work brantlew! It looks like I am still a few steps behind you. Have you benchmarked any of the components in LibViso (feature detection/optical flow/visual odometry)?
I've still been primarily messing with OpenCV, and I have a very Mickey-Mouse idea that I might post about soon, if the results aren't totally embarrassing.
-
- Certif-Eyed!
- Posts: 559
- Joined: Mon Dec 05, 2011 3:02 am
- Location: Geekenhausen
Re: Computer Vision Headtracking Prototype
Could those cameras be of any use for this project? http://techcrunch.com/2013/01/11/pairis ... 3d-camera/
No idea how much they'll cost though. Maybe it's possible to buy them in bulk, without the enclosure/glasses.
It seems to be a new company; I just found it by googling because I couldn't accept the "there are no sub-$1000 stereo cameras" comment.
Btw, nice approach, and it's very generous of you to give insight into the development process of such an important technology for VR. Good luck!
- brantlew
- Petrif-Eyed
- Posts: 2221
- Joined: Sat Sep 17, 2011 9:23 pm
- Location: Menlo Park, CA
Re: Computer Vision Headtracking Prototype
STRZ wrote:Could those cameras be of any use for this project? http://techcrunch.com/2013/01/11/pairis ... 3d-camera/
One problem might be latency. At least in the demo the performance seemed pretty bad.
-
- Certif-Eyed!
- Posts: 559
- Joined: Mon Dec 05, 2011 3:02 am
- Location: Geekenhausen
Re: Computer Vision Headtracking Prototype
Possibly because it's a pre-prototype version; they've announced the prototype for Xmas 2013.
-
- One Eyed Hopeful
- Posts: 2
- Joined: Tue Jan 22, 2013 4:58 am
Re: Computer Vision Headtracking Prototype
Hello brantlew,
Can you tell me how you got those parameters for the ps3eye cam?
Code: Select all
// calibration parameters for sequence 2010_03_09_drive_0019
param.calib.f = 645.24; // focal length in pixels
param.calib.cu = 635.96; // principal point (u-coordinate) in pixels
param.calib.cv = 194.13; // principal point (v-coordinate) in pixels
-
- Cross Eyed!
- Posts: 151
- Joined: Wed Apr 18, 2012 3:27 pm
- Contact:
Re: Computer Vision Headtracking Prototype
brantlew wrote:Thanks for the encouragement guys.
@Fredz: What's amazing to me about this whole project is how little I actually understand it after implementing it. I did read a computer vision book to gather the vocabulary and the "big picture", and I read through the libViso paper just to understand the outline, but I did not work through all the math or retain all the details. The project had a steep learning curve, but in the end it was mostly a task of integrating a lot of free code and pre-existing modules.
@brantlew: I think you are far too modest about your ability. It's fantastic that you can combine other people's work and produce such great results, and it's a particularly useful skill now that software projects are too complex for one person to write the whole thing. Very well done!
brantlew wrote:@WiredEarp: I agree and I've been thinking about the same thing. Using a hybrid approach whereby you create a patterned environment to simplify the processing. It's similar to using markers, but it wouldn't require a carefully designed environment. So you could just paint the walls, floors, and ceiling with irregular checker patterns. Or maybe use laser emitters to paint designs on the walls. Anything to create visual richness, but simple or sparse enough that you could employ simple processing techniques. I'm very bullish about this idea.
Re painting walls, have you come across Dazzle Camouflage? (see attachment) It was used in both world wars to combat the U-boat threat and was dreamt up by a navy officer and artist who had studied cubism. I have often wondered if it could be used to produce VR 'fashion' if it turns out that it improves camera NUI interfaces. In the war it was used to prevent the U-boats calculating speed and direction, but in VR the opposite might be true. The clothing idea was given a little credence when I found this: http://people.csail.mit.edu/rywang/uppe ... _final.mp4
You may find his webpage interesting http://people.csail.mit.edu/rywang/
- brantlew
- Petrif-Eyed
- Posts: 2221
- Joined: Sat Sep 17, 2011 9:23 pm
- Location: Menlo Park, CA
Re: Computer Vision Headtracking Prototype
nikotin77 wrote:Hello brantlew,
Can you tell me how you got those parameters for the ps3eye cam?
Code: Select all
// calibration parameters for sequence 2010_03_09_drive_0019
param.calib.f = 645.24; // focal length in pixels
param.calib.cu = 635.96; // principal point (u-coordinate) in pixels
param.calib.cv = 194.13; // principal point (v-coordinate) in pixels
My first try at those values was based on some sketchy data I found on the Internet and some educated guesses.
Code: Select all
double horizontal_resolution = 640; // capture resolution used in this prototype (640x480)
double vertical_resolution = 480;
double focal_length_mm = 2.83; // This was the best estimate I found on the Internet
double cmos_width_mm = 3.984; // from datasheet http://www.zhopper.narod.ru/mobile/ov7720_ov7221_full.pdf
double cmos_pix_per_mm = horizontal_resolution / cmos_width_mm;
double focal_length_pix = cmos_pix_per_mm * focal_length_mm;
double principal_point_x_pix = horizontal_resolution / 2.0; // correct in theory, but probably slightly off in practice
double principal_point_y_pix = vertical_resolution / 2.0; // correct in theory, but probably slightly off in practice
Later I used values that were calculated by the OpenCV calibration routines. The values fluctuate a bit on different
calibrations, so I haven't really nailed them down but I prefer to use these dynamic values over the hard-coded ones from above.
Code: Select all
cvStereoCalibrate(object_points,
left_image_points,
right_image_points,
point_counts,
left_intrinsic, // camera matrix
left_distort, // distortion coefficients
right_intrinsic, // camera matrix
right_distort, // distortion coefficients
image_size,
stereo_rotate,
stereo_translate,
NULL,
NULL,
cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 100, 1e-5),
CV_CALIB_FIX_ASPECT_RATIO + CV_CALIB_ZERO_TANGENT_DIST +
CV_CALIB_SAME_FOCAL_LENGTH);
double focal_length_pix = CV_MAT_ELEM(*left_intrinsic, double, 0, 0);
// cvStereoRectify averages the principal points when using the CV_CALIB_ZERO_DISPARITY flag,
// so I shall do the same
double principal_point_x_pix = CV_MAT_ELEM(*left_intrinsic, double, 0, 2);
principal_point_x_pix += CV_MAT_ELEM(*right_intrinsic, double, 0, 2);
principal_point_x_pix /= 2;
double principal_point_y_pix = CV_MAT_ELEM(*left_intrinsic, double, 1, 2);
principal_point_y_pix += CV_MAT_ELEM(*right_intrinsic, double, 1, 2);
principal_point_y_pix /= 2;
-
- One Eyed Hopeful
- Posts: 2
- Joined: Tue Jan 22, 2013 4:58 am
Re: Computer Vision Headtracking Prototype
brantlew wrote:Later I used values that were calculated by the OpenCV calibration routines. The values fluctuate a bit on different calibrations, so I haven't really nailed them down, but I prefer to use these dynamic values over the hard-coded ones from above.
Code: Select all
....
double focal_length_pix = CV_MAT_ELEM(*left_intrinsic, double, 0, 0);
// cvStereoRectify averages the principal points when using the CV_CALIB_ZERO_DISPARITY flag,
// so I shall do the same
double principal_point_x_pix = CV_MAT_ELEM(*left_intrinsic, double, 0, 2);
principal_point_x_pix += CV_MAT_ELEM(*right_intrinsic, double, 0, 2);
principal_point_x_pix /= 2;
double principal_point_y_pix = CV_MAT_ELEM(*left_intrinsic, double, 1, 2);
principal_point_y_pix += CV_MAT_ELEM(*right_intrinsic, double, 1, 2);
principal_point_y_pix /= 2;
Thanks a lot, brantlew!
-
- One Eyed Hopeful
- Posts: 25
- Joined: Thu Mar 22, 2012 7:15 am
Re: Computer Vision Headtracking Prototype
sorry for a noob question:
why not use a hardware IMU for head tracking instead of a computer vision approach?
another question:
since you have 2 PS3 Eyes mounted, why not feed the SBS output into the HMD, then overlay some 3D objects in real time for AR?
- FingerFlinger
- Sharp Eyed Eagle!
- Posts: 429
- Joined: Tue Feb 21, 2012 11:57 pm
- Location: Irvine, CA
Re: Computer Vision Headtracking Prototype
tcboy88 wrote:why not use a hardware IMU for head tracking instead of a computer vision approach?
IMUs are great, but alone they aren't much good for position tracking. Visual odometry ultimately appears to be a good general-purpose solution for head tracking, but there is a ton of work left to do before it is consumer-ready.
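To make the "not much good for position" point concrete: any constant accelerometer bias double-integrates into a position error that grows quadratically with time, so even a tiny bias swamps head-scale motion within seconds. A toy illustration (the bias figure below is an arbitrary assumption, not a spec for any real IMU):

```cpp
#include <cassert>
#include <cmath>

// Position error from double-integrating a constant accelerometer bias:
// integrating a = b twice gives x_err(t) = 0.5 * b * t^2.
double positionDriftMeters(double biasMps2, double seconds) {
    return 0.5 * biasMps2 * seconds * seconds;
}
// e.g. a 0.01 m/s^2 bias (illustrative value) gives half a meter of
// drift after only 10 seconds -- hence the need for an absolute
// reference like optical tracking to correct the IMU.
```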
tcboy88 wrote:since you have 2 PS3 Eyes mounted, why not feed the SBS output into the HMD, then overlay some 3D objects in real time for AR?
You could do this, but in this case the FOV of the PS3 Eyes does not match the FOV of the Rift. Furthermore, although that would result in something that looks like AR, there are still serious issues to overcome on top of the issues that exist for VR. Michael Abrash has written several articles about this on his blog for Valve, but essentially AR needs close-to-perfect tracking, with even lower latency than the Rift already achieves. And then you probably need to fit all of the processing horsepower into a mobile device, because that will be the primary use-case.