Camera pose estimation solution


For the past couple of months I've been working on implementing a camera pose solution in The Dark Room program. I was anticipating it taking several months (at least!), but I've got a working version going in six weeks. There was a bit of fumbling around, I admit, but it works.

For those who don't know, finding the camera pose is about reconstructing where the real camera was in a virtual space. You've probably seen Hollywood films use this technique in their special effects, think the Avengers. They take a film camera (not unlike mine) and shoot on location. The footage then goes to a camera tracking artist, whose job is to replicate the real moving camera in a virtual environment. They need to do this so that when they render out the 3D animation and effects, it looks like it was shot from the same position as the real-world camera.

The video below (not mine) shows an example of real architecture being replaced by a 3D model of a building. To put the 3D building into the video, they would have tracked the footage and made the 3D camera move like the real drone camera up in the air. The video is tracked frame by frame to calculate the camera position.

Even though there is commercial software that can do this for you, there are a few reasons for wanting to make my own tracking engine. Firstly, the good tracking software is really expensive, though there are more reasonably priced options. There are also free ones which, while they do a job, don't give you the same control as the paid-for ones. And having my own engine that works inside my 3D package with the raw camera images means I don't have to duplicate images by exporting image sequences to another software package, which in turn saves disk space.

Getting the camera pose estimation to work has been quite a big deal. It was also a bit of a juggling act between figuring out how to do it and retro-fitting my way of coding onto examples I saw elsewhere. Most code I find is either written in other languages, uses external libraries, or is presented in a white paper I find hard to follow. I understand the concepts, but I don't follow the mathematical formulas the way they're written. So I had to do a fair bit of 'reverse engineering', so to speak. But it worked.

I'm kind of excited by this one.

To test my code, I ran a simulation on a known set of image coordinates and world locations, taken from a simple cube. This let me test how close my calculated camera is to the actual position. The image comparison below shows how close it is (drag the slider to see). The left side shows the known camera position (light green), and the right side shows the estimated/calculated camera position (orange).
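If you're curious what a test like that looks like in code, here's a minimal sketch in Python with numpy (my engine uses neither, this is purely illustrative) of one textbook approach, the Direct Linear Transform. It projects the corners of a unit cube through a known camera, recovers a projection matrix from the 2D/3D pairs, and compares the camera centres. It's not necessarily the method I settled on, just the same family of idea.

```python
import numpy as np

def project(P, X):
    """Project homogeneous world points X (Nx4) to pixel coordinates (Nx2)."""
    x = (P @ X.T).T
    return x[:, :2] / x[:, 2:3]

def solve_pose_dlt(world, image):
    """Estimate a 3x4 projection matrix from >= 6 world/image pairs (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(world, image):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 4)

def camera_centre(P):
    """The camera centre is the right null space of P."""
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]
    return C[:3] / C[3]

# Known camera: simple made-up intrinsics and a pose looking at the cube.
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)
R = np.eye(3)
t = np.array([[-0.5], [-0.5], [3.0]])   # puts the cube in front of the lens
P_true = K @ np.hstack([R, t])

# The eight corners of a unit cube, in homogeneous coordinates.
cube = np.array([[x, y, z, 1] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
pixels = project(P_true, cube)

P_est = solve_pose_dlt(cube[:, :3], pixels)
print("known centre:    ", camera_centre(P_true))
print("estimated centre:", camera_centre(P_est))
```

With perfect (noise-free) points like these, the two centres should agree to numerical precision; the interesting part is how the estimate holds up once real, noisy tracking data goes in.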

I'm really rapt that I've been able to figure this out. This is what I'm hoping will connect the film camera images to the 3D animation. I still need to deploy some kind of 2D tracking, but I already have an engine for that, which has been tested outside of this program. I should only need to bring it in and then we should almost be done.

I say almost, because I still need to make an interface so I can use it (I'm working on that as I go). I also need to figure out how to do some sort of camera image calibration. Calibration is needed to work out things like lens distortion, so that adjustments can be made to the calculations. Lens distortion is what makes straight lines in your video look curved, and that throws off your tracking data. I have some ideas on how to handle this, but I won't start on it until I've added the 2D tracking.
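To give an idea of what lens distortion does mathematically, here's a rough sketch of the common radial distortion model (Brown-Conrady). The k1 and k2 coefficients below are made-up numbers for illustration; in practice they're exactly what the calibration step would estimate.

```python
import numpy as np

def distort(points, k1=-0.25, k2=0.05):
    """Apply radial distortion to normalised image coordinates (Nx2)."""
    r2 = np.sum(points**2, axis=1, keepdims=True)   # squared distance from centre
    factor = 1 + k1 * r2 + k2 * r2**2
    return points * factor

# A straight horizontal line of points near the image edge bows once the
# distortion is applied, which is exactly what wrecks tracking data.
line = np.stack([np.linspace(-1, 1, 5), np.full(5, 0.8)], axis=1)
print(distort(line))
```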

By the way, the 2D tracking is what leads into the 3D camera solve. The real-world camera moves, so I need to track markers across multiple frames. Hence the need for 2D tracking as well.
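For the curious, the principle behind 2D marker tracking fits in a few lines. This is a bare-bones Python/numpy sketch using template matching with sum-of-squared-differences, not my actual tracking engine: take a patch around the marker in one frame, then search for the best-matching patch in the next.

```python
import numpy as np

def track(prev_frame, next_frame, pos, patch=7, search=10):
    """Return the marker position in next_frame, given its position in prev_frame.
    Border handling is omitted to keep the sketch short."""
    y, x = pos
    h = patch // 2
    template = prev_frame[y-h:y+h+1, x-h:x+h+1]
    best, best_pos = np.inf, pos
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cy, cx = y + dy, x + dx
            candidate = next_frame[cy-h:cy+h+1, cx-h:cx+h+1]
            score = np.sum((candidate - template) ** 2)  # SSD: lower is better
            if score < best:
                best, best_pos = score, (cy, cx)
    return best_pos

# Synthetic test: a bright blob that shifts by (3, 5) pixels between frames.
f1 = np.zeros((100, 100)); f1[50, 50] = 255
f2 = np.zeros((100, 100)); f2[53, 55] = 255
print(track(f1, f2, (50, 50)))   # -> (53, 55)
```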

Below is a rendered image comparison showing the image coordinates of the known point positions and the re-projected image point positions. The idea is to re-project the known world points using the calculated perspective projection and see where they land on the final image. It's a good way to compare where the original known points sit versus where your calculated positions end up.
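Here's a small sketch of that re-projection check, again in Python/numpy for illustration only. The camera matrix and points below are toy values, not the ones from my cube test; with a good solve the mean error should come out as a small fraction of a pixel.

```python
import numpy as np

def reproject(P, world):
    """Project Nx3 world points through a 3x4 matrix P; returns Nx2 pixels."""
    X = np.hstack([world, np.ones((len(world), 1))])
    x = (P @ X.T).T
    return x[:, :2] / x[:, 2:3]

def mean_pixel_error(P, world, observed):
    """Average distance in pixels between observed and re-projected points."""
    return np.mean(np.linalg.norm(reproject(P, world) - observed, axis=1))

# Toy example: a known camera, a few points, and a slightly perturbed estimate.
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)
P_true = K @ np.hstack([np.eye(3), [[0], [0], [4]]])
world = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1]], float)
observed = reproject(P_true, world)
P_est = P_true + 0.001 * np.random.default_rng(0).standard_normal((3, 4))
print(mean_pixel_error(P_est, world, observed))
```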

Use the image slider below to compare the known points with the estimated points. The markers may be a little hard to see; they're on the corners of the cube. The green markers in the left-side image are the original (known) point coordinates, and the red markers in the right-side image are my re-projected markers, rendered from my calculated (estimated) camera position.

I'm pretty happy with the result!

...Actually, I'm quite rapt with this. I can use it. Things are coming along.