Structure building


Since returning from my trip away, I've become even more motivated to find a way to make my own film projects. Here, I'm picking up my Dark Room project and continuing by extending the camera functionality, trying to reconstruct some of the tracked 2D points into a virtual 3D environment.

At this point, I've managed to track feature areas in the frame. I can also track and hone in on particular areas of the image, such as checkerboard corner markers. Markers like this come in handy because, with enough of them and with good track results across frames, we can start to use them to form some virtual geometry. Essentially, what I'm trying to do here is create some sort of known structure so that I can relate the tracked markers to 3D coordinates, and thus produce a camera pose estimation. The technical term for this kind of process is structure from motion (SfM). It's actually what underpins photogrammetry. So, if I can do SfM, albeit in a fairly naive way, I may have the building blocks to do more photogrammetry-like things further down the track.
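As a rough illustration of that end goal, here's what estimating a camera pose from known 3D-to-2D marker correspondences can look like in Python with OpenCV's solvePnP. To be clear, all the coordinates and intrinsics below are made-up placeholder values, not data from my project:

```python
import numpy as np
import cv2

# Hypothetical known structure: four checkerboard-style markers lying
# flat on a surface (coordinates in metres), plus their tracked 2D
# pixel positions in one frame. All values are placeholders.
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0],
                          [0.0, 0.1, 0.0]], dtype=np.float64)
image_points = np.array([[612.3, 404.1],
                         [885.9, 398.7],
                         [890.2, 671.5],
                         [608.8, 660.0]], dtype=np.float64)

# Placeholder intrinsics: focal length in pixels and the principal point.
K = np.array([[1500.0,    0.0, 960.0],
              [   0.0, 1500.0, 540.0],
              [   0.0,    0.0,   1.0]])

# Solve for the camera pose relative to the marker structure.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)       # rotation vector -> 3x3 rotation matrix
camera_position = -R.T @ tvec    # camera centre in world coordinates
```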

For now though, I'm only trying to retrieve simple structures. So, here we go!

To begin with, we need at least two reference frames with tracking markers. For this, I used a shot I made some time ago. But I realised that the markers in the shot were very close to being co-planar, meaning they all lie on the same plane. Unfortunately, the two-view algorithms degenerate in this configuration; your markers in the real world must not all be on the same plane. So I shot some footage again, this time with markers in non-planar positions. The two frames from the footage I'll be using are shown below.
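One way to sanity-check a marker layout, given rough measurements of it (or an initial reconstruction), is to look at the singular values of the mean-centred positions: if the smallest one is near zero, the points are close to a single plane. A small sketch of the idea:

```python
import numpy as np

def coplanarity_ratio(points_3d):
    """Ratio of the smallest to the largest singular value of the
    mean-centred points. Near zero means the markers are close to one
    plane, which is exactly the degenerate case to avoid."""
    centred = np.asarray(points_3d, dtype=float)
    centred = centred - centred.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    return s[-1] / s[0]

# Example: three markers on a table top and one raised above it.
markers = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0.5, 0.5, 0.4]]
print(coplanarity_ratio(markers))   # comfortably above zero -> non-planar
```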

Use the slider in the image comparison below to slide between the two reference frames.

In the image above, you can see the green tracking markers on the frames. We are going to use these two frames (frames 1 and 180) as our reference frames, because there should be enough camera movement between them to make a reasonable structure estimation.

In my first tests, I started to display the ongoing results in a 3D environment, so I could visually gauge how things were progressing. In the first stages, I was trying to process tracking markers with unknown camera data. This is called structure from uncalibrated cameras: we don't know the focal length or anything else about the cameras, only the 2D tracking markers. I fumbled with this in the early stages and had trouble getting any results.

Some early structure building with uncalibrated cameras
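For the curious, the textbook route for the uncalibrated case goes through the fundamental matrix. The sketch below is just an illustration of that route, not my actual code, and it uses synthetic stand-in data so it runs on its own:

```python
import numpy as np
import cv2

# Synthetic stand-ins for matched 2D tracks: random non-planar 3D points
# projected into two views. Real pts1/pts2 would come from the tracker.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, (16, 2)), rng.uniform(4, 6, 16)])
R2, _ = cv2.Rodrigues(np.array([0.0, 0.2, 0.0]))   # second view rotation
t2 = np.array([1.0, 0.0, 0.0])                     # second view translation
pts1 = X[:, :2] / X[:, 2:]                         # first camera at origin
X2 = X @ R2.T + t2
pts2 = X2[:, :2] / X2[:, 2:]

# Eight-point fundamental matrix (use cv2.FM_RANSAC on real, noisy tracks).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)
# F encodes the epipolar constraint x2^T F x1 = 0. Without intrinsics, the
# reconstruction it allows is only defined up to a projective transform,
# which is part of why the uncalibrated case is hard to get looking right.
```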

Eventually I gave up on the uncalibrated algorithm, and instead tried to make one for calibrated cameras; that is, cameras with known focal length, sensor size, etc. The reason for this change is that even though the camera I use has a manual cinema lens (and therefore no lens data is stored in the video file), I do know its focal length and I can get the camera's sensor size. Once I switched to the calibrated approach, I started to get more recognisable results.
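Building the camera intrinsics from that lens data is mostly a unit conversion: the focal length in millimetres scales to pixels via the sensor width. A minimal sketch with example numbers (substitute your own camera's values):

```python
import numpy as np

# Example values only; use your own lens and sensor specifications.
focal_length_mm = 35.0     # manual cinema lens focal length
sensor_width_mm = 23.5     # physical sensor width
image_width_px = 1920
image_height_px = 1080

# Focal length in pixel units, assuming square pixels.
fx = focal_length_mm * image_width_px / sensor_width_mm
fy = fx
cx, cy = image_width_px / 2.0, image_height_px / 2.0   # principal point

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
```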

You can see in the image above that the two cameras and the two sets of red and blue points are starting to come together. The points are the tracking markers for each of the two reference frames (red for frame 1, blue for frame 180). I pressed on, and with a bit more work I managed to triangulate the points from the calculated camera positions. This gave some promising 3D point positions.
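The usual recipe for the calibrated two-view case is: estimate the essential matrix from the matched markers, recover the relative camera pose from it, then triangulate. I'm not claiming this is exactly my implementation, but here's a self-contained sketch of that recipe, again with synthetic stand-in data so it runs on its own:

```python
import numpy as np
import cv2

# Synthetic stand-ins: random non-planar 3D points seen from two camera
# positions. In practice pts1/pts2 come from the 2D tracker and K from
# the lens data above.
rng = np.random.default_rng(1)
K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])
X = np.column_stack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, 20)])
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.3, 0.0]))
t_true = np.array([[1.0], [0.0], [0.0]])
hom1 = X @ K.T
pts1 = hom1[:, :2] / hom1[:, 2:]            # projections into camera 1
hom2 = (X @ R_true.T + t_true.T) @ K.T
pts2 = hom2[:, :2] / hom2[:, 2:]            # projections into camera 2

# Essential matrix and relative pose (translation is unit-scale; the
# overall size of the scene is unknowable from two views alone).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Projection matrices: camera 1 at the origin, camera 2 at (R, t).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

# Triangulate; the result is homogeneous, so divide through by w.
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T
```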

Use the slider below to slide between the two example views:

And here's what it might look like with the two cameras:

The red axis markers in the image above are the calculated 3D positions of the tracking markers, shown along with the two estimated camera positions and rotations.

And here's an image comparison of the points as depicted through the calculated camera positions. Once again, use the slider to go between the two virtual renders:

Comparing these results to the two original reference frames and the tracking markers in them, the results look pretty good.

I do know there's a little room for improvement here, though. This bit may not make much sense for some, but I'll mention it anyway. When triangulating a point between the two cameras, the two rays don't intersect exactly; I can print the closest distance between them, and in some cases that distance is larger than it should be. But I do have something up my sleeve here: the tracking markers I used have not been adjusted for lens distortion. I'm going to get a calibration chart I made printed at a large scale so I can use it to calibrate for lens distortion. So, I'm fairly sure there's more precision to be found here. Fundamentally though, my structure engine for calibrated cameras appears to work.
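For reference, the standard OpenCV checkerboard calibration goes roughly like the sketch below; the chart dimensions and the image folder are placeholders. Once the distortion coefficients are known, the raw tracked marker positions can be undistorted before triangulation:

```python
import glob
import numpy as np
import cv2

# Interior-corner layout of the printed chart (example size).
cols, rows = 9, 6
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration/*.png"):   # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solve for the intrinsics and the lens distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Undistort the tracked marker positions before triangulating, where
# pts is an (N, 1, 2) float32 array of raw pixel coordinates:
# undistorted = cv2.undistortPoints(pts, K, dist, P=K)
```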

I'll attempt to revive the process for uncalibrated cameras at some point. Even though it's better to have the camera information, there may be times when footage is shot on something I know nothing about. So, I'd like the flexibility to create structure without knowing the camera information as well, should a case present itself in the future.

From here, I need to add some functionality to help manage the 3D reconstruction a little better. I'd like to be able to move and rotate the points and cameras based on where I'd like to set the world origin, and which points lie on the ground surface, for example. Calculating structure by itself does not yield this kind of information, though you can recover some of it if you have other data, like camera GPS locations. But I don't have that (for now). So, I need to build some additional rotate and translate functions to manipulate the points in the 3D virtual space, as sketched below - you know, so the points on the table top look like they're on a flat surface!
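As a rough idea of what one of those functions might look like (the name and approach here are just an illustration, not a finished design), this helper fits a plane to a chosen set of ground points and rigidly transforms the whole reconstruction so that plane becomes z = 0 and a chosen point becomes the world origin:

```python
import numpy as np

def align_to_ground(points, ground_idx, origin_idx):
    """Rigidly transform a reconstruction so a chosen point becomes the
    world origin and a chosen set of 'ground' points lies flat on z = 0.
    points: (N, 3) array; ground_idx/origin_idx pick points by index.
    (Pick the origin from among the ground points so the fitted plane
    actually lands on z = 0.)"""
    points = np.asarray(points, dtype=float)

    # Fit a plane to the ground points: its normal is the singular
    # vector belonging to the smallest singular value.
    ground = points[ground_idx]
    centred = ground - ground.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    normal = vt[-1]
    if normal[2] < 0:                    # make the normal point "up"
        normal = -normal

    # Rotation taking the plane normal onto the world z axis
    # (Rodrigues-style formula for rotating one unit vector onto another).
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(normal, z)
    c = normal @ z
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    R = np.eye(3) + vx + vx @ vx / (1.0 + c)

    rotated = points @ R.T
    return rotated - rotated[origin_idx]   # chosen point -> origin

# Example: points 0-3 sit on the table top; make point 0 the origin.
# aligned = align_to_ground(pts3d, ground_idx=[0, 1, 2, 3], origin_idx=0)
```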