In the last post, I talked about improving the Hyperion render engine speed by using some acceleration structures. Since yesterday, I've managed to speed up the engine even further by optimising the object trees and making components more efficient to compute.
For reference, below is our benchmark rendered Stanford Dragon image again, with 47,768 polygons. This render is done using the 'brute force' method. Remember, we're trying to improve on the render times, not image quality (at least not yet).
Here's the same render, but with the new object acceleration trees:
This makes about a 13.9x improvement over the brute force method. I then set about trying to make the structure more efficient, and managed to get the image below in 53.5 seconds.
That's another 10% improvement. Now we're at a 15.4x speedup.
From here, I changed the way I tested objects for ray hits and cut out further unnecessary intersection tests. I continued to cut back where I could, and got render times down to 27.9 seconds:
And finally, I noticed that the root node was still housing a lot of polygons. I believe this happens when some of the polygon's points sit outside the node boundary. I decided to try split the root node up further, by deploying node/branch splits with a slightly different type of bounding box algorithm. This meant that points outside the initial node boundaries, may in fact fall inside the slightly different algorithm boundaries. For the Dragon model, this did seem to move more polygons inside the tree. Which means there's not as many (or no) polygon testing on the root node. This provided an even better speed drop, here's a pass to show it:
The comparison image above with the normals is done to show that the object is still being rendered. The render just looks dark because I haven't fixed colour information yet.
Finally, I turned to the bounding box intersection test function. I was able to squeeze a little more out by building a better slab test function. This provided another small drop in render time, down to 20.16 seconds. As it stands, this equates to a 40.9x speedup. From just under 14 minutes for the brute force test, down to 20.1 seconds with object acceleration structures. This is a really good result.
Here's what I did
I started playing around with dividing the object's bounding box. Let's call the structure a tree, with the top level box the root node. All branches are nodes, and leafs are the end node of a branch. Think of something like this:
I continued to divide the nodes (branches) further, until I reached a certain tree depth. let's say 3. All the tree nodes were spatially divided into their own little region. Kind of like a grid. I then went through each polygon and fitted it to a node.
With some additional functions, I also automated the building of the tree structure, depending on the number of polygons in a given object, and how many are passed down through a branch. As these were passed down through the branches, if the node contained (for example) more than 20 polygons, I ran that node through the divider again.
Finally, I iterated through the tree and globalised the data, and removed any empty nodes to get rid of branches with no value.
Then, I spent some time adjusting my ray intersection test functions. Mostly because it needed to depth test against the tree root, branch and leaf nodes. It's really paid dividends, the rendered times have been significantly reduced. Particularly for polygon heavy models.
So far all this has been a huge success, but there was more to squeeze out. I noticed that the root node was still housing quite a number of polygons, which in the case of the Dragon model, was still making the initial tree intersection test slow. The way I built the acceleration structure meant some polygons on borders of node bounding boxes were not considered. It left quite a number of polygons in the root node's data. So I made another tree branching algorithm, that runs after the first initial tree build. It differs from the first type in that the node boundaries are set differently. This means I can catch polygons that didn't process through the first set of tree branches, and re-test them with a series of branches that have different boundaries. This pushed the top level polygons out of the root node and further into the tree, and really helped bring the render times down even further. I suspect with a big more code juggling and bounding box design, I could make this more efficient as well.
As a final resort, I re-did the bounding box intersection test, and replaced it with a more up to date slab test. This did squeeze a little more out of the dragon render. On more polygon heavy scenes, it might even be more beneficial.
However, there are some limitations I found. If your object hasn't got many polygons, it's not really worth building a tree structure. There comes a point where there's nothing to gain over just using the brute force method on an object. Which is a bummer if the object is simple, like a thin cylinder that stretches over a big area. But if it's better to brute force, then brute force it has to be.
The acceleration structure has been a really good success. And while I may not be as skilled a programmer as a seasoned computer scientist, who could possibly squeeze out better render times with some code trickery, I'm really happy with the results. It's in a place speed-wise where I could use it for something now. A big box ticked here.
I'll try work on the image colours and the object materials next. I've had some troubles with the textures and colours, can't quite put my finger on why. There's an issue with HDRI maps not rendering either (they were previously). So not sure what's happened there. Once I've resolved some of these, I'll render out a better looking image and make a post for it.