Computer-generated 3d worlds are everywhere in games and movies, but how do they work? A primer on 3d rendering.
Let’s start with some basics about real-life 3d. Look at the picture below.
This is a straight bike lane running below a bridge. Note, though, that the road lines do not run parallel to each other. The road is wide at the bottom, and narrow at the top. That’s the effect of perspective. Things close by are bigger, and things far away are smaller, which results in the converging lines.
Except for the bridge, which runs perpendicular to the viewing direction, meaning it’s all at the same distance, hence: its lines stay parallel.
This is called perspective projection, and artists use it all the time.
So, how does this work in a computer?
In a computer, each 3d object is represented by a set of points called ‘vertices’. These vertices are combined into planes called ‘faces’, and those are colored and lit. But let’s not get ahead of ourselves.
For each 3d image a computer renders, the software takes the vertices of all 3d objects, and projects them, using perspective projection. Most software uses a static 3d world, and puts a camera in there that points a certain way. It then transforms the vertices so that the user gets a 2d picture as if they were looking through the camera.
Look at the picture below.
On the left you see a 3d world with a camera, and on the right the projected 2d image. (Of course, the image on the left is also a 3d projection, but you get the idea).
The computer mathematically maps each 3d vertex to a 2d coordinate on the screen, where points further away come closer together. The software uses matrix calculations to do all this. I’ll not go into the math here, but this is done for all the visible vertices, of all the objects, in each 3d scene. For a video game that runs at 60 frames per second, with tens of thousands of points, this means tens of thousands of calculations, 60 times per second. That’s quite a lot.
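The heart of that per-vertex calculation can be sketched in a few lines. This is a deliberately minimal version, assuming a camera at the origin looking down the z axis; real software uses 4×4 matrices that also handle the camera’s position and rotation.

```python
# A minimal sketch of perspective projection: a 3d point (x, y, z) in
# camera space maps to 2d by dividing by its distance along the view axis.

def project(x, y, z, focal_length=1.0):
    """Project a camera-space point onto a 2d image plane.

    Assumes the camera looks down the +z axis; z must be > 0.
    """
    return (focal_length * x / z, focal_length * y / z)

# Two points with the same sideways offset, one twice as far away:
near = project(1.0, 0.0, 2.0)  # -> (0.5, 0.0)
far = project(1.0, 0.0, 4.0)   # -> (0.25, 0.0)
# The farther point lands closer to the center: that's perspective.
```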
Alright, but the above process only gives us a large number of points translated to the correct place. It doesn’t look very realistic. In 3d images we have walls, wooden doors, shiny guns, and whatever.
To achieve that, 3d objects are ‘textured’. Every 3d object consists of points, and as I wrote above, those points form faces. The software breaks each 3d object down into a large number of triangles. This applies to cubes, like the one shown above, but also to globes, and people, and their clothes.
Why triangles? Because triangles are convex, which makes them easy to draw. Convex means that any line you draw between points inside the triangle will stay fully inside the triangle. And that means you don’t have to worry about weird edge cases.
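That edge-case-free property shows up in a small sketch: deciding whether a pixel lies inside a triangle takes just three sign checks, with no special handling for dents or holes. The function names here are mine, not from any particular renderer.

```python
# Point-in-triangle test via 2d cross products: no weird edge cases,
# because a triangle is always convex.

def edge(ax, ay, bx, by, px, py):
    # Signed area term; its sign says which side of line a->b point p is on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def inside_triangle(p, a, b, c):
    d1 = edge(*a, *b, *p)
    d2 = edge(*b, *c, *p)
    d3 = edge(*c, *a, *p)
    # Inside if p is on the same side of all three edges.
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or \
           (d1 <= 0 and d2 <= 0 and d3 <= 0)

tri = ((0, 0), (4, 0), (0, 4))
inside_triangle((1, 1), *tri)  # -> True
inside_triangle((3, 3), *tri)  # -> False
```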
So a computer draws triangles to create 3d objects, between the projected points. And each of these triangles is filled with a picture: a texture. See the image below.
The image is of a triangle with a brick texture. Of course, this looks weird. That’s because the texture has not been perspective corrected. The triangles usually have some points farther away from the camera, and some points closer to the camera, and the computer has to apply that same perspective distortion to the texture as well.
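A rough, simplified sketch of what that correction means, in one dimension along an edge: a texture coordinate u can’t simply be interpolated linearly across the screen, but u/z and 1/z can be, and dividing the two at each pixel gives the corrected coordinate.

```python
# Perspective-correct interpolation of a texture coordinate u between
# two projected vertices with depths z0 and z1 (a simplified 1d version).

def interpolate_u(t, u0, z0, u1, z1):
    """Texture coordinate at screen-space fraction t between two vertices."""
    inv_z = (1 - t) * (1 / z0) + t * (1 / z1)        # 1/z is linear on screen
    u_over_z = (1 - t) * (u0 / z0) + t * (u1 / z1)   # so is u/z
    return u_over_z / inv_z

# Halfway across the screen between a near (z=1) and far (z=3) vertex,
# the correct coordinate is about 0.25, not the naive midpoint 0.5:
# the far half of the texture gets squeezed into less screen space.
mid_u = interpolate_u(0.5, 0.0, 1.0, 1.0, 3.0)
```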
I’ll not go into details, but texturing realistically is quite a complicated process. It results in something like this though:
Now that’s starting to look a little more 3d. I’m skipping over quite a few technical details, but the gist of it is that the computer creates a 2d image made of perspective-corrected triangles to create a scene. However, there’s still one more step.
If you create a scene as described above, you end up with a 3d scene rendered in 2d. However, it will look horrible. That’s because everything has the same brightness.
In real life, everything is lit by the sun, and lamps, and reflections of the sun and lamps. Things cast shadows. Light passing through partially transparent liquids causes colored shadows. And so on, and so forth.
The final step in the 3d rendering process is figuring out how to light it. And that is, in fact, very hard. Because light bounces off things in complicated ways. A shadow is not sharp, but fuzzy, because light bounces off at angles. When you put a light brown table next to a white wall, a tiny amount of brown light bounces from the table to the wall. Lamps reflect in people’s eyes, and you can see reflections of people in glass windows, also depending on how bright the outside is.
Let’s leave aside for a moment how to determine all that lighting, but assume we can figure that out. The computer can then draw brightness onto each of the triangles, which results in something like this.
That looks better already. But we skipped over a crucial thing. How to figure out that lighting to render?
Creating proper lighting is complicated. The best technique we have is ray tracing. With ray tracing, you send virtual rays from the camera outwards, and see which triangles they run into. The computer then calculates how all the light sources in the surroundings affect that spot, which determines its color and brightness. If the triangle that was hit has reflective properties, the ray has to continue onwards to check the reflection.
That sounds doable, right?
Oops, but a surface is usually not fully reflective, and almost never mirror smooth. That means scattering occurs. In practice, for each point in a 2d image, the computer needs to check a number of rays at slightly different angles and average the results. Those rays can also bounce and lead to new rays. Rinse and repeat. It quickly becomes rays galore. And that is no longer feasible to do 60 times per second.
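The rays-galore problem in miniature: average many slightly jittered rays per pixel. The `trace` function here is a hypothetical stand-in for a full ray tracer; the point is how the work multiplies.

```python
import random

# Monte Carlo sampling: many jittered rays per pixel, averaged.

def trace(angle):
    # Hypothetical placeholder; a real tracer would follow the ray,
    # find a hit, light it, and possibly spawn more rays.
    return max(0.0, 1.0 - abs(angle))

def pixel_color(base_angle, samples=64, spread=0.1):
    rng = random.Random(0)  # fixed seed, for reproducibility
    total = 0.0
    for _ in range(samples):
        jitter = rng.uniform(-spread, spread)
        total += trace(base_angle + jitter)
    return total / samples

# 64 samples per pixel, times ~2 million pixels, times extra bounce
# rays, times 60 frames per second: that's the infeasible part.
c = pixel_color(0.0)
```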
Heck, a 3d scene rendered with high-quality ray tracing can take hours to complete. Imagine creating a 2-hour movie at 24 frames per second that way. But, yeah, that is exactly how movies like The Super Mario Bros Movie are made.
But, because games need to render at some 60 frames per second or more, they use a different approach.
Light maps and light probes
Video games, and other 3d rendering that can’t take hours, use tricks to simulate proper lighting. One of the most common ones is using light maps and light probes.
A light map is a texture, like the brick wall above, but a texture with brightness information. The computer generates this light map beforehand and adds it as a texture to the finished product. That can take a long time to create, but in the actual game, it’s just another texture.
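At render time, applying a light map can be as cheap as a per-texel multiply: the expensive lighting was computed offline, and in-game only the baked brightness is combined with the base texture. A sketch with made-up values:

```python
# Applying a precomputed light map: multiply the base texture color
# by the baked brightness stored for that spot.

def shade_texel(albedo, lightmap_value):
    """Scale a texture color (r, g, b in 0..1) by baked brightness."""
    return tuple(channel * lightmap_value for channel in albedo)

brick = (0.8, 0.4, 0.3)              # base texture color
in_shadow = shade_texel(brick, 0.2)  # spot baked dark
in_sun = shade_texel(brick, 1.0)     # spot baked fully lit
```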
However, light maps only work on static scenes. And games feature scenes with non-static things. The player character moves around, and so do other people that populate a game. Light probes handle that part.
Light probes are invisible objects placed in the 3d environment. They’re points in the 3d world that are not rendered or visible, but the lighting conditions at those points are also precalculated and stored. So, the actual video game can find the light probe closest to a dynamic triangle it needs to render, and apply that lighting. That leads to an approximation of the actual light, good enough to fool our eyes.
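A bare-bones sketch of that probe lookup, picking the single nearest probe. Real engines typically blend several probes rather than snapping to one, and the data layout here is invented for illustration.

```python
import math

# Find the nearest precomputed light probe to a moving object and
# borrow its baked lighting value.

def nearest_probe_light(position, probes):
    """probes: list of ((x, y, z), brightness) pairs, baked offline."""
    return min(
        probes,
        key=lambda probe: math.dist(position, probe[0]),
    )[1]

probes = [((0, 0, 0), 0.9),   # probe in a brightly lit corridor
          ((10, 0, 0), 0.1)]  # probe in a dark room
player_pos = (2, 0, 0)
light = nearest_probe_light(player_pos, probes)  # -> 0.9
```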
The computer applies more trickery, but this is the basis. You can imagine this approach leads to complications when you have lights that turn off and on, or for things like headlights on a moving 3d vehicle. But it works, and works better with each new generation of video card.
There you have it. The basics of how a computer renders a 3d environment. I’ve glossed over a huge number of tricks and special cases, and left out all the math, but I hope this gives you some idea of what happens under the hood in 3d software like Blender, in video games, and when creating 3d movies.