Mon 28 Feb 2011
Another year, another new project. Flex Mobile is well underway, and I’ve transitioned over to a group within Adobe working on gaming technologies. Of course, Adobe products, and Flash in particular, are heavily used in game development, and we’ve started to increase our focus on gaming over the past year. One of the first key technologies we’re delivering is a GPU-accelerated 3D API in the Flash Player codenamed “Molehill”, which enables incredibly beautiful and incredibly performant 3D content to be built using Flash. And just last weekend, we put up our first public pre-release of Molehill as part of the Flash Player Incubator program on Adobe Labs.
Now, if you’re a hardcore 3D programmer, you’ll know exactly what to do with Molehill, and Thibault Imbert has a great introduction to the API for you. But if you’re anything like me, and your development experience has been in the world of 2D graphics or UI, you might find even this introductory material pretty head-scratching. Vertex and fragment shaders? Index buffers? Assembly language? LOLWUT?
I’ve just recently been learning more about GPU-based 3D programming myself, so I thought I’d try to make a molehill out of the 3D development mountain, and write an introduction to what this stuff is all about for those of us who are coming from the 2D world. In this first post, I’ll generally describe how modern GPUs work. I’m planning to write a follow-up post with more detail on how you actually work with the GPU for 3D graphics, and then another follow-up on how you can leverage the GPU for incredibly fast 2D graphics as well.
One caveat—I may say a few things that aren’t strictly true, mostly because I might be deliberately oversimplifying, but also because I might just be ignorant. Overall, I don’t think this picture of the world is too misleading, but please feel free to correct me in the comments.
Update: One fundamental point I meant to make when I originally wrote this post, but forgot to add, is that Molehill rendering is completely separate from display list rendering. All of the drawing that Molehill does basically ends up in a single layer that essentially draws into the background behind all of your display list content—the two don’t interact at all. I’ll discuss how you can leverage Molehill for 2D in a future post.
GPUs are scary and complicated!
That was certainly what I assumed before I started reading up on them. It’s true that developing high-quality performant 3D renderers using a GPU requires a lot of cleverness and math. But it turns out that you can describe what modern (programmable) GPUs do very simply:
- GPUs draw triangles really, really fast.
- GPUs do basic matrix and vector arithmetic really, really fast.
- It’s up to you to tell the GPU exactly what to do with those capabilities.
That’s pretty much it. At the end of the day, modern GPUs are basically just really fast at drawing triangles and doing certain kinds of math. So fast that, unlike the Flash-style rendering you’re used to, you don’t program them like a display list, with minimal updates when things change. Since even a tiny change in your camera location or viewing angle means that all of the objects in your world end up in slightly different places on the screen, with slightly different perspective relative to your view, you generally just tell the GPU to redraw the whole screen every frame. (Of course, there are lots of optimizations you should do to make this work faster…but fundamentally you’re redrawing a lot more stuff than you would normally think of doing in the CPU-based 2D world.)
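To make the “redraw everything” point concrete, here’s a tiny sketch (in Python rather than ActionScript, purely as an illustration) of the kind of matrix-times-vector arithmetic GPUs are built to do fast. Turning the camera even a hundredth of a radian moves every vertex in the scene, which is why you re-render the whole frame:

```python
import math

def rotate_y(angle, v):
    """Multiply a 3D point by a rotation-about-Y matrix, row by row."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x + s * z, y, -s * x + c * z)

# Three vertices of a triangle in world space.
triangle = [(1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (-1.0, 0.0, 5.0)]

# Rotating the viewpoint even slightly changes *every* vertex position,
# so the GPU simply redraws the whole scene each frame.
turned = [rotate_y(0.01, v) for v in triangle]
```

The GPU performs this sort of multiply for millions of vertices per frame; your job, as we’ll see, is telling it which matrices to use.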
So what’s the big deal? I can draw a triangle really fast too.
The real power of the GPU is in how you tell it to draw those triangles and what math you tell it to do. Conceptually, the model (for programmable GPUs, which is what Molehill supports) is actually pretty simple, which is why the Molehill API only has a couple of dozen methods. It’s using the model effectively that gets complicated, as we’ll see later.
Here’s the basic way you work with a programmable GPU for a standard 3D rendering scenario. In each frame:
- Triangles. Send the GPU a bunch of triangles, expressed in terms of their vertices, as well as other associated data like texture coordinates (which map vertices to locations in a texture bitmap).
- Textures. Send the GPU a bunch of bitmaps to be used for textures that will be mapped onto triangles in step 4 below.
- Vertex shader. Upload a “vertex shader” program, which does some math on each vertex from step 1 in order to produce a final vertex position, as well as other optional outputs that are up to you.
- Fragment shader. Upload a “fragment (or pixel) shader” program. The GPU does built-in math to interpolate between the vertices on each triangle so it can call your pixel shader on each pixel in each triangle. Your shader does some math on each pixel, accessing textures from step 2 as necessary, in order to figure out what color that pixel should be.
- Z-buffer. The GPU has a “Z-buffer” that stores the depth of each pixel it draws on the screen. If a pixel from a new triangle would be behind the nearest pixel already drawn at the same screen location, the GPU skips it rather than redrawing that pixel.
- Lather, rinse, repeat. Repeat the above steps one or more times per frame.
- Present. Once you’re done with all your drawing for the frame, you swap the buffer the GPU has been drawing to out to the screen, and it’s presented to the user. Now you’re ready to clear the buffer and start all over again for the next frame.
And that’s it. Simple, right?
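The Z-buffer step in particular is easy to model in a few lines. Here’s a sketch in Python (hypothetical helper names, not any real API; real GPUs do this in massively parallel hardware, and the depth convention varies) of the per-pixel depth test:

```python
WIDTH, HEIGHT = 4, 4

def new_frame():
    """Clear the color buffer and reset every depth to 'infinitely far'."""
    color = [[None] * WIDTH for _ in range(HEIGHT)]
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    return color, depth

def write_fragment(color, depth, x, y, z, rgba):
    """The Z-buffer test: only the nearest fragment at (x, y) survives."""
    if z < depth[y][x]:          # smaller z means closer, in this sketch
        depth[y][x] = z
        color[y][x] = rgba

color, depth = new_frame()
write_fragment(color, depth, 1, 1, 0.8, "red")    # far triangle's pixel
write_fragment(color, depth, 1, 1, 0.3, "blue")   # nearer pixel replaces it
write_fragment(color, depth, 1, 1, 0.9, "green")  # behind: discarded
```

Because the test runs per pixel, triangles can be drawn in any order and still occlude each other correctly.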
Wait a minute. What about colors, curved surfaces, lights, reflections, cameras, perspective projection, and so on?
The GPU doesn’t know anything about any of that. This is the fundamental difference between older, “fixed-function” GPUs and modern programmable GPUs: the GPU makes very few assumptions about the actual meaning of all the data it’s processing. It knows about vertices, triangles, and pixels (or “fragments”, which are so-called because the edges of the triangle divide the pixels they cross into fragments). It also knows how to sample locations from texture bitmaps. Everything else—all the computation that actually decides which triangles to display, where the triangles should end up relative to the camera, or what colors the pixels should be given the current lights and textures—is either done by your application code on the CPU before you start uploading stuff to the GPU, or by the vertex and fragment shader programs that you upload to the GPU.
This means that you have complete power over how the GPU processes your geometry and textures. But with great power comes great responsibility—and a lot of coding, both in your application and in the shader programs you have to write to get anything to display.
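As one small taste of that shader-side math: the perspective projection that fixed-function GPUs used to do for you now lives in your vertex shader. Here’s the core idea, again as an illustrative Python sketch rather than actual shader code:

```python
def perspective_project(v, focal=1.0):
    """What a minimal vertex shader computes: map a camera-space point
    to the screen by dividing by its distance from the camera."""
    x, y, z = v
    # Farther points (bigger z) land closer to the center of the screen:
    # that division is what creates the illusion of perspective.
    return (focal * x / z, focal * y / z)

near = perspective_project((1.0, 1.0, 2.0))
far = perspective_project((1.0, 1.0, 10.0))
```

In real shaders this is done with a 4x4 projection matrix and a divide-by-w step, but the principle is just this division.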
Fortunately, you won’t have to write all that code yourself. Many members of the Flash community have written 3D frameworks and are now porting them to the Molehill API—including Alternativa, Away3D, Flare3D, Minko, Sophie3D, Yogurt3D, and more. In these frameworks, you don’t deal directly with the GPU, shaders, etc. at all. Instead, you typically interact with a “scene graph”, which is analogous to the Flash display list. It’s a persistent tree of objects that you can add and remove and set properties on. You just specify what geometry you want, how it should look, and where it should go in the world, and the framework takes care of sending it to the GPU and providing the appropriate shaders.
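To make the scene-graph idea concrete, here’s a toy sketch in Python of the kind of persistent tree these frameworks expose (entirely hypothetical names, not the API of any framework listed above):

```python
class Node:
    """A toy scene-graph node: a named object with properties and children."""
    def __init__(self, name, **props):
        self.name = name
        self.props = dict(props)
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

scene = Node("scene")
cube = scene.add(Node("cube", x=0, y=0, z=5, texture="crate.png"))

# Like the display list, you just set properties; the framework's job is
# to turn the tree into vertex buffers, textures, and shaders each frame.
cube.props["x"] = 2
```

The framework walks this tree every frame and performs the GPU steps described earlier on your behalf.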
If the world of GPU programming doesn’t look like it’s for you, stop reading right now and go check out all those great frameworks. If you feel like digging in at a lower level, though, check out my next blog post, which will get into more details that bridge the gap between the high-level GPU description I gave above and Thibault’s API introduction.