A tale of optimization
I'm really shocked by how fast C++ can be. iOS and Android do not support GPU skinning, so I had to implement vertex-weighted skinning on the CPU. It took about a day to get running, and then I started optimizing code.
My test case was an 8400-polygon model. Each vertex could be attached to as many as four bones, but most used only two or three. To make it more interesting, I put the vertex weighting code inside a loop so it would be performed ten times instead of once.
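For readers who have not written a skinning loop before, here is a minimal sketch of what vertex-weighted skinning on the CPU looks like. It is not the engine's actual code: the types Vec3, Mat3x4 and SkinnedVertex and the functions transformPoint() and skinVertices() are hypothetical names made up for illustration, and the sketch assumes each bone is stored as a row-major 3x4 affine matrix that already maps bind-pose positions to their animated positions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 rotation/scale plus a translation column.
struct Mat3x4 { float m[3][4]; };

struct SkinnedVertex {
    Vec3 bindPosition;            // position in the bind pose
    std::array<int, 4> bone;      // up to four bone indices
    std::array<float, 4> weight;  // unused slots carry a weight of 0
};

inline Vec3 transformPoint(const Mat3x4& t, const Vec3& p) {
    return Vec3{
        t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2] * p.z + t.m[0][3],
        t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2] * p.z + t.m[1][3],
        t.m[2][0] * p.x + t.m[2][1] * p.y + t.m[2][2] * p.z + t.m[2][3]};
}

// Blends each vertex between the positions its bones would give it.
void skinVertices(const std::vector<SkinnedVertex>& in,
                  const std::vector<Mat3x4>& bones,
                  std::vector<Vec3>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const SkinnedVertex& v = in[i];
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};  // x, y, z, sum of weights
        for (int k = 0; k < 4; ++k) {
            const float w = v.weight[k];
            if (w <= 0.0f) continue;  // skip unused bone slots
            assert(v.bone[k] >= 0 &&
                   static_cast<std::size_t>(v.bone[k]) < bones.size());
            const Vec3 p = transformPoint(bones[v.bone[k]], v.bindPosition);
            acc[0] += w * p.x;
            acc[1] += w * p.y;
            acc[2] += w * p.z;
            acc[3] += w;
        }
        if (acc[3] > 0.0f) {
            const float inv = 1.0f / acc[3];  // normalize by the total weight
            out[i] = Vec3{acc[0] * inv, acc[1] * inv, acc[2] * inv};
        } else {
            out[i] = v.bindPosition;  // no bones attached: leave the vertex alone
        }
    }
}
```

The inner loop runs once per bone slot per vertex, so any per-iteration cost such as a temporary object or a function call that fails to inline is multiplied tens of thousands of times per frame.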
When I started, the process took 23 milliseconds. I replaced the OO math code with procedural code (including some inline functions), reduced the number of dynamically allocated objects, and made the looping code as small as possible.
One interesting thing I did was merging four float variables in a loop into a single float array. Instead of resetting each variable to 0 in each iteration of the loop, I did a single memcpy() from an array of zeroes I created just for this purpose.
Before:
position[0]=0; position[1]=0; position[2]=0; sumweights=0;
After:
memcpy(position,nullarray,16);
This actually resulted in a very big speed increase!
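As a self-contained sketch of the same idea, the accumulator below packs the three position components and the weight sum into one four-float array and clears it with a single copy. The names resetAccumulator() and addWeighted() are hypothetical helpers, not functions from the engine, and using sizeof in place of the literal 16 is my own substitution. Whether this beats four separate assignments will depend on the compiler and platform, so it is worth measuring rather than assuming.

```cpp
#include <cassert>
#include <cstring>

// Array of zeroes kept around purely as a copy source.
static const float nullarray[4] = {0.0f, 0.0f, 0.0f, 0.0f};

// position[0..2] hold x, y, z; position[3] holds the running sum of weights.
inline void resetAccumulator(float (&position)[4]) {
    static_assert(sizeof(position) == sizeof(nullarray),
                  "accumulator and zero array must be the same size");
    std::memcpy(position, nullarray, sizeof(nullarray));  // one copy clears all four
}

// Adds one bone's contribution to the accumulator (hypothetical helper).
inline void addWeighted(float (&position)[4], const float (&p)[3], float w) {
    assert(w >= 0.0f);
    position[0] += w * p[0];
    position[1] += w * p[1];
    position[2] += w * p[2];
    position[3] += w;
}
```

A per-vertex loop would call resetAccumulator() once at the top of each iteration and addWeighted() once per attached bone.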
The Result
By the time I was done, my stress test was executing in 4-5 milliseconds. The program now renders an 8400-polygon skinned model at 1000 FPS, which I thought was impossible with CPU skinning.
We're still going to implement GPU skinning in our high-end PC renderer, but the CPU skinning will be good for mobile devices and older hardware.