Animation Tweening
Leadwerks 5 uses a different engine architecture, with a game loop that runs at either 30 (default) or 60 updates per second. Frames are passed to the rendering thread, which runs at an independent framerate that can be set to 60, 90, or unlimited. This is great for performance, but it creates some timing challenges. To smooth out motion, the renderer interpolates between the last two frames it has received. Animation is a big challenge here: there could be many, many bones, and interpolating entire skeletons could slow down the renderer.
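Here is a minimal C++ sketch of the interpolation step. It assumes the render thread receives timestamped snapshots and draws the world one game update behind. `FrameSnapshot`, `ComputeTween` and `InterpolatedPosition` are hypothetical names chosen for illustration, not the Leadwerks API. A single float stands in for an entity's full state.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical snapshot of one game-loop update as received by the render thread.
struct FrameSnapshot {
    double time;     // game time (seconds) at which the update was produced
    float position;  // one animated value, standing in for an entity's state
};

// Fraction of the way from the older snapshot to the newer one.
// Assumes the renderer draws one update "behind", so renderTime normally lies
// between prev.time and curr.time; values outside that range are clamped.
float ComputeTween(const FrameSnapshot& prev, const FrameSnapshot& curr, double renderTime) {
    const double span = curr.time - prev.time;
    assert(span > 0.0 && "snapshots must be in chronological order");
    const double t = (renderTime - prev.time) / span;
    return static_cast<float>(std::clamp(t, 0.0, 1.0));
}

// Linear interpolation from the older value (a) to the newer value (b).
float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Value the renderer would draw at renderTime.
float InterpolatedPosition(const FrameSnapshot& prev, const FrameSnapshot& curr, double renderTime) {
    return Lerp(prev.position, curr.position, ComputeTween(prev, curr, renderTime));
}
```

For example, with 30 updates per second (snapshots at 0 and 1/30 s) and a 90 Hz render thread, the three render frames in between get tween factors of 0, 1/3 and 2/3.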
In the screen capture below, I have slowed the game update loop down to 5 updates per second to exaggerate the problem that occurs when no interpolation is used:
My solution was to upload the 4x4 matrices of the previous two frames and perform the tweening inside the vertex shader:
```glsl
//Vertex Skinning
mat4 animmatrix[8];
for (int n = 0; n < 4; ++n)
{
    if (vertex_boneweights[n] > 0.0f)
    {
        animmatrix[n] = GetAnimationMatrix(vertex_boneindices[n], 0);      // frame 0
        animmatrix[n + 4] = GetAnimationMatrix(vertex_boneindices[n], 1);  // frame 1
    }
}
vec4 vertexpos = vec4(vertex_position, 1.0f);
vec4 modelvertexposition = vec4(0.0f); // start from zero; an uninitialized vec4 is undefined
for (int n = 0; n < 4; ++n)
{
    if (vertex_boneweights[n] > 0.0f)
    {
        // Skin against both frames and blend by the render-thread tween factor
        modelvertexposition += animmatrix[n] * vertexpos * vertex_boneweights[n] * rendertweening
            + animmatrix[n + 4] * vertexpos * vertex_boneweights[n] * (1.0f - rendertweening);
    }
}
modelvertexposition = entitymatrix * modelvertexposition;
```
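The blend in the shader is linear in the matrices. Skinning against both frames and mixing the results therefore gives the same position as lerping each bone matrix component by component and skinning once. It is a plain lerp, not a slerp.

The C++ sketch below mirrors both orderings on the CPU. It assumes column-major matrices like GLSL's `mat4`. `Mul`, `Translation`, `SkinTweened` and `SkinWithLerpedMatrices` are hypothetical helpers, not Leadwerks API.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // column-major like GLSL mat4: m[col][row]

// GLSL-style matrix * vector product.
Vec4 Mul(const Mat4& m, const Vec4& v) {
    Vec4 r{0.0f, 0.0f, 0.0f, 0.0f};
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row) r[row] += m[c][row] * v[c];
    return r;
}

// Identity rotation plus a translation stored in column 3.
Mat4 Translation(float x, float y, float z) {
    Mat4 m{};  // all zeros
    m[0][0] = m[1][1] = m[2][2] = m[3][3] = 1.0f;
    m[3][0] = x; m[3][1] = y; m[3][2] = z;
    return m;
}

// CPU mirror of the shader: skin with each frame, then blend by tween / (1 - tween).
Vec4 SkinTweened(const std::array<Mat4, 4>& frame0, const std::array<Mat4, 4>& frame1,
                 const std::array<float, 4>& weights, const Vec4& pos, float tween) {
    assert(tween >= 0.0f && tween <= 1.0f);
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int n = 0; n < 4; ++n) {
        if (weights[n] <= 0.0f) continue;
        const Vec4 a = Mul(frame0[n], pos);
        const Vec4 b = Mul(frame1[n], pos);
        for (int i = 0; i < 4; ++i) out[i] += weights[n] * (a[i] * tween + b[i] * (1.0f - tween));
    }
    return out;
}

// Same result the other way round: lerp each matrix first, then skin once.
Vec4 SkinWithLerpedMatrices(const std::array<Mat4, 4>& frame0, const std::array<Mat4, 4>& frame1,
                            const std::array<float, 4>& weights, const Vec4& pos, float tween) {
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int n = 0; n < 4; ++n) {
        if (weights[n] <= 0.0f) continue;
        Mat4 m{};
        for (int c = 0; c < 4; ++c)
            for (int r = 0; r < 4; ++r) m[c][r] = frame0[n][c][r] * tween + frame1[n][c][r] * (1.0f - tween);
        const Vec4 p = Mul(m, pos);
        for (int i = 0; i < 4; ++i) out[i] += weights[n] * p[i];
    }
    return out;
}
```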
Bone matrices are retrieved from an RGBA floating point texture with this function:
```glsl
mat4 GetAnimationMatrix(const in int index, const in int frame)
{
    // x: four texels per bone; y: two rows (frames) per instance
    ivec2 coord = ivec2(index * 4, gl_InstanceID * 2 + frame);
    mat4 bonematrix;
    bonematrix[0] = texelFetch(texture14, coord, 0);
    bonematrix[1] = texelFetch(texture14, coord + ivec2(1, 0), 0);
    //bonematrix[2] = texelFetch(texture14, coord + ivec2(2, 0), 0);
    // Rebuild column 2 from columns 0 and 1: removes one texture lookup!
    // Only exact when the rotation part is orthonormal (no scale, no mirroring).
    bonematrix[2].xyz = cross(bonematrix[0].xyz, bonematrix[1].xyz);
    bonematrix[2].w = 0.0f;
    bonematrix[3] = texelFetch(texture14, coord + ivec2(3, 0), 0);
    return bonematrix;
}
```
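This C++ sketch shows the texel layout the function implies: four RGBA texels per bone along x, and two rows per instance along y, one per frame. It also includes a CPU mirror of `GetAnimationMatrix()` that rebuilds the third column with a cross product.

`BoneTexture` and `FetchBoneMatrix` are hypothetical. Writing all four columns on the CPU side is an assumption; the shader never reads column 2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // column-major like GLSL mat4: m[col][row]

// Hypothetical CPU-side image backing the RGBA float bone texture.
// Layout mirrors the shader: x = bone * 4 + column, y = instance * 2 + frame.
struct BoneTexture {
    int width, height;
    std::vector<Vec4> texels;  // row-major, one RGBA texel per element

    BoneTexture(int boneCount, int instanceCount)
        : width(boneCount * 4), height(instanceCount * 2),
          texels(static_cast<std::size_t>(width) * height, Vec4{0.0f, 0.0f, 0.0f, 0.0f}) {}

    Vec4& At(int x, int y) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return texels[static_cast<std::size_t>(y) * width + x];
    }

    // Writes one bone matrix for one instance and frame (all four columns).
    void Store(int bone, int instance, int frame, const Mat4& m) {
        for (int c = 0; c < 4; ++c) At(bone * 4 + c, instance * 2 + frame) = m[c];
    }
};

// CPU mirror of GetAnimationMatrix(): three fetches, column 2 rebuilt as
// cross(column 0, column 1). Exact only for unscaled, unmirrored rotations.
Mat4 FetchBoneMatrix(BoneTexture& tex, int bone, int instance, int frame) {
    const int x = bone * 4, y = instance * 2 + frame;
    Mat4 m{};
    m[0] = tex.At(x, y);
    m[1] = tex.At(x + 1, y);
    const Vec4& a = m[0];
    const Vec4& b = m[1];
    m[2] = {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
            0.0f};
    m[3] = tex.At(x + 3, y);
    return m;
}
```

For a pure rotation plus translation, the round trip reproduces the stored matrix. If a bone is scaled by 2 on every axis, the rebuilt column comes out with length 4 instead of 2, because the scale gets squared. That is the price of saving the fourth fetch.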
This slows down the shader, because up to 24 texel fetches might be performed per vertex (three per matrix, two frames, four bones). In exchange, it saves the CPU from having to precompute interpolated matrices for each bone. In VR, I think this saving is critical. Linearly interpolating between vertex positions is not exactly correct, but it's a lot faster than slerping a lot of quaternions and converting them to matrices, and the results are so close you can't tell the difference.
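How close is "so close you can't tell"? Consider a vertex at distance r from a joint that rotates by Δθ between updates. The lerped position travels along the chord instead of the arc, so halfway through it sits at r·cos(Δθ/2) from the joint instead of r.

The C++ sketch below estimates that worst-case radial error. It only models pure rotation about a fixed joint. The function names and the joint speed used in the example are assumptions for illustration, not measurements.

```cpp
#include <cassert>
#include <cmath>

// Worst-case radial error of lerping positions instead of slerping rotations:
// a vertex `radius` away from a joint that turns `stepDegrees` per update ends
// up radius * (1 - cos(step / 2)) too close to the joint at tween = 0.5.
double LerpRadialError(double radius, double stepDegrees) {
    assert(radius >= 0.0);
    const double pi = std::acos(-1.0);
    const double halfStep = 0.5 * stepDegrees * pi / 180.0;
    return radius * (1.0 - std::cos(halfStep));
}

// Rotation between consecutive game updates for a joint spinning at a constant rate.
double StepDegrees(double degreesPerSecond, double updatesPerSecond) {
    assert(updatesPerSecond > 0.0);
    return degreesPerSecond / updatesPerSecond;
}
```

For example, assume a joint turning at 180° per second. The default 30 updates per second gives a 6° step, and a vertex 0.5 units from the joint is pulled in by at most about 0.00069 units, roughly 0.14% of its radius. At 60 updates per second the error shrinks about fourfold.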
There's actually a similar concept in 2D animation that I remember reading about when I was a kid. The book is called The Illusion of Life: Disney Animation, and it's a really interesting read with lots of nice illustrations.
Here is the same scene with interpolation enabled. It's recorded at 15 FPS, so the screen capture still looks a little jittery, but you get the idea.
Adding interpolation dropped this scene from 200 FPS to 130 FPS on an Intel chip, simply because of the increased number of texel fetches in the vertex shader. Each character consists of about 4000 vertices. On a discrete card, I expect this would run at pretty much the maximum framerate (1000 or so).
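Framerates don't subtract linearly, so converting the two numbers to frame times gives a clearer picture of the added vertex-shader cost on the Intel chip:

```latex
\Delta t = \frac{1000\ \text{ms}}{130} - \frac{1000\ \text{ms}}{200}
         \approx 7.69\ \text{ms} - 5.00\ \text{ms} = 2.69\ \text{ms per frame}
```

In other words, interpolation adds roughly 2.7 ms per frame for the whole scene.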
With this in place, I can now confirm that my idea for the fast rendering architecture in Leadwerks Game Engine 5 definitely works.
The next step will be to calculate animations on a separate thread (or maybe two). My test scene here is using a single skeleton shared by all characters, but putting the animation on its own thread will allow many more characters to all be animated uniquely.
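Here is one way the animation work could be split across threads, as a minimal sketch. Each worker updates a disjoint range of skeletons, so no locking is needed, and the poses are safe to read once `join()` returns.

`Skeleton`, `UpdateSkeleton` and `UpdateAnimations` are hypothetical. The sinusoidal pose is a placeholder for real keyframe sampling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-character animation state (not Leadwerks API).
struct Skeleton {
    double animTime = 0.0;          // playback position in seconds
    double speed = 1.0;             // per-character playback rate
    std::vector<float> boneAngles;  // stand-in for the evaluated pose
};

// Placeholder for real keyframe sampling: each bone swings sinusoidally.
void UpdateSkeleton(Skeleton& s, double dt) {
    s.animTime += dt * s.speed;
    for (std::size_t b = 0; b < s.boneAngles.size(); ++b)
        s.boneAngles[b] = static_cast<float>(std::sin(s.animTime + 0.1 * b));
}

// Splits the skeleton list into contiguous ranges, one per worker thread.
// Each skeleton is written by exactly one thread, so no locking is needed.
void UpdateAnimations(std::vector<Skeleton>& skeletons, double dt, unsigned threadCount) {
    assert(threadCount > 0);
    const std::size_t count = skeletons.size();
    const std::size_t chunk = (count + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        if (begin == end) break;
        workers.emplace_back([&skeletons, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i) UpdateSkeleton(skeletons[i], dt);
        });
    }
    for (std::thread& w : workers) w.join();  // poses are complete after this
}
```

Spawning threads on every update keeps the sketch short. A real engine would more likely keep persistent worker threads fed by a job queue, and hand the finished bone matrices to the renderer along with the rest of the frame.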