Building a Zero-Overhead Renderer
The Leadwerks 4 renderer was built for maximum flexibility. The Leadwerks 5 renderer is being built first and foremost for great graphics with maximum speed. This is the fundamental difference between the two designs. VR is the main driving force for this direction, but all games will benefit.
Multithreaded Design
Leadwerks 4 does make use of multithreading in some places, but it is fairly simplistic. In Leadwerks 5 the entire architecture is based around separate threads, which is challenging but a lot of fun for me to develop. I worked out a way to create a command buffer on the main thread that stores a list of commands for the rendering thread to perform during the next rendering frame. (Thanks for the tip on lambda functions, @Crazycarpet.) Each object in the main thread is associated with a simplified object that lives in the rendering thread. For example, each Camera has a RenderCamera object that corresponds to it. Here's how changes in the main thread get added to a command buffer to be executed when the rendering thread is ready:
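void Camera::SetClearColor(const float r, const float g, const float b, const float a)
{
    clearcolor.x = r;
    clearcolor.y = g;
    clearcolor.z = b;
    clearcolor.w = a;
#ifdef LEADWERKS_5
    //Queue the change for the rendering thread.
    //Members cannot be named directly in a capture list, so init-captures copy them by value.
    GameEngine::cullingthreadcommandbuffer.push_back(
        [rendercamera = this->rendercamera, clearcolor = this->clearcolor]() { rendercamera->clearcolor = clearcolor; }
    );
#endif
}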
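The capture-by-value idea can be tried in isolation. The following is a minimal sketch, not engine code: `RenderCameraSketch`, `CommandBuffer`, `QueueClearColor`, and `ExecuteAll` are hypothetical names, and a plain `std::vector<std::function<void()>>` stands in for whatever container the engine actually uses.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the render-thread copy of a camera.
struct RenderCameraSketch {
    float clearcolor[4] = {0.0f, 0.0f, 0.0f, 1.0f};
};

// A command buffer is simply a list of deferred calls.
using CommandBuffer = std::vector<std::function<void()>>;

// Record a clear-color change. Nothing is applied until the command runs.
void QueueClearColor(CommandBuffer& buffer,
                     std::shared_ptr<RenderCameraSketch> rendercamera,
                     float r, float g, float b, float a) {
    // Everything is captured by value, so the command owns its own copies
    // and does not read main-thread state when it executes later.
    buffer.push_back([rendercamera, r, g, b, a]() {
        rendercamera->clearcolor[0] = r;
        rendercamera->clearcolor[1] = g;
        rendercamera->clearcolor[2] = b;
        rendercamera->clearcolor[3] = a;
    });
}

// Run every recorded command in order, then empty the buffer.
void ExecuteAll(CommandBuffer& buffer) {
    for (auto& command : buffer) command();
    buffer.clear();
}
```

Because the lambda holds a `shared_ptr` copy, the render-side object stays alive until the command has run, even if the main-thread object is released in the meantime.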
The World::Render() command is still there for conceptual consistency, but what it really does is add all the accumulated commands onto a stack of command buffers for the rendering thread to evaluate whenever it's ready:
void World::Render(shared_ptr<Buffer> buffer)
{
    //Add render call onto command buffer
    GameEngine::cullingthreadcommandbuffer.push_back(std::bind(&RenderWorld::AddToRenderQueue, this->renderworld));

    //Copy command buffer onto culling command buffer stack
    GameEngine::CullingThreadCommandBufferMutex->Lock();
    GameEngine::cullingthreadcommandbufferstack.push_back(GameEngine::cullingthreadcommandbuffer);
    GameEngine::CullingThreadCommandBufferMutex->Unlock();

    //Clear the command buffer and start over
    GameEngine::cullingthreadcommandbuffer.clear();
}
The rendering thread is running in a loop inside a function that looks something like this:
shared_ptr<SharedObject> GameEngine::CullingThreadEntryPoint(shared_ptr<SharedObject> o)
{
    while (true)
    {
        //Get the number of command stacks that are queued
        CullingThreadCommandBufferMutex->Lock();
        int count = cullingthreadcommandbufferstack.size();
        CullingThreadCommandBufferMutex->Unlock();

        //For each command stack
        for (int i = 0; i < count; ++i)
        {
            //For each command
            for (int n = 0; n < cullingthreadcommandbufferstack[i].size(); ++n)
            {
                //Execute command
                cullingthreadcommandbufferstack[i][n]();
            }
        }

        //Remove executed command stacks, keeping any that arrived in the meantime.
        //erase() is used because the stored function objects cannot be moved safely with memcpy.
        CullingThreadCommandBufferMutex->Lock();
        cullingthreadcommandbufferstack.erase(cullingthreadcommandbufferstack.begin(), cullingthreadcommandbufferstack.begin() + count);
        CullingThreadCommandBufferMutex->Unlock();

        //Render queued worlds
        for (auto it = RenderWorld::renderqueue.begin(); it != RenderWorld::renderqueue.end(); ++it)
        {
            (it->first)->Render(nullptr);
        }
    }
    return nullptr;
}
I am trying to design the system for maximum flexibility with the thread speeds so that we can experiment with different frequencies for each stage. This is why the rendering thread goes through and executes all commands in all accumulated command buffers before going on to actually render any queued world. This prevents the rendering thread from rendering an extra frame when another one has already been received (which shouldn't really happen, but we will see).
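One way to drain everything that has accumulated while holding the lock only briefly is to swap the shared stack into a local one and execute from the local copy. The sketch below shows that alternative under assumed names (`CommandQueueSketch`, `Submit`, `Drain`) and uses `std::mutex` rather than the engine's own mutex class, so treat it as an illustration of the pattern and not as the engine's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using CommandBuffer = std::vector<std::function<void()>>;

// Hypothetical queue shared between the main thread and the rendering thread.
class CommandQueueSketch {
public:
    // Main thread: hand over one frame's worth of commands.
    void Submit(CommandBuffer&& buffer) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(buffer));
    }

    // Rendering thread: take everything queued so far and run it in order.
    // Returns the number of command buffers that were executed.
    std::size_t Drain() {
        std::vector<CommandBuffer> local;
        {
            // The lock is held only for the swap, not while commands run.
            std::lock_guard<std::mutex> lock(mutex_);
            local.swap(pending_);
        }
        for (auto& buffer : local)
            for (auto& command : buffer) command();
        return local.size();
    }

private:
    std::mutex mutex_;
    std::vector<CommandBuffer> pending_;
};
```

With this shape the rendering thread never indexes into a container the main thread may be appending to, and buffers submitted during execution simply wait for the next `Drain()` call.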
As you can see, the previously expensive World::Render() command now does almost nothing before returning to your game loop. I am also going to experiment with running the game loop and the rendering loop at different speeds. So let's say previously your game was running at 60 FPS (about 16.7 milliseconds per frame) and 1/3 of that time was spent rendering the world. This left you with about 11 milliseconds to execute your game code, or things would start to slow down. With the new design your game code could have up to 33 milliseconds to execute without compromising the framerate. That means your code could be three times more complex, and you would not have to worry so much about efficiency, since the rendering thread will keep blazing away at a much faster rate.
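The figures in that example come from simple frame-time arithmetic, sketched here with two hypothetical helper functions. Reading the 33 millisecond budget as a game loop ticking at 30 updates per second is an assumption about what the figure refers to, not a stated engine setting.

```cpp
#include <cassert>
#include <cmath>

// Length of one frame in milliseconds at a given rate.
double FrameTimeMs(double framesPerSecond) {
    return 1000.0 / framesPerSecond;
}

// Time left for game code when rendering shares the same thread
// and consumes the given fraction of each frame.
double GameCodeBudgetMs(double framesPerSecond, double renderFraction) {
    return FrameTimeMs(framesPerSecond) * (1.0 - renderFraction);
}
```

`GameCodeBudgetMs(60.0, 1.0 / 3.0)` gives roughly 11.1 ms, and `FrameTimeMs(30.0)` gives roughly 33.3 ms, which is about three times the single-threaded budget.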
The game loop is a lot simpler now, with just two commands needed to update and render the world. This gives you a chance to adjust some objects after physics and before rendering. A basic Leadwerks 5 program is really simple:
#include "Leadwerks.h"

using namespace Leadwerks;

int main(int argc, const char *argv[])
{
    auto window = CreateWindow("MyGame");
    auto context = CreateContext(window);
    auto world = CreateWorld();
    auto camera = CreateCamera(world);

    while (true)
    {
        if (window->KeyHit(KEY_ESCAPE) or window->Closed()) return 0;
        world->Update();
        world->Render(context);
    }
}
This may cause problems if you try to do something fancy like render a world to a buffer and then use that buffer as a texture in another world. We might lose some flexibility there, and if we do I will prioritize speed over having lots of options.
Clustered Forward Rendering
Leadwerks has used a deferred renderer since version 2.1. Version 2.0 was a forward renderer with shadowmaps, and it didn't work very well. At the time, GPUs were not very good at branching logic. If you had an if / else statement, the GPU would perform BOTH branches (including expensive texture lookups) and take the result of the "true" one. To get around this problem, the engine would generate a new version of a shader each time a new combination of lights was onscreen, causing periodic microfreezes when a new shader was loaded. In 2.1 we switched to a deferred renderer, which eliminated these problems. Due to increasingly smart graphics hardware and more flexible modern APIs, a new technique called clustered forward rendering is now possible, offering flexibility similar to a deferred renderer with the increased speed of a forward renderer. Here is a nice article that describes the technique:
http://www.adriancourreges.com/blog/2016/09/09/doom-2016-graphics-study/
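The core bookkeeping in clustered shading is mapping a fragment to a cell of a 3D grid over the view frustum, so that only the lights assigned to that cell need to be evaluated. The sketch below is a generic illustration of that lookup, not the Leadwerks 5 implementation: the `ClusterGrid` struct, the grid dimensions, and the logarithmic depth slicing (a common choice) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical grid description; the sizes are illustrative only.
struct ClusterGrid {
    int tilesX, tilesY, slicesZ;   // cluster counts along each axis
    float nearZ, farZ;             // positive view-space depth range
    int screenWidth, screenHeight; // in pixels
};

// Logarithmic depth slicing: slices are thin near the camera and grow
// with distance, which roughly matches perspective foreshortening.
int DepthSlice(const ClusterGrid& g, float viewDepth) {
    // Clamp first so the logarithm is always well defined.
    float z = std::min(std::max(viewDepth, g.nearZ), g.farZ);
    float t = std::log(z / g.nearZ) / std::log(g.farZ / g.nearZ);
    int slice = static_cast<int>(std::floor(t * g.slicesZ));
    return std::min(std::max(slice, 0), g.slicesZ - 1);
}

// Flatten the (tile x, tile y, depth slice) triple into one index
// that could address a per-cluster light list.
int ClusterIndex(const ClusterGrid& g, int pixelX, int pixelY, float viewDepth) {
    int tx = std::min(std::max(pixelX * g.tilesX / g.screenWidth, 0), g.tilesX - 1);
    int ty = std::min(std::max(pixelY * g.tilesY / g.screenHeight, 0), g.tilesY - 1);
    int tz = DepthSlice(g, viewDepth);
    return tx + g.tilesX * (ty + g.tilesY * tz);
}
```

In a real renderer this lookup would run in the fragment shader, and a separate pass would fill each cluster's light list; both of those steps are outside the scope of this sketch.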
This approach is also more scalable. Extra renders to the normal buffer and other details can be skipped for better scaling on integrated graphics and slower hardware. I'm not really targeting slow hardware as a priority, but I wouldn't be surprised if it ran extremely fast on integrated graphics when the settings are turned down. Of course, the system requirements will be the same because we need modern API features to do this.
I'm still a little foggy on how custom post-processing effects will be implemented. There will definitely be more standard features built into the renderer. For example, SSR will be mixed with probe reflections and a quality setting (off, static, dynamic) will determine how much processing power is used for reflections. If improved performance and integration comes at the cost of reduced flexibility in the post-process shaders, I will choose that option, but so far I don't foresee any problems.
Vulkan Graphics
The new renderer is being developed with OpenGL 4.1 so that I can make a more gradual progression, but I am very interested in moving to Vulkan once I have the OpenGL build worked out. Valve made an agreement with the developers of MoltenVK to release the SDK for free. This code translates Vulkan API calls into Apple's Metal API, so you basically have Vulkan running on Mac (sort of). I previously contacted the MoltenVK team about a special license for Leadwerks that would allow you guys to release your games on Mac without buying a MoltenVK license, but we did not reach any agreement and at the time the whole prospect seemed pretty shaky. With Valve supporting this I feel more confident going in this direction. In fact, due to the design of our engine, it would be possible to outsource the task of a Vulkan renderer without granting any source code access or complicating the internals of the engine one bit.