
Vulkan Dynamic Rendering


Josh


The VK_KHR_dynamic_rendering extension has made its way into Vulkan 1.2.203 and I have implemented this in Ultra Engine. What does it do?

Instead of creating renderpass objects ahead of time, dynamic rendering lets you specify the settings you need as you are filling in command buffers with rendering instructions. From the Khronos working group:

Quote

When we were designing Vulkan 1.0, we had an idea to embed a task-graph-like object into Vulkan in the form of the render pass object. We knew the first version would be kind of restricted because we had an API to ship, and not long to do the work - but we had plans to extend the initial version, and those extensions would eventually provide significant flexibility to the API. Eventually, render passes would support all kinds of bells and whistles, including larger regions on input attachments, resolve shaders, and compute shaders! The idea was that these features would provide enough motivation to move all rendering to render pass objects and make the small amount of pain setting them up always worth it.

Fast forward to 2021, and the situation is not quite what we'd envisioned. On tiling GPUs, subpasses provide optimisation opportunities that can translate to impressive performance and efficiency wins. However, for many developers, subpasses either remain too restrictive to use or simply don't provide any practical benefit. For developers not using subpasses, render pass objects largely just get in the way.

In my experience, post-processing effects are where this hurt the most. The engine has a user-defined stack of post-processing effects, so many configurations are possible. You had to create and cache renderpass objects for every possible combination of settings. It's not impossible, but it made things very complicated. Basically, you have to know every detail of how the renderpass object is going to be used in advance. I had several different functions like the code below for initializing renderpasses that were meant to be used at various points in the rendering routine.

	bool RenderPass::InitializePostProcess(shared_ptr<GPUDevice> device, const VkFormat depthformat, const int colorComponents, const bool lastpass)
	{
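		// NOTE: this function has no clearmode parameter, so this line assigns the member to itself (a no-op)
		this->clearmode = clearmode;
		VkFormat colorformat = __FramebufferColorFormat;

		this->colorcomponents = colorComponents;
		if (depthformat != VK_FORMAT_UNDEFINED) this->depthcomponent = true;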
		
		this->device = device;
      
		std::array< VkSubpassDependency, 2> dependencies;

		dependencies[0] = {};
		dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
		dependencies[0].dstSubpass = 0;
		dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
		dependencies[0].srcAccessMask = 0;
		dependencies[0].dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
		dependencies[0].dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

		dependencies[1] = {};
		dependencies[1].srcSubpass = VK_SUBPASS_EXTERNAL;
		dependencies[1].dstSubpass = 0;
		dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
		dependencies[1].srcAccessMask = 0;
		dependencies[1].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
		dependencies[1].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;

		renderPassInfo = {};
		renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
		renderPassInfo.attachmentCount = colorComponents;
		renderPassInfo.dependencyCount = colorComponents;
		if (depthformat == VK_FORMAT_UNDEFINED)
		{
			dependencies[0] = dependencies[1];
		}
		else
		{
			renderPassInfo.attachmentCount++;
			renderPassInfo.dependencyCount++;
		}
		renderPassInfo.pDependencies = dependencies.data();
      
		colorAttachment[0] = {};
		colorAttachment[0].format = colorformat;
		colorAttachment[0].samples = VK_SAMPLE_COUNT_1_BIT;
		colorAttachment[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
		colorAttachment[0].loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
		colorAttachment[0].storeOp = VK_ATTACHMENT_STORE_OP_STORE;
		colorAttachment[0].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
		colorAttachment[0].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
		colorAttachment[0].finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
		if (lastpass) colorAttachment[0].finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
		
		VkAttachmentReference colorAttachmentRef = {};
		colorAttachmentRef.attachment = 0;
		colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

		depthAttachment = {};
		VkAttachmentReference depthAttachmentRef = {};
		if (depthformat != VK_FORMAT_UNDEFINED)
		{
			// Attachment indices must match the order of the attachments[] array below: color = 0, depth = 1
			colorAttachmentRef.attachment = 0;
			depthAttachment.format = depthformat;
			depthAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
			depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
			depthAttachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;// VK_IMAGE_LAYOUT_UNDEFINED;
			depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
			depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
			depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
			depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
			depthAttachmentRef.attachment = 1;
			depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
		}

		colorAttachment[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
		depthAttachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;// VK_IMAGE_LAYOUT_UNDEFINED;

		subpasses.push_back( {} );
		subpasses[0].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
		subpasses[0].colorAttachmentCount = colorComponents;
		subpasses[0].pColorAttachments = &colorAttachmentRef;
		subpasses[0].pDepthStencilAttachment = NULL;
		if (depthformat != VK_FORMAT_UNDEFINED) subpasses[0].pDepthStencilAttachment = &depthAttachmentRef;

		VkAttachmentDescription attachments[2] = { colorAttachment[0], depthAttachment };

		renderPassInfo.subpassCount = subpasses.size();
		renderPassInfo.pAttachments = attachments;
		renderPassInfo.pSubpasses = subpasses.data();
		VkAssert(vkCreateRenderPass(device->device, &renderPassInfo, nullptr, &pass));

		return true;
	}
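
To make the caching burden concrete, here is a minimal sketch of the kind of lookup an engine ends up maintaining when every combination of settings needs its own pre-built renderpass object. This is not Ultra Engine's actual code: `RenderPassKey`, `RenderPassCache`, and all field names are hypothetical, and plain integers stand in for `VkFormat` and `VkImageLayout` values so the example compiles without the Vulkan headers.

```cpp
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical key: every setting that gets baked into a renderpass at creation time.
// Plain ints stand in for VkFormat / VkImageLayout values.
struct RenderPassKey
{
	int colorFormat;
	int depthFormat;
	int colorCount;
	bool clearColor;
	bool clearDepth;
	int finalLayout;

	// Lexicographic ordering over all fields so the key can be used in std::map
	bool operator<(const RenderPassKey& other) const
	{
		return std::tie(colorFormat, depthFormat, colorCount, clearColor, clearDepth, finalLayout)
			< std::tie(other.colorFormat, other.depthFormat, other.colorCount, other.clearColor, other.clearDepth, other.finalLayout);
	}
};

// Hypothetical cache holding one entry per distinct combination of settings
class RenderPassCache
{
public:
	// Returns the handle for these settings, creating a new entry on first use
	int Acquire(const RenderPassKey& key)
	{
		auto it = passes.find(key);
		if (it != passes.end()) return it->second;
		// A real engine would call vkCreateRenderPass here; an integer id stands in for the handle
		int handle = static_cast<int>(passes.size());
		passes.emplace(key, handle);
		return handle;
	}

	std::size_t Size() const { return passes.size(); }

private:
	std::map<RenderPassKey, int> passes;
};
```

Changing any single field, such as the final layout for the last pass in the stack, produces a new key and therefore another object to create and keep alive.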

This gives you an idea of just how many render passes I had to create in advance:

		// Initialize Render Passes
		shadowpass[0] = make_shared<RenderPass>();
		shadowpass[0]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true);//, CLEAR_DEPTH, -1);
		shadowpass[1] = make_shared<RenderPass>();
		shadowpass[1]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, 0);
		if (MULTIPASS_CUBEMAP)
		{
			cubeshadowpass[0] = make_shared<RenderPass>();
			cubeshadowpass[0]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, CLEAR_DEPTH, 6);
			cubeshadowpass[1] = make_shared<RenderPass>();
			cubeshadowpass[1]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, 0, 6);
		}
		//shaderStages[0] = TEMPSHADER->shaderStages[0];
		//shaderStages[4] = TEMPSHADER->shaderStages[4];

		posteffectspass = make_shared<RenderPass>();
		posteffectspass->InitializePostProcess(dynamic_pointer_cast<GPUDevice>(Self()), VK_FORMAT_UNDEFINED, 1, false);
		raytracingpass = make_shared<RenderPass>();
		raytracingpass->InitializeRaytrace(dynamic_pointer_cast<GPUDevice>(Self()));
		lastposteffectspass = make_shared<RenderPass>();
		lastposteffectspass->InitializeLastPostProcess(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, 1, false);
		lastcameralastposteffectspass = make_shared<RenderPass>();
		lastcameralastposteffectspass->InitializeLastPostProcess(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, 1, true);

		{
			std::vector<VkFormat> colorformats = { __FramebufferColorFormat ,__FramebufferColorFormat, VK_FORMAT_R8G8B8A8_SNORM, VK_FORMAT_R32_SFLOAT };
			for (int earlyZPass = 0; earlyZPass < 2; ++earlyZPass)
			{
				for (int clearflags = 0; clearflags < 4; ++clearflags)
				{
					renderpass[clearflags][earlyZPass] = make_shared<RenderPass>();
					renderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, false, false, clearflags, 1, earlyZPass);
					renderpassRGBA16[clearflags][earlyZPass] = make_shared<RenderPass>();
					renderpassRGBA16[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, 4, false, false, false, clearflags, 1, earlyZPass);
					firstrenderpass[clearflags][earlyZPass] = make_shared<RenderPass>();
					firstrenderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, true, false, clearflags, 1, earlyZPass);
					lastrenderpass[clearflags][earlyZPass] = make_shared<RenderPass>();
					lastrenderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, false, true, clearflags, 1, earlyZPass);
					//for (int d = 0; d < 2; ++d)
					{
						for (int n = 0; n < 5; ++n)
						{
							if (n == 2 or n == 3) continue;
							rendertotexturepass[clearflags][n][earlyZPass] = make_shared<RenderPass>();
							rendertotexturepass[clearflags][n][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, n, true, false, false, clearflags, 1, earlyZPass);
							firstrendertotexturepass[clearflags][n][earlyZPass] = make_shared<RenderPass>();
							firstrendertotexturepass[clearflags][n][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, n, true, true, false, clearflags, 1, earlyZPass);
							//	lastrendertotexturepass[clearflags][n] = make_shared<RenderPass>();
							//	lastrendertotexturepass[clearflags][n]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, n, true, false, true, clearflags);
						}
					}
				}
			}
		}
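
For a sense of scale, the loops above can be replayed with a simple counter. The function below is only a tally written for this post (`CountRenderPasses` is a made-up name); it mirrors the loop bounds and the skipped values of `n` from the setup code and counts one for each `make_shared<RenderPass>()` call that is not commented out.

```cpp
// Tallies the renderpass objects created by the initialization code above.
// multipassCubemap mirrors the MULTIPASS_CUBEMAP switch.
int CountRenderPasses(bool multipassCubemap)
{
	int count = 2;                     // shadowpass[0], shadowpass[1]
	if (multipassCubemap) count += 2;  // cubeshadowpass[0], cubeshadowpass[1]
	count += 4;                        // posteffectspass, raytracingpass, lastposteffectspass, lastcameralastposteffectspass

	for (int earlyZPass = 0; earlyZPass < 2; ++earlyZPass)
	{
		for (int clearflags = 0; clearflags < 4; ++clearflags)
		{
			count += 4;                // renderpass, renderpassRGBA16, firstrenderpass, lastrenderpass
			for (int n = 0; n < 5; ++n)
			{
				if (n == 2 || n == 3) continue;
				count += 2;            // rendertotexturepass, firstrendertotexturepass
			}
		}
	}
	return count;
}
```

`CountRenderPasses(true)` returns 88 and `CountRenderPasses(false)` returns 86: each of the 8 loop iterations creates 10 objects, plus the 6 or 8 created outside the loops.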

With dynamic rendering, you still have to fill in most of the same information, but you can just do it based on whatever the current state of things is, instead of looking for an object that hopefully matches the exact settings you want:

			VkRenderingInfoKHR renderinfo = {};
			renderinfo.sType = VK_STRUCTURE_TYPE_RENDERING_INFO_KHR;
			renderinfo.renderArea = scissor;
			renderinfo.layerCount = 1;
			renderinfo.viewMask = 0;
			renderinfo.colorAttachmentCount = 1;

			targetbuffer->colorAttachmentInfo[0].imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
			targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[0] = 0.0f;
			targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[1] = 0.0f;
			targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[2] = 0.0f;
			targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[3] = 0.0f;
			targetbuffer->colorAttachmentInfo[0].imageView = targetbuffer->imageviews[0];
			renderinfo.pColorAttachments = targetbuffer->colorAttachmentInfo.data();

			targetbuffer->depthAttachmentInfo.clearValue.depthStencil.depth = 1.0f;
			targetbuffer->depthAttachmentInfo.clearValue.depthStencil.stencil = 0;
			targetbuffer->depthAttachmentInfo.imageLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
			renderinfo.pDepthAttachment = &targetbuffer->depthAttachmentInfo;

			device->vkCmdBeginRenderingKHR(cb->commandbuffer, &renderinfo);
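
One practical consequence is that per-attachment settings such as the load operation can be computed at record time instead of being used as an index into an array of pre-built objects. The sketch below shows the idea under some assumptions: `LoadOp` is a stand-in for `VkAttachmentLoadOp` so the example compiles without the Vulkan headers, and the flag values are guesses (the engine code above uses a `CLEAR_DEPTH` constant, but its value and the name `CLEAR_COLOR` are assumed here).

```cpp
// Stand-in for VkAttachmentLoadOp; not the real Vulkan enum
enum class LoadOp { Load, Clear, DontCare };

// Assumed two-bit clear flags, matching the 0..3 range looped over in the old setup code
enum ClearFlags : unsigned { CLEAR_NONE = 0, CLEAR_COLOR = 1, CLEAR_DEPTH = 2 };

// Hypothetical bundle of the load ops that would be written into the attachment info structs
struct AttachmentOps
{
	LoadOp color;
	LoadOp depth;
};

// Derives the load ops from the current clear flags right before rendering begins
AttachmentOps OpsFromClearFlags(unsigned clearflags)
{
	AttachmentOps ops;
	ops.color = (clearflags & CLEAR_COLOR) ? LoadOp::Clear : LoadOp::Load;
	ops.depth = (clearflags & CLEAR_DEPTH) ? LoadOp::Clear : LoadOp::Load;
	return ops;
}
```

In real code the results would be assigned to the `loadOp` members of the color and depth attachment info structures before the begin-rendering call, so no lookup by `clearflags` is needed.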

Then there is the way render passes affect the image layout state. With the TransitionImageLayout command, it is fairly easy to track the current state of the image layout, but render passes automatically switch the image layout after completion to a predefined state. Again, this is not impossible to handle in and of itself, but when you add these things to the complexity of designing a full engine, things start to get ugly.

void GPUCommandBuffer::EndRenderPass()
{
	vkCmdEndRenderPass(commandbuffer);
	for (int k = 0; k < currentrenderpass->layers; ++k)
	{
		for (int n = 0; n < currentrenderpass->colorcomponents; ++n)
		{
			if (currentdrawbuffer->colortexture[n]) currentdrawbuffer->colortexture[n]->imagelayout[0][currentdrawbuffer->baseface + k] = currentrenderpass->colorAttachment[n].finalLayout;
		}
		if (currentdrawbuffer->depthtexture != NULL and currentrenderpass->depthcomponent == true) currentdrawbuffer->depthtexture->imagelayout[0][currentdrawbuffer->baseface + k] = currentrenderpass->depthAttachment.finalLayout;
	}
	currentdrawbuffer = NULL;
	currentrenderpass = NULL;
}
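
With dynamic rendering, beginning and ending rendering does not change image layouts on its own, so the tracked state only changes where the application records a transition itself. A minimal sketch of that single point of bookkeeping follows. The types are stand-ins (`ImageLayout` replaces `VkImageLayout`), and the `TransitionImageLayout` signature shown here is an assumption, not the engine's real one.

```cpp
#include <vector>

// Stand-in for VkImageLayout; not the real Vulkan enum
enum class ImageLayout { Undefined, ColorAttachment, ShaderReadOnly, PresentSrc };

// Hypothetical image wrapper that remembers its current layout
struct TrackedImage
{
	ImageLayout layout = ImageLayout::Undefined;
};

// Record of a transition that would become a pipeline barrier
struct RecordedBarrier
{
	ImageLayout oldLayout;
	ImageLayout newLayout;
};

class LayoutTracker
{
public:
	// Records a transition only when the layout actually changes.
	// Returns true if a barrier was recorded.
	bool TransitionImageLayout(TrackedImage& image, ImageLayout newLayout)
	{
		if (image.layout == newLayout) return false;
		// Real code would record a vkCmdPipelineBarrier here
		barriers.push_back({ image.layout, newLayout });
		image.layout = newLayout;
		return true;
	}

	std::vector<RecordedBarrier> barriers;
};
```

Because this is the only place the layout is written, there is no need for an end-of-pass fix-up loop like the one in EndRenderPass above.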

Another example where this was causing problems was with user-defined texture buffers. One beta tester wanted to implement some interesting effects that required rendering to some HDR color textures, but the system was so static it couldn't handle a user-defined color format in a texture buffer. Again, this is not impossible to overcome, but the practical outcome is I just didn't have enough time because resources are finite.

It's interesting that this extension also removes the need to create a Vulkan framebuffer object. I guess that means you can just start rendering to any combination of textures you want, so long as they use a format that is renderable by the hardware. Vulkan certainly changes a lot of conceptions we had in OpenGL.

So this extension does eliminate a significant source of problems for me, and I am happy it was implemented.


8 Comments



Hey! I've found an old article on high resolution voxel rendering from way back in 2020. And in more recent previews the resolution is, let's put it this way, not really high. So I was wondering, have you forsaken the idea of making a high-res engine, and if so, why did it happen? Is the technology not quite there yet?


@WhoLeb Voxels are only being used for lighting. Any voxelization you see in screenshots is just to visualize the data so I can see and show what is going on.


That's what the final effect looks like. Something like that, probably a bit better.

My initial implementation used a volume texture and lately I've been moving it into a sparse voxel octree. This uses much less memory and should provide faster performance. It's hard to implement, so my screenshots probably look like they are taking a step backwards, but the final result will be better than the earlier work.


I see, well then I'm glad that I found this blog.


Please tell me, Josh, are you keeping Lua scripting in Ultra Engine? We're developing the SALVATIONLAND game on Leadwerks 4.6, with 99% of it in Lua, and would like to move the technical base to Ultra Engine in the future. Would that be possible?
