
Clustered Forward Rendering - First Performance Metrics


Josh


I was able to partially implement clustered forward rendering. At this time I have not divided the camera frustum up into cells, and I am just handing a single point light to the fragment shader. Instead of a naive implementation that would upload the light values in a shader uniform, though, I am sending light IDs in a buffer. I first tried texture buffers because they have a large maximum size and I already have a GPUMemBlock class that makes them easy to work with. Because the GPU likes things to be aligned to 16 bytes, I am treating the buffer as an array of ivec4s. The first slot holds the light count and the remaining slots hold light indices, which makes the code a little trickier: we have a loop within a loop with some conditional breaks:

vec4 CalculateLighting(in vec3 position, in vec3 normal)
{
	vec4 lighting = vec4(0.0f);
	int n,i,lightindex,countlights[3];
	vec4 lightcolor;
	ivec4 lightindices;
	mat4 lightmatrix;
	vec2 lightrange;
	vec3 lightdir;
	float l,falloff;

	//Get light list offset
	int lightlistpos = 0;//texelFetch(texture12, sampleCoord, 0).x;

	//Point Lights
	countlights[0] = texelFetch(texture11, lightlistpos).x;
	for (n = 0; n <= countlights[0] / 4; ++n)
	{
		lightindices = texelFetch(texture11, lightlistpos + n);
		for (i = 0; i < 4; ++i)
		{
			if (n == 0 && i == 0) continue; //skip first slot since that contains the light count
			if (n * 4 + i > countlights[0]) break; //break if we go out of bounds of the light list
			lightindex = lightindices[i]; //read slot i of this ivec4, not always slot 1
			
			lightmatrix[3] = texelFetch(texture15, lightindex * 4 + 3);

			lightdir = position - lightmatrix[3].xyz;
			l = length(lightdir);
			falloff = max(0.0f,-dot(normal,lightdir/l));
			if (falloff <= 0.0f) continue;

			lightrange = texelFetch(texture15, lightindex * 4 + 4).xy;
			falloff *= max(0.0f, 1.0f - l / lightrange.y);
			if (falloff <= 0.0f) continue;
			
			lightmatrix[0] = texelFetch(texture15, lightindex * 4);
			lightmatrix[1] = texelFetch(texture15, lightindex * 4 + 1);
			lightmatrix[2] = texelFetch(texture15, lightindex * 4 + 2);
			lightcolor = vec4(lightmatrix[0].w,lightmatrix[1].w,lightmatrix[2].w,1.0f);
			
			lighting += lightcolor * falloff;
		}
	}
	return lighting;
}
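
The layout the shader expects can be illustrated from the CPU side. The following is only a sketch in C++, not the engine's actual code: `IVec4`, `PackLightList` and `ReadLightList` are hypothetical names, and the real GPUMemBlock upload is left out. It packs a light count followed by the light IDs into 16-byte ivec4 slots, then reads them back with the same loop bounds the shader uses, which is a handy way to check the `n <= count / 4` and out-of-bounds conditions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using IVec4 = std::array<int32_t, 4>; // stand-in for one ivec4 texel (16 bytes)

// Hypothetical helper: slot 0 holds the light count, slots 1..count hold light IDs.
std::vector<IVec4> PackLightList(const std::vector<int32_t>& lightIds)
{
    const std::size_t slots = lightIds.size() + 1;          // +1 for the count
    std::vector<IVec4> texels((slots + 3) / 4, IVec4{0, 0, 0, 0}); // round up to whole ivec4s
    texels[0][0] = static_cast<int32_t>(lightIds.size());
    for (std::size_t k = 0; k < lightIds.size(); ++k)
    {
        const std::size_t slot = k + 1;
        texels[slot / 4][slot % 4] = lightIds[k];
    }
    return texels;
}

// Mirrors the shader's nested loop so the bounds can be checked on the CPU.
std::vector<int32_t> ReadLightList(const std::vector<IVec4>& texels, int lightlistpos = 0)
{
    assert(!texels.empty());
    std::vector<int32_t> ids;
    const int count = texels[lightlistpos][0];
    for (int n = 0; n <= count / 4; ++n)
    {
        const IVec4& lightindices = texels[lightlistpos + n];
        for (int i = 0; i < 4; ++i)
        {
            if (n == 0 && i == 0) continue;   // first slot is the light count
            if (n * 4 + i > count) break;     // past the end of the light list
            ids.push_back(lightindices[i]);
        }
    }
    return ids;
}
```

Packing five light IDs this way takes two ivec4s: the count plus three IDs in the first, the remaining two IDs (and two unused slots) in the second.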

I am testing with Intel graphics in order to get a better idea of where the bottlenecks are. My GeForce 1080 just chews through this without blinking an eye, so the slower hardware is actually helpful in tuning performance. I was dismayed at first when I saw my framerate drop from 700 to 200+. For comparison, I created a simple scene in Leadwerks 4 with one point light and no shadows, and the performance was quite a bit worse on this hardware, so it looks like I am actually doing well. Here are the numbers:

  • Turbo (uniform buffer): 220 FPS
  • Turbo (texture buffer): 290 FPS
  • Leadwerks 4: 90 FPS
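
Frame rates are easier to compare as frame times, since cost per frame adds up linearly while FPS does not. As a rough sketch (the helper names `FrameTimeMs` and `Speedup` are made up for this example), 290 FPS works out to about 3.45 ms per frame and 90 FPS to about 11.1 ms:

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
double FrameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times faster newFps is than oldFps.
double Speedup(double newFps, double oldFps)
{
    assert(oldFps > 0.0);
    return newFps / oldFps;
}
```

With the figures above, `Speedup(290.0, 90.0)` is roughly 3.2 and `Speedup(220.0, 90.0)` is roughly 2.4.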

Of course a discrete card will run much better. The depth pre-pass has a very slight beneficial effect in this scene, and as more lights and geometry are added, I expect the performance savings will become much greater.

Post-processing effects like bloom require a texture with the scene rendered to it, so this system will still need to render to a single color texture when these effects are in use. The low quality settings, however, will render straight to the back buffer and thus provide a much better fallback for low-end hardware.

shadows.jpg

Here we can see the same shader working with lots of lights. To get good performance out of this, the camera frustum needs to be divided up into cells with a list of relevant lights for each cell.
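
The cell assignment step has not been written yet, so the following is only a sketch of one way it could work, not the engine's implementation. It assumes view-space light positions, a camera looking down -Z, a symmetric perspective frustum, and linear depth slices (many implementations use exponential slices instead). `ClusterGrid`, `PointLight` and `AssignLights` are hypothetical names. Each cell gets a conservative bounding box, and a light is added to the cell's list when its sphere of influence touches that box.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; none of these names come from the engine.
struct PointLight { float x, y, z, range; }; // view-space position and range

struct ClusterGrid
{
    int nx, ny, nz;                      // cell counts across, up, and in depth
    float nearZ, farZ;                   // positive view distances
    float tanHalfFovX, tanHalfFovY;      // tangents of the half field-of-view angles
    std::vector<std::vector<int>> cells; // light indices for each cell

    int CellIndex(int ix, int iy, int iz) const { return (iz * ny + iy) * nx + ix; }
};

// Distance from v to the interval [lo, hi] along one axis (0 when inside).
static float AxisDistance(float v, float lo, float hi)
{
    if (v < lo) return lo - v;
    if (v > hi) return v - hi;
    return 0.0f;
}

void AssignLights(ClusterGrid& g, const std::vector<PointLight>& lights)
{
    assert(g.nx > 0 && g.ny > 0 && g.nz > 0 && g.farZ > g.nearZ);
    g.cells.assign(static_cast<std::size_t>(g.nx) * g.ny * g.nz, std::vector<int>());
    for (int iz = 0; iz < g.nz; ++iz)
    for (int iy = 0; iy < g.ny; ++iy)
    for (int ix = 0; ix < g.nx; ++ix)
    {
        // Linear depth slice [d0, d1] for this cell
        const float d0 = g.nearZ + (g.farZ - g.nearZ) * iz / g.nz;
        const float d1 = g.nearZ + (g.farZ - g.nearZ) * (iz + 1) / g.nz;
        // Cell edges in normalized [-1, 1] screen coordinates
        const float u0 = -1.0f + 2.0f * ix / g.nx, u1 = -1.0f + 2.0f * (ix + 1) / g.nx;
        const float v0 = -1.0f + 2.0f * iy / g.ny, v1 = -1.0f + 2.0f * (iy + 1) / g.ny;
        // Conservative view-space box enclosing the cell's slice of the frustum
        const float xmin = std::min(u0 * d0, u0 * d1) * g.tanHalfFovX;
        const float xmax = std::max(u1 * d0, u1 * d1) * g.tanHalfFovX;
        const float ymin = std::min(v0 * d0, v0 * d1) * g.tanHalfFovY;
        const float ymax = std::max(v1 * d0, v1 * d1) * g.tanHalfFovY;

        std::vector<int>& cell = g.cells[g.CellIndex(ix, iy, iz)];
        for (std::size_t i = 0; i < lights.size(); ++i)
        {
            const PointLight& l = lights[i];
            const float dx = AxisDistance(l.x, xmin, xmax);
            const float dy = AxisDistance(l.y, ymin, ymax);
            const float dz = AxisDistance(l.z, -d1, -d0); // camera looks down -Z
            // Sphere-versus-box test: keep the light if it reaches into the cell
            if (dx * dx + dy * dy + dz * dz <= l.range * l.range)
                cell.push_back(static_cast<int>(i));
        }
    }
}
```

The box test is conservative, so a cell may list a light that only touches its bounding box and not the cell itself. That costs a little shader work but never drops a light that should contribute.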

Image1.jpg

There are two more benefits to this approach. Context multisample antialiasing can be used when rendering straight to the back buffer. Of course, we can do the same with deferred rendering and multisample textures now, so that is not that big of a deal.

Image1.png

What IS a big deal is the fact that transparency with shadows will work 100%, no problems. All the weird tricks and hacks we have tried to use to achieve this all go away. (The below image is one such hack that uses dithering combined with MSAA to provide 50% transparency...sort of.)

b2e06c1f25a428aa0c87b56cf7d2a717.jpg

Everything else aside, our first tests reveal more than a 3X increase in performance over the lighting approach that Leadwerks 4 uses (290 FPS versus 90 FPS with the texture buffer path). Things look fantastic!
