Performance Testing


Josh

This thread is for performance testing of the version 0.9.7 beta. I am running the previous version 0.9.6 alongside my builds from source code.

Starting with the instanced geometry benchmark: https://github.com/UltraEngine/Benchmarks/blob/main/Project Files/Ultra Engine/Source/Instanced Geometry.cpp

I got rid of the ortho camera and sprite, and I am using Fraps to measure framerate.

0.9.6: 3100
0.9.7: 2560

This could be due to the instance buffer, which is now using 4 bytes instead of 2, or it could be due to some small shader change.

Testing on a GeForce 1080 with Nvidia driver 560.70.

@Dreikblack

My job is to make tools you love, with the features you want, and performance you can't live without.

Changing the draw ID size from 4 to 2 had no effect on the framerate. I suspect some small change in the vertex shader accounts for this...

Actually, the 0.9.6 build is inconsistent. Sometimes it will start off at a higher speed and stay there, and sometimes it will be slightly slower than 0.9.7. It seems like this has more to do with the driver and OS than anything. Restarting now, maybe that will give better results after...

0.9.6 is still inconsistent. Sometimes I hear the fan spin up really high at start and the framerate is 3100, other times it runs at the same speed as 0.9.7. Version 0.9.7 is still twice as fast as the Vulkan build from the beginning of the year.

One big change between 0.9.6 and 0.9.7 is that 0.9.6 uses a single array texture to store all shadow maps in, and 0.9.7 uses separate shadow maps with bindless textures. This was done to provide control over the size of each light's shadow map.

If I disable the directional light then both scenes run at the exact same speed in both versions of the engine.

for (auto entity : scene->entities)
{
	if (entity->As<Light>()) entity->SetHidden(true);
}

I do not know if there is any substantial difference in performance between these approaches. I don't think there should be.

The difference in performance I am seeing is about 6.5%.

If I disable camera lighting, 0.9.7 is slightly faster:

	for (auto entity : scene->entities)
	{
		if (entity->As<Camera>()) entity->As<Camera>()->SetLighting(false);
	}

If I disable light shadows, 0.9.7 is a little slower than 0.9.6. That suggests some data access in the lighting shader code is slightly less optimal in 0.9.7, and that the slowdown has nothing to do with shadow maps or bindless textures.

	for (auto entity : scene->entities)
	{
		if (entity->As<Light>()) entity->SetShadows(false);
	}

 

It appears that the LIGHT_DECAL case block of code in Shaders/PBR/Lighting.glsl does make the shader slower, even if no decals are in use.

687 vs 712, or 3.5% difference.

In 0.9.7, entity color is stored in a separate 4x4 matrix, and only gets read if a flag indicates the color is not 1,1,1,1. Most of the time this does not happen, but lights do often have a different color. Reading an extra 4x4 matrix in a storage buffer in a fragment shader does make a difference. I think I am going to use some more aggressive compression and try to fit the color into the main 4x4 matrix to prevent the extra read.

I don't think this is a complete explanation, but it will have a noticeable effect.
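To make the compression idea concrete, here is a minimal sketch of one possible encoding: quantizing an RGBA color to 8 bits per channel so it fits into a single 32-bit word. This is only an illustration, not the engine's actual layout. `PackColor`, `UnpackColor` and `QuantizeChannel` are hypothetical names, and the clamp to [0, 1] is an assumption that would not hold for light colors above 1.0, which would need a scale factor or a wider format.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantize one channel in [0, 1] to an 8-bit value.
// Values outside the range are clamped (an assumption of this sketch).
inline std::uint32_t QuantizeChannel(float v)
{
    assert(!std::isnan(v));
    v = std::max(0.0f, std::min(1.0f, v));
    return static_cast<std::uint32_t>(std::lround(v * 255.0f));
}

// Pack RGBA into one 32-bit word: R in the lowest byte, A in the highest.
inline std::uint32_t PackColor(const std::array<float, 4>& rgba)
{
    return QuantizeChannel(rgba[0])
         | (QuantizeChannel(rgba[1]) << 8)
         | (QuantizeChannel(rgba[2]) << 16)
         | (QuantizeChannel(rgba[3]) << 24);
}

// Reverse of PackColor; each channel comes back with at most 1/255 error.
inline std::array<float, 4> UnpackColor(std::uint32_t packed)
{
    std::array<float, 4> rgba = {{
        static_cast<float>(packed & 0xFFu) / 255.0f,
        static_cast<float>((packed >> 8) & 0xFFu) / 255.0f,
        static_cast<float>((packed >> 16) & 0xFFu) / 255.0f,
        static_cast<float>((packed >> 24) & 0xFFu) / 255.0f
    }};
    return rgba;
}
```

With something like this, the default color 1,1,1,1 packs to `0xFFFFFFFF`, and a single 32-bit slot would be enough to carry the color alongside other per-entity data.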

There is a way to do occlusion culling using the depth buffer that I hope to add in the future.

Testing Dreikblack's prefab boxes scene on an AMD 6600, with the latest updates:
0.9.6: 1120
0.9.7: 1075

That's only a 4.2% difference. That difference gets a bit smaller if I comment out the shadow code in both sets of shaders, so it appears that sampling from different bindless textures can have a very small impact on performance (about 1%) vs. the more restricted texture array approach I used previously. There's probably some very small overhead when converting the bindless texture handle into a sampler object, but at the same time the increased flexibility allows techniques that result in very large performance gains over conventional methods.
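For reference, the 4.2% figure matches the gap measured relative to the slower build, which is the same thing as the increase in time per frame. Measured relative to the faster build, the same two numbers give about 4.0%. A small sketch of both conventions (the function names are made up for illustration):

```cpp
#include <cassert>

// Drop in framerate relative to the baseline (faster) build, in percent.
inline double FpsDropPercent(double baselineFps, double newFps)
{
    assert(baselineFps > 0.0);
    return (baselineFps - newFps) / baselineFps * 100.0;
}

// Increase in time per frame relative to the baseline build, in percent.
// Frame time is 1 / fps, so the ratio of frame times is baselineFps / newFps.
inline double FrameTimeIncreasePercent(double baselineFps, double newFps)
{
    assert(newFps > 0.0);
    return (baselineFps / newFps - 1.0) * 100.0;
}
```

For 1120 vs. 1075, `FpsDropPercent` gives roughly 4.0 and `FrameTimeIncreasePercent` gives roughly 4.2, so it is worth stating which convention a percentage uses when comparing results.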

  • 2 weeks later...

@Dreikblack I am running your game now and there is definitely something wrong. The framerate is very inconsistent and often very low, but I don't know why yet. Is there any built-in way to display the rendering statistics?

[Attached image]

Okay, I found that GetMemoryUsage() in debug builds is extremely slow, taking almost 200 milliseconds. I found this just by hitting pause in the debugger and seeing which function it stopped in, since a function that takes a long time is statistically more likely to be the one executing when you pause. That's your primary problem.
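Once pausing the debugger points at a suspect, a quick way to confirm the cost is to time the call directly. Below is a minimal sketch using only the standard library. `MeasureMs` and `BusyWaitMs` are hypothetical helpers, and `BusyWaitMs` is just a stand-in for whatever slow call is being investigated.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs a callable once and returns the elapsed wall time in milliseconds.
template <typename Fn>
double MeasureMs(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start); // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for a slow call: spins for at least the given time.
inline void BusyWaitMs(int ms)
{
    const auto end = std::chrono::steady_clock::now() + std::chrono::milliseconds(ms);
    while (std::chrono::steady_clock::now() < end) {}
}
```

Wrapping the suspect in a lambda, for example `MeasureMs([] { BusyWaitMs(20); })`, returns the time spent in that one call, which can then be compared against the frame budget.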

4 hours ago, Josh said:

I can't seem to get back into the game. I select "Load map" > "Easy" and then am shown a loading screen. From there, nothing happens.

Loading seems to take too long with this issue: a few minutes or more on any big map.

 

4 hours ago, Josh said:

Okay, I found that GetMemoryUsage() in debug builds is extremely slow, taking almost 200 milliseconds.

Yes, that's it! I can't believe I did not try removing it earlier, knowing that the update with the GetMemoryUsage() changes caused this issue :unsure:

After removing it, the FPS went back to normal for a debug build.

@Dreikblack Now that we have solved the primary issue, I am interested in hearing your assessment of the situation. Is there something in particular I should investigate?

Compared with the stable version? Hard to say, it seems close.

In general? Way lower than it was right after the switch to OpenGL, and that was already worse than Vulkan in most cases on Nvidia cards. The exception is Instanced Geometry: this benchmark has a similar result now, and it was better with OpenGL as well.

On a PC with an RTX 2070, the FPS in my game used to be 250-300 a few months ago at 1080p, and now it's 150 (on a simple map). Maybe that is because the map was smaller back then. Even in the menu it's barely 170, and that's just a few brushes, one model with a Mover for rotation, and a point light on this model. On a proper demo map it's barely 100 FPS.

I found the July 22nd build: 190-200 FPS, versus 170 FPS on the latest build on the same map. I also used to play Quake Champions (which players often criticize for poor performance) on this PC on Ultra settings and was getting 141 FPS with an FPS lock at 1080p.

On a PC with a 4080, my game barely runs better than games with proper performance, even though the graphics are 10x worse, since I use low-res assets from the Quake rerelease and have not started using particles and decals yet. I can't do a proper comparison with Quake Champions, since it currently has a maximum FPS lock of 300.

The GUI was always a weak spot for performance in Ultra. For example, FPS drops from 6800 to 2000 after GUI init:

#include "UltraEngine.h"

using namespace UltraEngine;

shared_ptr<Window> window;
shared_ptr<Framebuffer> framebuffer;
shared_ptr<World> menuWorld;
shared_ptr<Interface> ui;
shared_ptr<Camera> uiCamera;

shared_ptr<Widget> panel;
shared_ptr<Icon> icon1;

// Builds the GUI: an interface on render layer 2, an orthographic camera
// that draws only that layer, and a grid of 11 x 20 panels with an icon each
void initGui() {
    auto default_font = LoadFont("Fonts\\arial.ttf");
    ui = CreateInterface(menuWorld, default_font, framebuffer->GetSize());
    ui->SetRenderLayers(2);
    ui->root->SetColor(0.0f, 0.0f, 0.0f, 0.0f);
    uiCamera = CreateCamera(menuWorld, PROJECTION_ORTHOGRAPHIC);
    uiCamera->SetPosition((float)framebuffer->GetSize().x * 0.5f, (float)framebuffer->GetSize().y * 0.5f, 0);
    uiCamera->SetRenderLayers(2);
    uiCamera->SetClearMode(CLEAR_DEPTH);

    icon1 = LoadIcon("https://raw.githubusercontent.com/UltraEngine/Documentation/master/Assets/Icons/help.svg");

    //Create the panel grid
    for (int i = 1; i < 12; i++) {
        for (int j = 0; j < 20; j++) {
            auto panelI = CreatePanel(64 * j + 70, 64 * i + 70, 64, 64, ui->root, PANEL_DEFAULT);
            panelI->SetIcon(icon1);
        }
    }
}

int main(int argc, const char* argv[]) {
    auto displays = GetDisplays();
    window = CreateWindow("Ultra Engine", 0, 0, 1600, 900, displays[0], WINDOW_DEFAULT);
    menuWorld = CreateWorld();
    menuWorld->RecordStats(); //needed for renderstats.framerate below
    framebuffer = CreateFramebuffer(window);
    auto light = CreateBoxLight(menuWorld);
    light->SetRange(-10, 10);
    light->SetRotation(15, 15, 0);
    light->SetColor(2);
    auto camera = CreateCamera(menuWorld);
    camera->SetClearColor(0.125);
    camera->SetPosition(0, 0, -3);
    camera->SetFov(70);
    //Main loop
    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false) {
        //Press space to create the GUI once and compare the FPS before and after
        if (window->KeyHit(KEY_SPACE)) {
            if (!ui) initGui();
        }
        window->SetText("FPS: " + String(menuWorld->renderstats.framerate));
        menuWorld->Update();
        menuWorld->Render(framebuffer, false);
    }
    return 0;
}

 

14 hours ago, Dreikblack said:

The GUI was always a weak spot for performance in Ultra. For example, FPS drops from 6800 to 2000 after GUI init:

I don't know if differences like that are very meaningful. When you measure FPS, the difference between lower numbers is much more significant than the difference between high numbers. If you convert that to time elapsed it becomes more clear. The first number means about 0.15 milliseconds per frame and the second means 0.5 milliseconds, so we are talking about a difference of about 0.35 milliseconds.
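The conversion is just 1000 divided by the framerate. A short sketch of the arithmetic (the helper names are made up for illustration):

```cpp
#include <cassert>

// Time per frame in milliseconds for a given framerate.
inline double FrameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame, in milliseconds, when the framerate
// goes from fpsBefore to fpsAfter.
inline double AddedFrameCostMs(double fpsBefore, double fpsAfter)
{
    return FrameTimeMs(fpsAfter) - FrameTimeMs(fpsBefore);
}
```

`AddedFrameCostMs(6800, 2000)` comes out at roughly 0.35 ms. Assuming that cost stays fixed, adding it to a 60 FPS frame (about 16.67 ms) would only bring the framerate down to roughly 58.8 FPS, which is why large drops at very high framerates can look worse than they are.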

Besides the benchmarks I made, is there any specific code I should focus on testing?

It's actually pretty impressive that the GUI example above runs so well. It's rendering 440 unique objects, and half of them are alpha-blended. Widget blocks do not use any type of instancing, although the draw calls are batched. Each block is a unique mesh.

I think that situation could probably be optimized more if I disabled z-sorting on the transparent blocks. Currently transparency and z-sorting are always tied together, but that is one situation where you would not want the extra overhead of z-sorting and the separate draw calls that entails.

Let's see with a GeForce 1080:

Nothing: 4200
With z-sorting: 1177
Z-sort disabled: 3800

:blink:

I think this will be okay, because with GUIs you don't have transparency on top of transparency, normally. If you have something like an icon on a semi transparent panel, that will cause a problem if z-sorting is not active.
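For anyone wondering what the z-sorting step amounts to, here is a generic sketch of back-to-front ordering for alpha-blended objects. It is not the engine's implementation. `DrawItem` and `SortForBlending` are hypothetical, and the assumption is that each object can be reduced to a single depth value.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw item, not the engine's own data structure.
struct DrawItem
{
    int id;
    float depth;      // distance from the camera along the view axis
    bool transparent; // true if the item is alpha-blended
};

// Opaque items come first and keep their relative order, since the depth
// buffer resolves their visibility. Transparent items follow, farthest
// first, so nearer surfaces blend over the ones behind them.
inline void SortForBlending(std::vector<DrawItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) {
            if (a.transparent != b.transparent) return !a.transparent;
            if (a.transparent) return a.depth > b.depth;
            return false;
        });
}
```

Because the transparent items end up in an order that depends on the camera, they are harder to batch, which is the kind of overhead that skipping the sort avoids. The cost of skipping it is that overlapping transparent items, like an icon on a semi-transparent panel, may blend in the wrong order.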

On 8/23/2024 at 2:41 AM, Josh said:

If you have something like an icon on a semi transparent panel, that will cause a problem if z-sorting is not active.

Well, that's what I have :unsure:

[Attached image]

The FPS in my case seems to be the same as before this update, so I suppose the bottleneck is not z-sorting.

21 minutes ago, Dreikblack said:

Well, that's what I have :unsure:

[Attached image]

The FPS in my case seems to be the same as before this update, so I suppose the bottleneck is not z-sorting.

Are some of the icons not appearing on top of those transparent backgrounds? I can probably make it a little smarter.

What else should I be testing? Just the game in general, or something more specific?

  • Josh changed the title to Performance Testing
