Josh, posted August 7:
This thread is for performance testing of the version 0.9.7 beta. I am running the previous version, 0.9.6, alongside my builds from source code.
Starting with the instanced geometry benchmark:
https://github.com/UltraEngine/Benchmarks/blob/main/Project Files/Ultra Engine/Source/Instanced Geometry.cpp
I got rid of the ortho camera and sprite, and I am using Fraps to measure framerate.
0.9.6: 3100 FPS
0.9.7: 2560 FPS
This could be due to the instance buffer, which now uses 4 bytes per draw ID instead of 2, or it could be due to some small shader change.
Testing on a GeForce GTX 1080 with Nvidia driver 560.70.
@Dreikblack
My job is to make tools you love, with the features you want, and performance you can't live without.
Josh, posted August 7:
Changing the draw ID size from 4 to 2 had no effect on the framerate. I suspect some small change in the vertex shader accounts for this...
bALLZNPLAY, posted August 7:
Could this change, whatever it is, be related to my post yesterday about degrading performance and lower FPS as the version gets higher?
Josh, posted August 7:
Where is your post?
Josh, posted August 7:
Actually, the 0.9.6 build is inconsistent. Sometimes it starts off at a higher speed and stays there, and sometimes it is slightly slower than 0.9.7. It seems like this has more to do with the driver and OS than anything else. Restarting now, maybe that will give better results afterwards...
Josh, posted August 7:
0.9.6 is still inconsistent. Sometimes I hear the fan spin up really high at start and the framerate is 3100; other times it runs at the same speed as 0.9.7.
Version 0.9.7 is still twice as fast as the Vulkan build from the beginning of the year.
Josh, posted August 7:
Testing this sample now:
https://discord.com/channels/1175951843118031049/1175951843612954786/1270584299317362765
0.9.6: 620 FPS
0.9.7: 575 FPS
Josh, posted August 7:
One big change between 0.9.6 and 0.9.7 is that 0.9.6 stores all shadow maps in a single array texture, while 0.9.7 uses separate shadow maps with bindless textures. This was done to provide control over the size of each light's shadow map.
If I disable the directional light, then both scenes run at exactly the same speed in both versions of the engine:

    // Hide every light in the scene
    for (auto entity : scene->entities)
    {
        if (entity->As<Light>()) entity->SetHidden(true);
    }

I do not know if there is any substantial difference in performance between these approaches. I don't think there should be. The difference in performance I am seeing is about 6.5%.
Josh, posted August 7:
If I disable camera lighting, 0.9.7 is slightly faster:

    // Turn off lighting on every camera in the scene
    for (auto entity : scene->entities)
    {
        if (entity->As<Camera>()) entity->As<Camera>()->SetLighting(false);
    }

If I disable light shadows, 0.9.7 is a little slower than 0.9.6. That indicates to me that some data access in the lighting shader code is perhaps a little less optimal in 0.9.7, but it has nothing to do with shadow maps or bindless textures.

    // Turn off shadows on every light in the scene
    for (auto entity : scene->entities)
    {
        if (entity->As<Light>()) entity->SetShadows(false);
    }
Josh, posted August 7:
It appears that the LIGHT_DECAL case block in Shaders/PBR/Lighting.glsl does make the shader slower, even if no decals are in use: 687 vs. 712 FPS, or a 3.5% difference.
Josh, posted August 7:
In 0.9.7, entity color is stored in a separate 4x4 matrix, and it only gets read if a flag indicates the color is not 1,1,1,1. Most of the time that read does not happen, but lights often do have a different color. Reading an extra 4x4 matrix from a storage buffer in a fragment shader does make a difference. I think I am going to use some more aggressive compression and try to fit the color into the main 4x4 matrix to prevent the extra read.
I don't think this is a complete explanation, but it will have a noticeable effect.
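One way to picture the "more aggressive compression" idea is to quantize the four color channels to 8 bits each so they fit in a single 32-bit slot. The sketch below is only an illustration under that assumption: the post does not say which packing the engine will use, and `packColor`, `unpackColor`, and the `maxValue` range parameter are hypothetical names. It needs C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scheme, not the engine's actual format: quantize each channel
// to 8 bits over the range [0, maxValue] and pack all four into one 32-bit word.
std::uint32_t packColor(const std::array<float, 4>& rgba, float maxValue = 1.0f)
{
    assert(maxValue > 0.0f);
    std::uint32_t packed = 0;
    for (int i = 0; i < 4; ++i)
    {
        // Values outside [0, maxValue] are clamped, so precision is traded for size
        float normalized = std::clamp(rgba[i] / maxValue, 0.0f, 1.0f);
        auto quantized = static_cast<std::uint32_t>(normalized * 255.0f + 0.5f); // round to nearest
        packed |= quantized << (8 * i); // red in the lowest byte, alpha in the highest
    }
    return packed;
}

// Inverse of packColor: expand the four 8-bit channels back to floats.
std::array<float, 4> unpackColor(std::uint32_t packed, float maxValue = 1.0f)
{
    std::array<float, 4> rgba{};
    for (int i = 0; i < 4; ++i)
    {
        std::uint32_t quantized = (packed >> (8 * i)) & 0xFFu;
        rgba[i] = static_cast<float>(quantized) / 255.0f * maxValue;
    }
    return rgba;
}
```

With this layout the round trip loses at most half a quantization step per channel, and a larger `maxValue` would be needed for light colors brighter than 1.0, at the cost of coarser steps.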
bALLZNPLAY, posted August 7:
Sounds great. I have also noticed that I'm rendering tons of geometry that is off screen or hidden behind other geometry. Culling should make a big difference in complex scenes as well.
Josh, posted August 8:
There is a way to do occlusion culling using the depth buffer that I hope to add in the future.
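The post does not describe which depth-buffer technique is planned, so the following is only a CPU-side sketch of the general idea: an object can be skipped if every pixel its screen-space bounds cover already holds closer geometry. `ScreenBounds` and `isOccluded` are hypothetical names, and the sketch assumes a depth buffer where smaller values are closer to the camera. Real implementations usually run a test like this on the GPU against a downsampled depth pyramid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space bounds of an object in pixels (inclusive corners),
// plus the depth of the point on the object nearest to the camera.
struct ScreenBounds
{
    int x0, y0, x1, y1;
    float nearestDepth;
};

// Conservative occlusion test. Returns true only if every covered pixel
// in the depth buffer is closer to the camera than the object's nearest point.
bool isOccluded(const std::vector<float>& depth, int width, int height, const ScreenBounds& b)
{
    assert(width > 0 && height > 0);
    assert(depth.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Clip the bounds to the buffer
    int x0 = std::max(b.x0, 0), y0 = std::max(b.y0, 0);
    int x1 = std::min(b.x1, width - 1), y1 = std::min(b.y1, height - 1);
    if (x0 > x1 || y0 > y1) return false; // entirely off screen: not an occlusion question

    for (int y = y0; y <= y1; ++y)
    {
        for (int x = x0; x <= x1; ++x)
        {
            // Stored depth is at or behind the object here, so the object might be visible
            if (depth[static_cast<std::size_t>(y) * width + x] >= b.nearestDepth) return false;
        }
    }
    return true;
}
```

The test is conservative by design: it may report a hidden object as visible, but it never reports a visible object as hidden.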
Josh, posted August 8:
Testing dreikblack's prefab boxes scene on an AMD 6600, with the latest updates:
0.9.6: 1120 FPS
0.9.7: 1075 FPS
That's only a 4.2% difference. The difference gets a bit smaller if I comment out the shadow code in both sets of shaders, so it appears that sampling from different bindless textures can have a very small impact on performance (about 1%) compared to the more restricted texture array approach I used previously. There's probably some very small overhead when converting the bindless texture handle into a sampler object, but at the same time the increased flexibility allows techniques that result in very large performance gains over conventional methods.
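The percentages quoted in this thread depend on which result is used as the baseline. As a small worked check, the hypothetical helper `percentGap` below computes the gap both ways: the 4.2% figure for 1120 vs. 1075 corresponds to dividing by the slower result, while dividing by the faster one gives about 4.0%.

```cpp
#include <cassert>

// Gap between two framerates as a percentage of a chosen baseline framerate.
double percentGap(double fpsA, double fpsB, double baseline)
{
    assert(baseline > 0.0);
    double gap = fpsA > fpsB ? fpsA - fpsB : fpsB - fpsA; // absolute difference
    return 100.0 * gap / baseline;
}
```

For example, `percentGap(1120, 1075, 1075)` is about 4.19, and `percentGap(712, 687, 712)` is about 3.51, which matches the 3.5% quoted earlier for the LIGHT_DECAL test.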
Dreikblack, posted August 9:
In-game performance in release builds seems to be better as well, but it is still lower than it used to be on the biggest map.
Josh, posted August 18:
@Dreikblack I am running your game now and there is definitely something wrong. The framerate is very inconsistent and often very low, but I don't know why yet.
Is there any built-in way to display the rendering statistics?
Josh, posted August 18:
Okay, I found that GetMemoryUsage() in debug builds is extremely slow, taking almost 200 milliseconds. I found this just by hitting pause in the debugger and seeing which function it stopped in, since a function that takes a long time is statistically more likely to be the one that is running when you pause.
That's your primary problem.
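Once pausing the debugger has pointed at a suspect, a direct timing measurement can confirm it. The sketch below uses only the C++ standard library; `measureMs` and `budgetShare` are hypothetical helper names, not part of the engine.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run a callable once and return how long it took, in milliseconds.
template <typename Func>
double measureMs(Func&& func)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<Func>(func)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// How many frame budgets a single call consumes (1.0 means one whole frame).
double budgetShare(double elapsedMs, double frameBudgetMs)
{
    assert(frameBudgetMs > 0.0);
    return elapsedMs / frameBudgetMs;
}
```

A call that takes 200 milliseconds uses about twelve frame budgets at 60 FPS (`budgetShare(200.0, 1000.0 / 60.0)`), which is why a random pause is so likely to land inside it.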
Dreikblack, posted August 19:
> 4 hours ago, Josh said: I can't seem to get back into the game. I select "Load map" > "Easy" and then am shown a loading screen. From there, nothing happens.
Loading seems to take too long with this issue: a few minutes or more on any big map.
> 4 hours ago, Josh said: Okay, I found that GetMemoryUsage() in debug builds is extremely slow, taking almost 200 milliseconds.
Yes, that's it! Can't believe I did not try removing it before, knowing that the update with the GetMemoryUsage() changes caused this issue.
After removing it, FPS became normal in debug.
Josh, posted August 22:
@Dreikblack Now that we have solved the primary issue, I am interested in hearing your current assessment of the situation. Is there something in particular I should investigate?
Dreikblack, posted August 22:
Compared with the stable version? Hard to say, it seems close.
In general? Way lower than it was after the switch to OpenGL, and that was already worse than Vulkan in most cases on Nvidia cards. The exception is Instanced Geometry: that benchmark has similar results now, and it was better with OpenGL as well.
On a PC with an RTX 2070, FPS in my game used to be 250-300 a few months ago at 1080p, and now it's 150 (on a simple map). Maybe that was because the map was smaller. Even in the menu it's barely 170, and that is just a few brushes, one model with a Mover for rotation, and a point light on this model. On a proper demo map it is barely 100 FPS. I found the July 22 build: 190-200 FPS, versus 170 FPS on the latest build on the same map.
I also used to play Quake Champions (which players often criticize for poor performance) on this PC on Ultra settings, and I was getting 141 FPS with an FPS lock at 1080p.
On a PC with a 4080, my game barely runs better than games with proper performance, while its graphics are 10x worse, since I use low-res assets from the Quake rerelease and have not started using particles and decals yet. I can't do a proper comparison with Quake Champions, since it currently has a maximum FPS lock of 300.
GUI has always been a weak spot for performance in Ultra.
For example, FPS drops from 6800 to 2000 after GUI init:

    #include "UltraEngine.h"

    using namespace UltraEngine;

    shared_ptr<Window> window;
    shared_ptr<Framebuffer> framebuffer;
    shared_ptr<World> menuWold;
    shared_ptr<Interface> ui;
    shared_ptr<Camera> uiCamera;
    shared_ptr<Widget> panel;
    shared_ptr<Icon> icon1;

    void initGui()
    {
        auto default_font = LoadFont("Fonts\\arial.ttf");
        ui = CreateInterface(menuWold, default_font, framebuffer->GetSize());
        ui->SetRenderLayers(2);
        ui->root->SetColor(0.0f, 0.0f, 0.0f, 0.0f);

        // Orthographic camera that draws only the GUI render layer
        uiCamera = CreateCamera(menuWold, PROJECTION_ORTHOGRAPHIC);
        uiCamera->SetPosition((float)framebuffer->GetSize().x * 0.5f, (float)framebuffer->GetSize().y * 0.5f, 0);
        uiCamera->SetRenderLayers(2);
        uiCamera->SetClearMode(CLEAR_DEPTH);

        icon1 = LoadIcon("https://raw.githubusercontent.com/UltraEngine/Documentation/master/Assets/Icons/help.svg");

        // 11 x 20 grid of 64x64 panels, each showing the same icon
        for (int i = 1; i < 12; i++)
        {
            for (int j = 0; j < 20; j++)
            {
                auto panelI = CreatePanel(64 * j + 70, 64 * i + 70, 64, 64, ui->root, PANEL_DEFAULT);
                panelI->SetIcon(icon1);
            }
        }
    }

    int main(int argc, const char* argv[])
    {
        auto displays = GetDisplays();
        window = CreateWindow("Ultra Engine", 0, 0, 1600, 900, displays[0], WINDOW_DEFAULT);
        menuWold = CreateWorld();
        menuWold->RecordStats();
        framebuffer = CreateFramebuffer(window);

        auto light = CreateBoxLight(menuWold);
        light->SetRange(-10, 10);
        light->SetRotation(15, 15, 0);
        light->SetColor(2);

        auto camera = CreateCamera(menuWold);
        camera->SetClearColor(0.125);
        camera->SetPosition(0, 0, -3);
        camera->SetFov(70);

        //Main loop
        while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false)
        {
            // Press space to create the GUI once
            if (window->KeyHit(KEY_SPACE))
            {
                if (!ui) initGui();
            }
            window->SetText("FPS: " + String(menuWold->renderstats.framerate));
            menuWold->Update();
            menuWold->Render(framebuffer, false);
        }
        return 0;
    }
Josh, posted August 22:
> 14 hours ago, Dreikblack said: GUI has always been a weak spot for performance in Ultra. For example, FPS drops from 6800 to 2000 after GUI init:
I don't know if differences like that are very meaningful. When you measure FPS, a difference between low numbers is much more significant than the same difference between high numbers. If you convert FPS to time elapsed per frame, it becomes clearer. The first number means about 0.15 milliseconds per frame and the second means 0.5 milliseconds, so we are talking about a difference of roughly 0.35 milliseconds.
Besides the benchmarks I made, is there any specific code I should focus on testing?
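The conversion from framerate to frame time described above is just 1000 / FPS. A minimal sketch, with the hypothetical helper names `frameTimeMs` and `addedFrameCostMs`:

```cpp
#include <cassert>

// Convert a framerate (frames per second) into the time spent on one frame, in milliseconds.
double frameTimeMs(double fps)
{
    assert(fps > 0.0); // a framerate of zero has no meaningful frame time
    return 1000.0 / fps;
}

// Extra time per frame, in milliseconds, when the framerate drops from fpsBefore to fpsAfter.
double addedFrameCostMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

Going from 6800 to 2000 FPS adds roughly 0.35 milliseconds per frame, while a drop from 60 to 30 FPS, which looks far smaller as a raw number, adds almost 16.7 milliseconds.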
Josh, posted August 22:
It's actually pretty impressive that the GUI example above runs so well. It's rendering 440 unique objects, and half of them are alpha-blended. Widget blocks do not use any type of instancing, although the draw calls are batched. Each block is a unique mesh.
I think that situation could probably be optimized more if I disabled z-sorting on the transparent blocks. Currently transparency and z-sorting are always tied together, but this is one situation where you would not want the extra overhead of z-sorting and the separate draw calls it entails.
Let's see with a GeForce GTX 1080:
Nothing: 4200 FPS
With z-sorting: 1177 FPS
Z-sort disabled: 3800 FPS
I think this will be okay, because with GUIs you don't normally have transparency on top of transparency. If you have something like an icon on a semi-transparent panel, that will cause a problem if z-sorting is not active.
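For readers unfamiliar with the term, z-sorting here means drawing transparent objects from farthest to nearest so that blending happens in the right order. The sketch below shows the general technique only; the `Block` struct and `sortForDrawing` are hypothetical and do not reflect how the engine's renderer is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a drawable GUI block.
struct Block
{
    int id;
    float depth;      // distance from the camera; larger means farther away
    bool transparent; // alpha-blended blocks need ordered drawing
};

// Order blocks for drawing: opaque blocks first, then transparent blocks back to front.
void sortForDrawing(std::vector<Block>& blocks)
{
    // Move opaque blocks to the front, keeping their relative order
    auto firstTransparent = std::stable_partition(blocks.begin(), blocks.end(),
        [](const Block& b) { return !b.transparent; });

    // Sort only the transparent range, farthest first
    std::stable_sort(firstTransparent, blocks.end(),
        [](const Block& a, const Block& b) { return a.depth > b.depth; });
}
```

Because the sorted order dictates the draw order, neighbouring transparent blocks cannot always be merged into one batch, which is consistent with the extra draw calls mentioned above. Skipping the sort is safe only when transparent blocks never overlap each other.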
Josh, posted August 23:
I added the update that makes your GUI example much faster.
Dreikblack, posted August 24:
> On 8/23/2024 at 2:41 AM, Josh said: If you have something like an icon on a semi-transparent panel, that will cause a problem if z-sorting is not active.
Well, that's what I have.
FPS in my case seems to be the same as before this update, so the bottleneck is not z-sorting, I suppose.
Josh, posted August 24:
> 21 minutes ago, Dreikblack said: Well, that's what I have. FPS in my case seems to be the same as before this update, so the bottleneck is not z-sorting, I suppose.
Are some of the icons not appearing on top of those transparent backgrounds? I can probably make it a little smarter.
What else should I be testing? Just the game in general, or something more specific?