SpiderPig Posted November 24, 2022

Been working on voxel terrain and have reached a crossroads. Currently everything is done on the CPU. The terrain is about 1024x1024x1024 and made of a grid of components (with each one being a model) that are 128x128x128. Only the components that are currently intersecting the surface level are actually made. When the LOD of a component changes, the mesh of that model is edited by indexing the vertex and index arrays (to maximise speed). This is fine for lower LOD levels where each node may be 4m or larger, but getting down to 1m nodes it can take up to 1 second for the whole thing to update.

I always knew the GPU would have to be utilized somehow, which leads me to my question: is the compute shader the way to go here? I'm thinking of having a 3D texture for each component that simply holds the triangulation index of each node. I can store that index (0 - 255) in one byte, so I think a 128x128x128 texture would only be about 2MB, and a lot smaller if the node size is larger than 1m. I may end up sending some more information in the texture later on, so if it's fast enough it may end up being a 3 or 4 channel texture. I have no idea if an 8MB texture can even be sent quickly to the GPU... it won't be every frame, just for the component that needs updating when the LOD changes. At most there may be 4 or 5 components that require an update at once, and this could even be staggered over several frames.

I'm just not sure if the compute, vertex or geometry shaders are best suited to take a 3D texture and spit out some triangles. None of the information needs to come back to the CPU, it's just to create the visuals. I can use the same indexing on the CPU to generate collision meshes. I believe I can speed up the octree subdivisions by creating large blocks of contiguous memory for the octree nodes rather than creating and deleting each node as it is needed.
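The contiguous-memory idea for octree nodes could look roughly like the sketch below. It is only an illustration under assumptions: `OctreeNode`, `OctreePool` and their members are hypothetical names, not anything from Ultra or from the poster's code. Children are referenced by index into one `std::vector`, so subdividing appends to an already reserved block instead of allocating each node separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node layout: children are referenced by index into the pool
// instead of by pointer, so all nodes can live in one contiguous block.
struct OctreeNode {
    int32_t firstChild = -1;         // index of the first of 8 consecutive children, -1 for a leaf
    uint8_t depth = 0;               // 0 for the root
    uint8_t triangulationIndex = 0;  // 0-255 case index for this node
};

class OctreePool {
public:
    // Reserve the whole block up front so later subdivisions do not reallocate
    // as long as the node count stays within `capacity`.
    explicit OctreePool(std::size_t capacity) { nodes.reserve(capacity); }

    // Creates a root node and returns its index.
    int32_t CreateRoot() {
        nodes.push_back(OctreeNode{});
        return static_cast<int32_t>(nodes.size() - 1);
    }

    // Appends 8 children for `parent` and returns the index of the first one.
    int32_t Subdivide(int32_t parent) {
        assert(nodes[static_cast<std::size_t>(parent)].firstChild == -1);
        const int32_t first = static_cast<int32_t>(nodes.size());
        const uint8_t childDepth =
            static_cast<uint8_t>(nodes[static_cast<std::size_t>(parent)].depth + 1);
        for (int i = 0; i < 8; ++i) {
            OctreeNode child;
            child.depth = childDepth;
            nodes.push_back(child);
        }
        // Index again after the push_backs; a reference taken earlier could dangle.
        nodes[static_cast<std::size_t>(parent)].firstChild = first;
        return first;
    }

    // Drops every node but keeps the allocated memory for the next rebuild.
    void Clear() { nodes.clear(); }

    std::size_t Size() const { return nodes.size(); }
    std::size_t Capacity() const { return nodes.capacity(); }
    const OctreeNode& Get(int32_t i) const { return nodes[static_cast<std::size_t>(i)]; }

private:
    std::vector<OctreeNode> nodes;
};
```

Rebuilding a component would then be `pool.Clear()` followed by fresh `Subdivide()` calls, which reuses the same block of memory rather than creating and deleting nodes one at a time.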
I'm using a signed distance field on the CPU side to generate the basic terrain shape. Got it working pretty fast, so for now I think I will leave the generation side of things as it is. A quick smoothing makes it look more like a floating island... I still have to finish the voxel triangulation table - there are still a few holes in the mesh.

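For readers unfamiliar with the one-byte triangulation index mentioned above, here is a minimal sketch of how such an index is commonly derived from a signed distance field in marching-cubes-style schemes. The corner ordering, the "negative means inside" convention and the function names are assumptions for illustration, not the poster's actual table layout.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Builds the 8-bit triangulation (case) index for one node from the signed
// distance sampled at its 8 corners. Assumed convention: negative = inside.
// Bit i of the result is set when corner i is inside the surface.
inline uint8_t TriangulationIndex(const std::array<float, 8>& cornerDistance) {
    unsigned index = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if (cornerDistance[i] < 0.0f) index |= (1u << i);
    }
    return static_cast<uint8_t>(index);
}

// Hypothetical example field: a sphere of the given radius centred on the origin.
inline float SphereSDF(float x, float y, float z, float radius) {
    return std::sqrt(x * x + y * y + z * z) - radius;
}
```

With this convention, indices 0 and 255 mean the node is entirely outside or entirely inside the surface, so only the other 254 cases need entries that emit triangles.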
SpiderPig Posted November 24, 2022

Another option could be to use a single vertex per node and use the geometry shader to build the voxel itself...

Josh Posted November 25, 2022 (marked as Solution)

You might be surprised how efficient C++ code can be when you start optimizing hard. I would try this route first. I had some terrain splatting code I thought had to be done in a compute shader. My first step was to refactor the code so it would operate more like a shader. Lo and behold, when I did that it ran so fast I no longer needed the shader.

For high-performance C++ you can basically assume that math operations are free but memory and stack allocations are costly.

If you are adding components to an STL vector, use reserve() to preallocate a larger block of memory:

std::vector<int> a;
a.reserve(1000);
for (int n = 0; n < 1000; ++n)
{
    a.push_back(n);
}

If you don't know the final length of the vector you can do this:

while (something)
{
    // Grow by about 30% when full; the + 1 keeps it growing when the vector is empty or tiny
    if (a.capacity() == a.size()) a.reserve(static_cast<size_t>(a.size() * 1.3f) + 1);
    a.push_back(n);
}

Declare variables outside of a loop, never inside:

float a;
for (int x = 0; x < 1000; ++x)
{
    a = whatever;
}

Avoid use of Vec2 / Vec3 / Vec4 math that results in new objects being created. For example this:

v *= 3.0f;

is much better than this:

v = v * 3.0f;

Avoid square roots when possible. If you want to test distances, you can get the squared length of a vector like this and test it against the distance squared:

float lengthsquared = v.Dot(v);

If you are iterating through a container of objects, this is your best friend:

for (auto& thing : list)
{
    thing.x = 2;
}

My job is to make tools you love, with the features you want, and performance you can't live without.

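Two of the tips above are easy to check in isolation. The sketch below uses only the standard library; `Vec3f`, `WithinRange` and `CountReallocations` are hypothetical names for illustration and are not part of the engine. The first function counts how often a vector's capacity changes while it is filled, with and without reserve(). The second compares squared distances so that no square root is needed.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes (that is, how often it
// reallocates) while appending `count` integers.
inline std::size_t CountReallocations(std::size_t count, bool reserveFirst) {
    std::vector<int> a;
    if (reserveFirst) a.reserve(count);
    std::size_t reallocations = 0;
    std::size_t lastCapacity = a.capacity();
    for (std::size_t n = 0; n < count; ++n) {
        a.push_back(static_cast<int>(n));
        if (a.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = a.capacity();
        }
    }
    return reallocations;
}

// Hypothetical stand-in for the engine's Vec3.
struct Vec3f {
    float x, y, z;
};

// Distance test without a square root: compare squared length to squared range.
inline bool WithinRange(const Vec3f& a, const Vec3f& b, float range) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz <= range * range;
}
```

With reserve() the count stays at zero for the whole loop. Without it, the exact number depends on the standard library's growth policy, so it is best measured rather than assumed.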
Josh Posted November 25, 2022

And once you do that, you also have multiple threads at your disposal: https://www.ultraengine.com/learn/Thread?lang=cpp

I would keep a group of threads ready and use a semaphore to signal to them when they have new data to process, and then each thread should signal another semaphore to indicate that it has finished processing. Don't continuously create new threads.

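A persistent group of worker threads along these lines can be sketched with the C++ standard library. Note the substitution: this uses std::mutex and std::condition_variable in place of the semaphores described above, and it does not use Ultra's Thread API at all. `WorkerPool` and its methods are hypothetical names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Threads are created once and sleep until work arrives.
class WorkerPool {
public:
    explicit WorkerPool(unsigned count) {
        for (unsigned i = 0; i < count; ++i)
            workers.emplace_back([this] { Run(); });
    }

    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            quit = true;
        }
        wake.notify_all();
        for (auto& t : workers) t.join();
    }

    // Queue a job and wake one sleeping worker ("new data to process").
    void Submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            jobs.push(std::move(job));
            ++pending;
        }
        wake.notify_one();
    }

    // Block until every submitted job has finished ("finished processing").
    void WaitIdle() {
        std::unique_lock<std::mutex> lock(mutex);
        idle.wait(lock, [this] { return pending == 0; });
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex);
                wake.wait(lock, [this] { return quit || !jobs.empty(); });
                if (jobs.empty()) return;  // quit requested and nothing left to do
                job = std::move(jobs.front());
                jobs.pop();
            }
            job();  // run outside the lock so workers do not block each other
            {
                std::lock_guard<std::mutex> lock(mutex);
                --pending;
            }
            idle.notify_all();
        }
    }

    std::mutex mutex;
    std::condition_variable wake, idle;
    std::queue<std::function<void()>> jobs;
    std::vector<std::thread> workers;
    std::size_t pending = 0;
    bool quit = false;
};
```

A terrain update would submit one job per component, call WaitIdle() (or poll across frames instead of blocking), and then use the results on the main thread.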
Josh Posted November 25, 2022

Something else that might be relevant to you: Mesh size in Ultra is fixed. Once the mesh is submitted to the rendering thread you can't add vertices or primitives. This allows the renderer to offload additional overhead onto the culling thread. You can still modify the existing vertices and indices of a mesh.

So if you want to change a mesh into a new one, you can do this:

If the new mesh is bigger than the existing mesh, there is nothing you can do except use the new one.
If the new mesh is the same size as the existing mesh, you can just copy the vertices and indices data with a memcpy, and call one command to trigger a mesh re-submit, like SetVertexPosition().
If the new mesh is smaller than the existing mesh, you can do the above, and for all the unused vertices at the end you can just set their positions to zero.

Over time the mesh size will grow so that the first scenario in this list stops happening. This is how I implemented Sprite::SetText().

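The three cases above can be sketched on plain arrays. This is not Ultra's Mesh class: `SimpleVertex`, `MeshBuffers` and `UpdateInPlace` are hypothetical stand-ins, and the step that would trigger the re-submit is left out. Pointing the leftover indices at vertex 0 is an extra assumption of this sketch (the post only mentions zeroing the unused vertex positions); it turns the leftover triangles into degenerate ones.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal vertex, not the engine's Vertex structure.
struct SimpleVertex {
    float x, y, z;
};

// Hypothetical stand-in for a mesh whose array sizes are fixed once submitted.
struct MeshBuffers {
    std::vector<SimpleVertex> vertices;
    std::vector<uint32_t> indices;
};

// Writes `fresh` into `existing` without changing the size of `existing`.
// Returns false when `fresh` does not fit; the caller then has to switch to a
// new, larger mesh.
inline bool UpdateInPlace(MeshBuffers& existing, const MeshBuffers& fresh) {
    if (fresh.vertices.size() > existing.vertices.size() ||
        fresh.indices.size() > existing.indices.size()) {
        return false;
    }
    std::copy(fresh.vertices.begin(), fresh.vertices.end(), existing.vertices.begin());
    std::copy(fresh.indices.begin(), fresh.indices.end(), existing.indices.begin());

    // Unused tail: positions set to zero, as described in the post.
    const auto vertexTail =
        existing.vertices.begin() + static_cast<std::ptrdiff_t>(fresh.vertices.size());
    std::fill(vertexTail, existing.vertices.end(), SimpleVertex{0.0f, 0.0f, 0.0f});

    // Assumption of this sketch: leftover indices all reference vertex 0, so the
    // leftover triangles have zero area.
    const auto indexTail =
        existing.indices.begin() + static_cast<std::ptrdiff_t>(fresh.indices.size());
    std::fill(indexTail, existing.indices.end(), uint32_t{0});
    return true;
}
```

In the engine, a successful in-place update would be followed by whatever call triggers the mesh re-submit.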
SpiderPig Posted November 25, 2022

Thanks, I'll see what I can improve!

1 hour ago, Josh said: For high performance C++ you can basically assume that math operations are free but memory and stack allocations are costly.

I think memory allocation is my biggest enemy at the moment, so maybe I should fix that first and then see how fast it is. I'm also becoming aware that recursive functions are really useful but appear to get really slow if you're using a lot of them, probably from lots of heap allocations - or stack allocations, I can never remember which is which. I think if I could get rid of recursive functions it'd probably be 10x faster.

1 hour ago, Josh said: Mesh size in Ultra is fixed. Once the mesh is submitted to the rendering thread you can't add vertices or primitives. This allows the renderer to offload additional overhead onto the culling thread.

Currently I am resizing the vertex and index arrays of a mesh, so maybe this is creating a bottleneck. Is the re-submission of mesh data very costly? It depends on mesh size of course, but is 1,000 vertices seen as a piece of cake or not a good idea?

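One common way to remove recursion from an octree walk is an explicit stack kept in a reusable vector. The sketch below is a generic illustration, not the poster's code: `NodeRef` and `CountLeaves` are hypothetical, and the tree is implicit (a cube split into eight halves until a leaf size is reached). Whether this yields anything like the hoped-for speed-up would need to be measured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node description: minimum corner and edge length in voxels.
struct NodeRef {
    int x, y, z, size;
};

// Walks an implicit octree iteratively and returns the number of leaves
// reached. `stack` is passed in so its memory is reused between calls
// instead of being allocated on every traversal.
inline std::size_t CountLeaves(int rootSize, int leafSize, std::vector<NodeRef>& stack) {
    stack.clear();  // keeps the capacity from previous calls
    stack.push_back({0, 0, 0, rootSize});
    std::size_t leaves = 0;
    while (!stack.empty()) {
        const NodeRef node = stack.back();
        stack.pop_back();
        if (node.size <= leafSize) {
            ++leaves;  // a real implementation would emit geometry here
            continue;
        }
        const int half = node.size / 2;
        for (int i = 0; i < 8; ++i) {
            // Bits 0, 1 and 2 of i select the child's offset along x, y and z.
            stack.push_back({node.x + (i & 1) * half,
                             node.y + ((i >> 1) & 1) * half,
                             node.z + ((i >> 2) & 1) * half,
                             half});
        }
    }
    return leaves;
}
```

Keeping one `std::vector<NodeRef>` per worker and passing it to every call means the traversal stops allocating once the vector has grown to its working size.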
Josh Posted November 25, 2022

6 minutes ago, SpiderPig said: Currently I am resizing the vertex and indice array of a mesh so maybe this is creating a bottle neck. Is the re-submission of mesh data very costly? Dependent on mesh size of course but is 1,000 vertices seen as a piece of cake or not a good idea?

I don't see how you can resize those members because the public members are read-only.

Data transfer at that level should be no problem at all. I don't know the exact size of the vertex structure, but it's probably somewhere around 72 bytes, so 1000 vertices is only about 70 KB of data.

SpiderPig Posted November 25, 2022

2 minutes ago, Josh said: I don't see how you can resize those members because the public members are read-only.

I may have "un-const" them. Not a good idea, I know, but I'm waiting for the public Set() commands. Sorry.

5 minutes ago, Josh said: Data transfer at that level should be no problem at all. I don't know the exact size of the vertex structure, but it's probably somewhere around 72 bytes, so 1000 vertices is only 70 KB of data.

I did a quick check - sizeof(Vertex) = 116 bytes. If I double up on vertices between voxels and every voxel had a vertex on every edge, so 12 vertices each, 128x128 voxels would be: 128x128x12x116 = 22,806,528 bytes, about 22MB. Even though it's only once every so often, that seems like a bit of an ask... but I'm not sure.

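The estimate above is easy to keep honest with a compile-time check. `VertexBytes` is a hypothetical helper; the 116-byte vertex size and the 12 vertices per voxel are taken from the post, while 72 bytes is the earlier guess from the thread.

```cpp
#include <cstddef>

// Worst-case vertex payload for one layer of voxels, using the post's figures:
// voxelsX * voxelsY voxels, each with `verticesPerVoxel` unshared vertices.
constexpr std::size_t VertexBytes(std::size_t voxelsX, std::size_t voxelsY,
                                  std::size_t verticesPerVoxel,
                                  std::size_t bytesPerVertex) {
    return voxelsX * voxelsY * verticesPerVoxel * bytesPerVertex;
}

// 128 x 128 voxels, 12 vertices each, 116 bytes per vertex.
static_assert(VertexBytes(128, 128, 12, 116) == 22806528,
              "matches the figure quoted in the post");

// The same payload expressed in MiB (1 MiB = 1,048,576 bytes).
constexpr double kWorstCaseMiB =
    static_cast<double>(VertexBytes(128, 128, 12, 116)) / (1024.0 * 1024.0);
```

That works out to exactly 21.75 MiB, or roughly 22.8 MB in decimal units. With the earlier 72-byte guess, 1,000 vertices would be 72,000 bytes.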
Josh Posted November 25, 2022

1 hour ago, SpiderPig said: I did a quick check - sizeof(Vertex) = 116 bytes. If I double up on vertices between voxels and every voxel had a vertex on every edge, so 12 vertices each, 128x128 voxels would be : 128x128x12x116 = 22,806,528 bytes. 22MB. Even though it's only once every so often that seems like a bit of an ask... but I'm not sure

I think that should still be fine. You might want to make your terrain subdivisions arbitrary so you can easily change the subdivision to test what works best. If there is a problem you will see it in the framerate. This is completely unrelated to your game loop speed, so make sure you know what you are measuring the timing of.

SpiderPig Posted November 27, 2022

I've found half the problem - I was clearing the index array and then still using AddPrimitive() to create each face. In debug, clearing the largest patch was taking 50ms and rebuilding it was taking 90ms.

component->mesh->indices.clear();
...
component->mesh->AddPrimitive(current_vert_index, current_vert_index + 1, current_vert_index + 2);

I won't be able to fix this until there's a way to properly set the vertices and indices of a mesh all at once. I'm wondering if it would be faster to create an entirely new mesh instance rather than editing an existing one. It seems, though, that we can only add a mesh to a model rather than swapping it out completely? Is this for the same reason as below?

On 11/25/2022 at 8:31 PM, Josh said: This allows the renderer to offload additional overhead onto the culling thread.

On 11/25/2022 at 8:31 PM, Josh said: So if you want to change a mesh into a new one, you can do this: If the new mesh is bigger than the existing mesh, there is nothing you can do except use the new one. If the new mesh is the same size as the existing mesh, you can just copy the vertices and indices data with a memcpy, and call one command to trigger a mesh re-submit, like SetVertexPosition(). If the new mesh is smaller than the existing mesh, you can do the above, and for all the unused vertices at the end you can just set their positions to zero. Over time the mesh size will grow so that the first scenario in this list stops happening.

Resizing the arrays to be smaller or larger has actually worked - so should it be avoided for performance reasons, or is it slowly breaking the renderer?

SpiderPig Posted November 27, 2022

It might be even better to create the new mesh instance, add it to a model as a different LOD mesh, and let the shaders take care of LOD - is that how LOD works in Ultra? My only concern here is that for a massive terrain the RAM and GPU memory use could grow large over time (I'm thinking mostly of GPU memory). I could create the mesh on a separate thread and then assign it to the model. If memory does become a problem, I could create an entirely new model for patches that haven't been visited for a while or are very far away, and then delete the old model with all the extra meshes. Will deleting a model remove all the mesh information from the GPU and renderer?

Josh Posted November 28, 2022

When you are using multithreading, it's best just to stick to your own classes and then create the mesh when the data comes back to the main thread. Each thread would create maybe a float vector stored in a shared pointer, return that, and then you would call CreateMesh() in the main thread when that data is retrieved from the helper threads.

Yes, if a mesh is deleted (it goes out of scope) then that memory in VRAM is freed up.

You could use Ultra's LOD system if you want. I will add an example showing this.

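The hand-off described above, where helper threads return plain data and only the main thread touches engine objects, might be sketched like this with std::async. The generator is a hypothetical flat grid rather than real voxel triangulation, `FloatBuffer`, `BuildPatchPositions` and `BuildPatchAsync` are made-up names, and the final mesh creation call is deliberately not shown because its exact arguments are not assumed here.

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <vector>

using FloatBuffer = std::shared_ptr<std::vector<float>>;

// Runs on a helper thread. It touches only plain data and no engine objects.
// Hypothetical generator: a flat grid of positions, 3 floats (x, y, z) per vertex.
inline FloatBuffer BuildPatchPositions(int cellsPerSide, float cellSize) {
    auto out = std::make_shared<std::vector<float>>();
    out->reserve(static_cast<std::size_t>(cellsPerSide) *
                 static_cast<std::size_t>(cellsPerSide) * 3);
    for (int z = 0; z < cellsPerSide; ++z) {
        for (int x = 0; x < cellsPerSide; ++x) {
            out->push_back(static_cast<float>(x) * cellSize);
            out->push_back(0.0f);
            out->push_back(static_cast<float>(z) * cellSize);
        }
    }
    return out;
}

// Called from the main thread: starts the work on another thread and returns
// a future that the main thread collects later.
inline std::future<FloatBuffer> BuildPatchAsync(int cellsPerSide, float cellSize) {
    return std::async(std::launch::async, BuildPatchPositions, cellsPerSide, cellSize);
}
```

On the main thread, `FloatBuffer data = pending.get();` retrieves the result, after which the data can be converted to vertices and passed to the engine's mesh creation call. Note that std::async starts a new thread per call, so for frequent updates a persistent worker group, as recommended earlier in the thread, would be the better fit.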
Josh Posted November 28, 2022

9 hours ago, SpiderPig said: In debug clearing the largest patch was taking 50ms and rebuilding it was taking 90ms.

C++ STL can be very slow in debug mode for some reason. I actually wrote my own sort routine for one part of the engine because of this, instead of using std::sort. (I don't suggest doing that outside of very, very special high-performance cases.)

I am adding some more arguments to CreateMesh:

shared_ptr<Mesh> CreateMesh(const MeshPrimitives type = MESH_TRIANGLES, std::vector<Vertex> vertices = {}, std::vector<uint32_t> indices = {})

I am also adding a Mesh::Modify method.

SpiderPig Posted November 28, 2022

Wow, with the new CreateMesh() arguments the largest patch can now be created 3x faster! Thank you.

SpiderPig Posted November 29, 2022

If I create more threads than are currently available, will the excess threads essentially wait for an available slot to execute in, or do I have to manage this somehow?

EDIT : It appears to

Josh Posted December 1, 2022

They are managed automatically. MaxThreads() tells you the number of concurrent threads your machine supports.

Josh Posted December 1, 2022

Please don't call CreateMesh() or Mesh::Modify() outside the main thread. 😪

SpiderPig Posted December 1, 2022

I have been very good. They are called on the main thread.