SpiderPig, posted January 18, 2023:
I'm creating an excessive number of threads to test a few ideas of mine, and in the process a few questions came up.
Is Start() waiting for an available slot? "Done" is not printed until all threads have printed "Hello", which leads me to believe Start() is either waiting or just taking a long time to start. Some other examples aren't doing this, so I think it depends on what I'm asking the thread to do. It's probably exiting a thread as fast as it's being started. Is starting a thread relatively slow?
Shouldn't pressing the spacebar clear the memory allocated by the threads?
Can a thread be repurposed? Creating a lot of threads takes time and memory. I'm wondering if it's best to create only as many as can run at once and then, once one is done, send it a new function or user data and restart it.

#include "UltraEngine.h"
#include "ComponentSystem.h"
using namespace UltraEngine;

void RunThread() {
    auto r = 0.0f;
    for (int i = 0; i < 1000; i++) {
        r += sqrt(Random(2.0f, 1024.0f));
    }
    Print("Hello");
}

int main(int argc, const char* argv[]) {
    auto displays = GetDisplays();
    auto window = CreateWindow("Ultra Engine", 0, 0, 1280, 720, displays[0], WINDOW_CENTER | WINDOW_TITLEBAR);
    auto world = CreateWorld();
    auto framebuffer = CreateFramebuffer(window);

    auto camera = CreateCamera(world);
    camera->SetClearColor(0.125);
    camera->SetFov(70);
    camera->SetPosition(0, 0, -3);

    auto light = CreateDirectionalLight(world);
    light->SetRotation(35, 45, 0);
    light->SetRange(-10, 10);

    auto box = CreateBox(world);
    box->SetColor(0, 0, 1);

    auto actor = CreateActor(box);
    auto component = actor->AddComponent<Mover>();
    component->rotation.y = 45;

    //Quick to create
    vector<shared_ptr<Thread>> threads;
    threads.reserve(10000);
    for (int id = 0; id < 10000; id++) {
        threads.push_back(CreateThread(RunThread, false));
    }

    //Not so quick to start... is it waiting?
    for (auto t : threads) {
        t->Start();
    }
    Print("Done");//All threads seem to finish before this is called.

    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false) {
        if (window->KeyHit(KEY_SPACE)) {
            threads.clear();
        }//<- memory usage in VS doesn't change

        world->Update();
        world->Render(framebuffer);
    }
    return 0;
}
Josh, posted January 18, 2023:
The Thread class doesn't actually create a "real" thread until the thread is first launched. In this case, that would be when the Start() method is called. Thread creation is fast-ish. For something like processing a pixmap across multiple threads, it's fine to just create the threads and use them once, but if you are constantly creating threads it is better to have a set of threads waiting for work to do. A semaphore is really good for this, better than a mutex, if you can wrap your mind around how they work.
My job is to make tools you love, with the features you want, and performance you can't live without.
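To make the "set of threads waiting for work" idea concrete, here is a minimal sketch in standard C++ rather than the Ultra Engine Thread and Semaphore classes. The names SimpleSemaphore, WorkerPool, and RunJobs are hypothetical and exist only for this illustration; the semaphore is built from a mutex and a condition variable so that it does not depend on C++20.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a counting semaphore built from a mutex and a
// condition variable (C++20 also offers std::counting_semaphore).
class SimpleSemaphore {
public:
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical pool: a fixed set of threads that sleep until work arrives.
class WorkerPool {
public:
    explicit WorkerPool(unsigned worker_count) {
        assert(worker_count > 0);
        for (unsigned i = 0; i < worker_count; ++i)
            workers_.emplace_back([this] { Run(); });
    }
    ~WorkerPool() {
        // An empty job tells one worker to exit, so queue one per worker.
        for (std::size_t i = 0; i < workers_.size(); ++i) Submit(nullptr);
        for (auto& worker : workers_) worker.join();
    }
    void Submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.Signal();  // one signal per queued job wakes one worker
    }
private:
    void Run() {
        for (;;) {
            ready_.Wait();  // blocks without spinning until a job is queued
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            if (!job) return;  // empty job: shut this worker down
            job();
        }
    }
    std::mutex mutex_;
    std::queue<std::function<void()>> jobs_;
    SimpleSemaphore ready_;
    std::vector<std::thread> workers_;  // declared last so threads start last
};

// Runs job_count trivial jobs on worker_count reusable threads.
int RunJobs(int job_count, unsigned worker_count) {
    std::atomic<int> finished{0};
    {
        WorkerPool pool(worker_count);
        for (int i = 0; i < job_count; ++i)
            pool.Submit([&finished] { ++finished; });
    }  // the destructor drains the queue and joins the workers
    return finished.load();
}
```

Calling RunJobs(10000, 4) pushes 10,000 small jobs through four threads that are created once, which is the reuse pattern described above. The semaphore count tracks the number of queued jobs, so idle workers block instead of polling.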
SpiderPig, posted January 18, 2023:
I'll probably use a semaphore actually; I'm experimenting now.
1 hour ago, SpiderPig said: "Pressing the spacebar should clear the memory allocated by the threads?"
That's the only other issue here. A memory leak, perhaps? Might be my bad programming though.
Josh, posted January 18, 2023:
Are you sure the memory is the same? 10,000 shared pointers probably consume very little memory compared to the rest of the program.
SpiderPig, posted January 18, 2023:
The program says the memory used totals 1.6 GB once the threads are done. Pressing space clears the vector okay, but the memory usage doesn't change...
klepto2, posted January 18, 2023:
This is how I have done it with the ShaderWatcher:

void UltraEngine::Utilities::Shader::ShaderWatcher::RunPreprocess(vector<shared_ptr<ShaderFile>> files) {
    int time = Millisecs();
    Print("Preprocessing... (" + WString(files.size()) + " Shaders)");
    vector<shared_ptr<Thread>> threads;
    for (auto f : files) {
        threads.push_back(CreateThread(bind(ThreadPreprocess, _compiler, f), true));
    }
    for (auto t : threads) t->Wait();
    Print("Preprocessing finished... (" + WString(Millisecs() - time) + "ms)");
}

A semaphore or mutex isn't needed here, as there are no resources shared by any thread. A mutex is a good way to sync access to specific functions which are not thread-safe, e.g. Print. Semaphores (technically a mutex is just a specialized version of a semaphore) can be used for syncing as well, but also to limit the maximum number of threads executing in parallel.
Windows 10 Pro 64-Bit-Version NVIDIA Geforce 1080 TI
Josh, posted January 18, 2023:
Vectors normally don't reduce their memory when they are resized. You can check their capacity to verify this. But compared to 1.6 GB (most of which is the VS debugger), the amount of memory you are freeing here is tiny. GetMemoryUsage will give you the exact number, though only in debug mode.
SpiderPig, posted January 18, 2023:
I tested the capacity after both clear() and erase(). Both times it remains at 10,000, and memory usage actually went up a little. How, then, do I properly release all the memory held by a vector?

if (window->KeyHit(KEY_SPACE)) {
    Print(GetMemoryUsage());
    threads.erase(threads.begin(), threads.end());
    Print(GetMemoryUsage());
    Print(threads.capacity());
}
SpiderPig, posted January 18, 2023:
I think shrink_to_fit() does it.
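The capacity behaviour discussed here can be checked with plain std::vector. The sketch below is an illustration only: Capacities and MeasureCapacities are hypothetical names, and shared_ptr<int> stands in for the thread handles. Note that shrink_to_fit() is specified as a non-binding request, so the sketch follows it with the swap idiom, which hands the buffer to a temporary that is then destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Capacities {
    std::size_t after_clear;
    std::size_t after_release;
};

Capacities MeasureCapacities() {
    std::vector<std::shared_ptr<int>> v;
    v.reserve(10000);
    for (int i = 0; i < 10000; ++i) v.push_back(std::make_shared<int>(i));

    Capacities c;
    v.clear();  // destroys the 10,000 elements but keeps the buffer
    assert(v.empty());
    c.after_clear = v.capacity();

    v.shrink_to_fit();  // a non-binding request to drop unused capacity
    // Swapping with an empty temporary hands the old buffer to the temporary,
    // which frees it when it is destroyed at the end of the statement.
    std::vector<std::shared_ptr<int>>().swap(v);
    c.after_release = v.capacity();
    return c;
}
```

After clear() the capacity is still at least 10,000, which matches what was observed in the thread; only the release step gives the buffer back to the allocator. Whether the process-level number shown by a debugger drops afterwards depends on the allocator and the operating system.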
Josh, posted January 18, 2023:
The reason vectors do this is that if you have a vector that is constantly resizing, it's faster to just leave the capacity at its max value, because reallocating a vector can be slow. If you are constantly pushing new objects into it, the vector has to allocate a new memory block and copy its contents every time it runs out of capacity, so it's usually best to just leave it as-is. In fact, this is what the reserve() method is for:

std::vector<int> v;
v.reserve(1000);
for (int n = 0; n < 1000; ++n) {
    v.push_back(n);
}
Josh, posted January 18, 2023:
Another thing I sometimes do to avoid constantly resizing:

if (v.capacity() == v.size()) v.reserve(v.size() * 1.3);
v.push_back(i);

This will make it so memory allocations only happen "once in a while" instead of constantly allocating and recopying the buffer.
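For context, the mainstream std::vector implementations already grow their capacity geometrically, so push_back without any reserve() reallocates only occasionally. A small sketch that counts how often the capacity changes makes this visible (CountReallocations is a hypothetical name used only here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the capacity changes while pushing n items
// into a vector that was never reserved.
int CountReallocations(int n) {
    assert(n >= 0);
    std::vector<int> v;
    int reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

For 100,000 pushes the count is in the tens rather than the tens of thousands; the exact number depends on the growth factor of the standard library in use. Calling reserve() up front is still worthwhile when the final size is known, because it reduces that count to a single allocation.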
SpiderPig, posted January 18, 2023:
Yeah, since learning of the reserve function it has been my best friend. I've been programming in C++ for nearly 20 years and I swear there is still so much to learn.
Against what I thought would happen, the code below is actually 4 times slower than the first example I posted. It seems the constant back and forth with the semaphore waiting and signalling takes a toll. It took about 40 seconds for the threads to finish (debug mode).
A fair comparison would be a good idea, though: I was only using 4 threads when the other tests used 8! Changing that made it second best. It took 22 seconds to complete.

#include "UltraEngine.h"
#include "ComponentSystem.h"
using namespace UltraEngine;

void DoThis() {
    auto r = 0.0f;
    for (int i = 0; i < 1000; i++) {
        r += sqrt(Random(2.0f, 1024.0f));
    }
}

struct ThreadManager : public Object {
    int execution_count = 0;
    bool exit = false, waiting = false;
    function<void()> my_func;
    shared_ptr<Semaphore> semaphore;
};

shared_ptr<Object> RunThread(shared_ptr<Object> extra) {
    auto manager = extra->As<ThreadManager>();
    while (manager->exit == false) {
        manager->waiting = false;
        manager->execution_count++;
        if (manager->my_func != nullptr) {
            manager->my_func();
        }
        manager->waiting = true;//a state for a semaphore maybe?
        manager->semaphore->Wait();
    }
    return nullptr;
}

int main(int argc, const char* argv[]) {
    auto displays = GetDisplays();
    auto window = CreateWindow("Ultra Engine", 0, 0, 1280, 720, displays[0], WINDOW_CENTER | WINDOW_TITLEBAR);
    auto world = CreateWorld();
    auto framebuffer = CreateFramebuffer(window);

    auto camera = CreateCamera(world);
    camera->SetClearColor(0.125);
    camera->SetFov(70);
    camera->SetPosition(0, 0, -3);

    auto light = CreateDirectionalLight(world);
    light->SetRotation(35, 45, 0);
    light->SetRange(-10, 10);

    auto box = CreateBox(world);
    box->SetColor(0, 0, 1);

    auto actor = CreateActor(box);
    auto component = actor->AddComponent<Mover>();
    component->rotation.y = 45;

    struct ThreadStruct {
        shared_ptr<Thread> thread;
        shared_ptr<ThreadManager> manager;
    };

    auto available_threads = MaxThreads();
    vector<shared_ptr<ThreadStruct>> threads;
    threads.reserve(available_threads);
    for (int id = 0; id < available_threads; id++) {
        auto s = make_shared<ThreadStruct>();
        s->manager = make_shared<ThreadManager>();
        s->manager->semaphore = CreateSemaphore();
        s->thread = CreateThread(RunThread, s->manager);
        threads.push_back(s);
    }

    bool done = false;
    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false) {
        auto tn = 0;
        for (auto t : threads) {
            if (t->manager->waiting == true && t->manager->exit == false) {
                if (t->manager->execution_count < 1250) {
                    t->manager->semaphore->Signal();
                }
                else {
                    t->manager->exit = true;
                    t->manager->semaphore->Signal();
                    Print("Done Thread #" + String(tn));
                    tn++;
                }
            }
        }

        world->Update();
        world->Render(framebuffer);
    }
    return 0;
}

Doing it like this took 25 seconds to finish.

vector<shared_ptr<Thread>> threads;
threads.reserve(10000);
for (int id = 0; id < 10000; id++) {
    threads.push_back(CreateThread(RunThread));
}
for (auto t : threads) {
    t->Start();
}

And the fastest is the way @klepto2 has been doing it, and how I've been doing it to date. It finishes in about 10 seconds if you allow the threads to start straight away, probably because the first few threads can start while the others are still being created.

vector<shared_ptr<Thread>> threads;
threads.reserve(10000);
for (int id = 0; id < 10000; id++) {
    threads.push_back(CreateThread(RunThread));
}

I wonder if a semaphore should have a state that can be checked?

manager->waiting = true;//a state for a semaphore maybe?
...
semaphore->GetState()
Josh, posted January 18, 2023:
You are reading from and writing to several different variables in different threads, so the values you read could be totally random. Normally the way you would use these is the main thread has a semaphore that says "new work is ready" and the thread would have a semaphore that says "work is finished". This is based on the idea that the main thread has some point at which the work must be completed, and it will wait until the thread is finished.
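The two-semaphore handshake described here can be sketched in standard C++. This is an illustration under assumptions, not Ultra Engine code: SimpleSemaphore and RunHandshake are hypothetical names, and the "work" is just doubling an integer. The shared variables are only touched by whichever side currently holds the turn, so the semaphores themselves provide the ordering.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: counting semaphore from a mutex and condition variable.
class SimpleSemaphore {
public:
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// The main thread hands over one value per round and waits for the result.
int RunHandshake(int rounds) {
    assert(rounds >= 0);
    SimpleSemaphore work_ready;  // signalled by main: "new work is ready"
    SimpleSemaphore work_done;   // signalled by worker: "work is finished"
    int input = 0;
    int output = 0;
    bool quit = false;

    std::thread worker([&] {
        for (;;) {
            work_ready.Wait();
            if (quit) return;
            output = input * 2;  // stand-in for real work
            work_done.Signal();
        }
    });

    int sum = 0;
    for (int i = 1; i <= rounds; ++i) {
        input = i;            // safe: the worker is blocked in Wait()
        work_ready.Signal();
        work_done.Wait();     // the point where the result must exist
        sum += output;        // safe: the worker is blocked again
    }
    quit = true;
    work_ready.Signal();      // wake the worker one last time so it can exit
    worker.join();
    return sum;
}
```

In a real frame loop the main thread would do other work between Signal() and Wait(), so the two threads overlap; here they alternate strictly to keep the ordering easy to follow.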
Josh, posted January 18, 2023:
If it's a thread that just runs until it is finished while the main loop continues, you can use a mutex lock to guard a variable that says "the results are ready".
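A minimal sketch of that "results are ready" pattern in standard C++ follows. Job, Produce, TryCollect, and RunUntilReady are hypothetical names for this illustration, and the yield loop stands in for the engine's main loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Job {
    std::mutex mutex;
    bool ready = false;         // guarded by mutex
    std::vector<float> result;  // guarded by mutex
};

// Worker side: build the result privately, then publish it under the lock.
void Produce(Job& job) {
    std::vector<float> local;
    for (int i = 0; i < 1000; ++i) local.push_back(std::sqrt(static_cast<float>(i)));
    std::lock_guard<std::mutex> lock(job.mutex);
    job.result = std::move(local);
    job.ready = true;
}

// Main-loop side: a cheap check that never blocks for long.
bool TryCollect(Job& job, std::vector<float>& out) {
    std::lock_guard<std::mutex> lock(job.mutex);
    if (!job.ready) return false;
    out = std::move(job.result);
    return true;
}

std::size_t RunUntilReady() {
    Job job;
    std::thread worker(Produce, std::ref(job));
    std::vector<float> out;
    // Each iteration stands in for one frame of the main loop.
    while (!TryCollect(job, out)) std::this_thread::yield();
    worker.join();
    assert(!out.empty());
    return out.size();
}
```

The lock is held only for the hand-off, not for the computation, so the main loop is never stalled by the worker for more than a move of the vector.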
SpiderPig, posted January 18, 2023:
3 minutes ago, Josh said: "You are reading from and writing to several different variables in different threads, so the values you read could be totally random."
Yeah, I didn't think that was a good idea. I'll try a mutex. I think I'm beginning to understand threads now.
SpiderPig, posted January 18, 2023:
I think the best thing to do in my case might be to create a vector of data and pass it to the thread once, when the thread is created. That way I can avoid using too many mutexes. 😁
Josh, posted January 18, 2023:
I would do this by creating a custom class derived from Object that stores the vector in a member, then passing that object as the extra parameter in the CreateThread function.
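The same idea can be sketched with std::thread for readers outside Ultra Engine. This is only an analog of the approach described above, with hypothetical names (Job, RunJob, RunAll): each thread receives its own object holding its input vector and a slot for its result, so no mutex is needed while the threads run.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Everything one thread needs, handed over once at creation.
struct Job {
    std::vector<float> input;
    float result = 0.0f;
};

void RunJob(std::shared_ptr<Job> job) {
    // Only this thread touches *job until it is joined.
    for (float x : job->input) job->result += std::sqrt(x);
}

std::vector<float> RunAll(unsigned thread_count, std::size_t items_per_thread) {
    std::vector<std::shared_ptr<Job>> jobs;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < thread_count; ++i) {
        auto job = std::make_shared<Job>();
        job->input.assign(items_per_thread, 4.0f);  // placeholder data
        jobs.push_back(job);
        threads.emplace_back(RunJob, job);
    }
    for (auto& t : threads) t.join();  // join makes the results visible

    std::vector<float> results;
    for (auto& job : jobs) results.push_back(job->result);
    assert(results.size() == thread_count);
    return results;
}
```

Because every thread owns its data exclusively and the main thread reads the results only after join(), there is nothing to lock.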
SpiderPig, posted January 18, 2023 (marked as solution):
Thanks, that certainly is the way to get the most out of the threads! With this it completed in less than one second. It was so quick that I had to make sure it was actually still processing the data. I'm pretty sure it's still doing the same workload. I can now speed up my voxel terrain by a lot!

#include "UltraEngine.h"
#include "ComponentSystem.h"
using namespace UltraEngine;

float DoThis() {
    auto r = 0.0f;
    for (int i = 0; i < 1000; i++) {
        r += sqrt(Random(2.0f, 1024.0f));
    }
    return r;
}

struct ThreadManager : public Object {
    int current_count = 0, total_count = 1250;//10,000 / 8 (i.e. 10,000 / MaxThreads())
    float result = 0.0f;
    function<float()> my_func;
};

shared_ptr<Object> RunThread(shared_ptr<Object> extra) {
    auto manager = extra->As<ThreadManager>();
    while (manager->current_count < manager->total_count) {
        if (manager->my_func != nullptr) {
            manager->result = manager->my_func();
        }
        manager->current_count++;
    }
    return nullptr;
}

int main(int argc, const char* argv[]) {
    auto displays = GetDisplays();
    auto window = CreateWindow("Ultra Engine", 0, 0, 1280, 720, displays[0], WINDOW_CENTER | WINDOW_TITLEBAR);
    auto world = CreateWorld();
    auto framebuffer = CreateFramebuffer(window);

    auto camera = CreateCamera(world);
    camera->SetClearColor(0.125);
    camera->SetFov(70);
    camera->SetPosition(0, 0, -3);

    auto light = CreateDirectionalLight(world);
    light->SetRotation(35, 45, 0);
    light->SetRange(-10, 10);

    auto box = CreateBox(world);
    box->SetColor(0, 0, 1);

    auto actor = CreateActor(box);
    auto component = actor->AddComponent<Mover>();
    component->rotation.y = 45;

    struct ThreadStruct {
        shared_ptr<Thread> thread;
        shared_ptr<ThreadManager> manager;
    };

    auto available_threads = MaxThreads();
    vector<shared_ptr<ThreadStruct>> threads;
    threads.reserve(available_threads);
    for (int id = 0; id < available_threads; id++) {
        auto s = make_shared<ThreadStruct>();
        s->manager = make_shared<ThreadManager>();
        s->manager->my_func = DoThis;
        s->thread = CreateThread(RunThread, s->manager);
        threads.push_back(s);
    }

    int thread_index = 0;
    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false) {
        for (int id = 0; id < threads.size(); id++) {
            if (threads[id]->thread->GetState() == THREAD_FINISHED) {
                Print("Done Thread #" + String(thread_index) + " - " + String(threads[id]->manager->result));
                thread_index++;
                threads.erase(threads.begin() + id);
                id--;
            }
        }

        world->Update();
        world->Render(framebuffer);
    }
    return 0;
}
SpiderPig, posted January 19, 2023:
Do you require a mutex if you're just reading memory created on a different thread, or is it only needed for writing to that memory?
E.g. I create an octree on the main thread and then pass one of its nodes (via smart pointer) to a thread, where the node and its children are read in order to create vertices and indices, which are then passed back to the main thread once the thread is done. Can I create a new child node in the thread without a mutex if the parent node I passed to the thread is not being used in any other thread?
Or is it simply a rule: do not access (read or write) memory that is shared between threads, and that could be accessed at the same time, under any circumstances?
SpiderPig, posted January 19, 2023:
Not sure if this answer is a good one or not. https://stackoverflow.com/questions/1525189/do-i-need-a-mutex-for-reading
klepto2, posted January 19, 2023:
Normally, I would use a mutex for both writing and reading. Sample:

With only a read mutex:
Thread A: Writes the value 1 to node x --> just begins writing
Thread B: Locks the mutex, reads the value 0, and unlocks the mutex --> Thread A hasn't finished writing the 1 into memory
Thread A: Finishes
Thread C: Locks the mutex, reads the value 1, and unlocks the mutex --> Thread A has finished writing the 1 into memory
The read results might get out of sync.

With a read and write mutex:
Thread A: Locks the mutex and writes the value 1 to node x --> just begins writing
Thread B: Waits for the mutex to be unlocked
Thread A: Unlocks the mutex --> finished writing
Thread B: Locks the mutex, reads the value 1 from memory, and unlocks the mutex afterwards
Thread C: Locks the mutex, reads the value 1 from memory, and unlocks the mutex afterwards
The results are always in sync.

The read and write approach is of course much slower than just locking the read. You need to keep the mutex-locked sections as small as possible, and maybe optimize them to only lock when it is really necessary.
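The "read and write mutex" case can be sketched in standard C++ as follows. Node and CountTornReads are hypothetical names for this illustration; the node holds two fields that must always agree, so a reader that saw a half-finished write would notice the mismatch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Two fields that must stay consistent: twice == 2 * value.
struct Node {
    std::mutex mutex;
    int value = 0;
    int twice = 0;
};

// Returns how many times the reader observed an inconsistent node.
int CountTornReads(int writes) {
    assert(writes >= 0);
    Node node;
    std::atomic<bool> stop{false};
    int torn = 0;  // written only by the reader, read after join()

    std::thread reader([&] {
        while (!stop.load()) {
            std::lock_guard<std::mutex> lock(node.mutex);  // lock for reading
            if (node.twice != 2 * node.value) ++torn;
        }
    });

    for (int i = 1; i <= writes; ++i) {
        std::lock_guard<std::mutex> lock(node.mutex);      // lock for writing
        node.value = i;
        node.twice = 2 * i;
    }

    stop.store(true);
    reader.join();
    return torn;
}
```

With both sides taking the lock, the reader can only run between complete writes, so the count stays at zero. Each lock covers just the two assignments or the single comparison, in line with the advice to keep locked sections small.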
klepto2, posted January 19, 2023:
Small addition: this might not apply to the int values in this example; they are just used for simplicity. Reads and writes of an int are typically atomic on common hardware and should work without a lock for reading, although standard C++ only guarantees this for std::atomic<int>. More complex objects can of course behave differently and may need a read and write mutex or other types of memory barriers.
SpiderPig, posted January 19, 2023:
Yeah, I was thinking it might just be safer to mutex the lot. I'll probably end up making a system that creates all it needs in the thread and then passes the whole thing back to be placed into the octree.
1 minute ago, klepto2 said: "Small addition: This might not be the case for the int values in this case, they are just used for simplicity. int operations are atomic, and should work, without using a lock for reading. More complex objects of course can have other behavior and may need read and write mutex or other types of memory barriers."
I was just reading something similar here: https://www.quora.com/Do-I-have-to-use-a-mutex-to-protect-shared-variables-that-I-use-for-read-only-purposes-without-in-place-modification-in-C++-multithreading
Right now I'm implementing a bool check in the thread to see if it should quit or not. I figured I might not need a mutex for that.
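For a quit flag like this, one mutex-free option in standard C++ is std::atomic<bool>, which makes the concurrent read and write well-defined; a plain bool shared between threads is formally a data race. The sketch below is an illustration with a hypothetical function name, and the iteration counter exists only so the example has something to observe.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs a worker until the main thread asks it to quit, then reports
// how many iterations the worker completed.
int RunUntilToldToQuit() {
    std::atomic<bool> quit{false};
    std::atomic<int> iterations{0};

    std::thread worker([&] {
        while (!quit.load()) {  // well-defined read of the shared flag
            ++iterations;       // stand-in for one unit of work
        }
    });

    // Let the worker make some progress before asking it to stop.
    while (iterations.load() < 1000) std::this_thread::yield();
    quit.store(true);
    worker.join();

    assert(iterations.load() >= 1000);
    return iterations.load();
}
```

The worker notices the flag at the top of its next loop iteration, so the granularity of shutdown is one unit of work.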
SpiderPig, posted January 20, 2023:
How many threads are okay for me to use? MaxThreads() returns 8, but something tells me I shouldn't tie all of those up with work in case Ultra needs one or two at some point. I'm just going off watching the output and seeing Ultra close threads every now and then, usually at start-up.
Josh, posted January 20, 2023:
You may be using a lot of threads, but you probably aren't using them at max capacity. If you check CPU usage you'll probably be surprised how little they are being used. Ultra uses one high-priority thread for rendering. The main logic thread, where your code executes, pauses in intervals of 16 milliseconds, so CPU usage should be pretty low. Culling is on another thread, but usage there will be low also. Animation, physics, and navmesh building are each on separate threads, and the animation system may use many threads, but in most cases CPU usage on each will be low unless you are pushing that system. I'd say MaxThreads() - 1 is a good general rule.
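Outside Ultra Engine, the same rule of thumb can be sketched with std::thread::hardware_concurrency(), used here as a stand-in for MaxThreads(). WorkerCount and DefaultWorkerCount are hypothetical helper names. The standard allows hardware_concurrency() to return 0 when the value is unknown, so the sketch never goes below one worker.

```cpp
#include <cassert>
#include <thread>

// "All hardware threads minus one", but never fewer than one worker.
unsigned WorkerCount(unsigned hardware_threads) {
    return hardware_threads > 1 ? hardware_threads - 1 : 1;
}

// Applies the rule to the value reported by the standard library.
unsigned DefaultWorkerCount() {
    unsigned count = WorkerCount(std::thread::hardware_concurrency());
    assert(count >= 1);
    return count;
}
```

On the 8-thread machine mentioned in the thread this gives 7 workers, leaving one hardware thread free for the engine's own threads.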