Josh, posted September 18, 2017:
I have started switching the engine over to Unicode by replacing all occurrences of std::string with std::wstring. There are a bunch of little functions and variables that have to be changed (char to wchar_t, "" to L"", etc.), but it is pretty straightforward.
Lua 5.3 supposedly supports Unicode strings, but the manual states that the lua_getglobal() function accepts a char* parameter: https://www.lua.org/manual/5.3/manual.html#lua_getglobal
There's a little information here, but it is not very clear: https://www.lua.org/manual/5.3/manual.html#6.5
So how are you supposed to make Unicode work in Lua? Switching data back and forth between wstrings and strings is a recipe for disaster.
My job is to make tools you love, with the features you want, and performance you can't live without.
Josh, posted September 18, 2017:
More confusion. Apparently C++11 lets you store UTF-8 text in a std::string by using the u8 prefix:
std::string msg = u8"महसुस";
So I guess Leadwerks already supports Unicode and my job is done?
Josh, posted September 18, 2017:
I believe this code will open a file with a non-ASCII name on any platform:
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <cstdio>
#include <locale>   // std::wstring_convert
#include <string>
std::string filename = u8"⺹.txt"; // UTF-8 bytes; in C++20, u8 literals are char8_t
#ifdef _WIN32
// Windows file APIs want UTF-16, so widen the UTF-8 name first
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
FILE* f = _wfopen(converter.from_bytes(filename).c_str(), L"rb");
#else
FILE* f = fopen(filename.c_str(), "rb"); // POSIX systems take UTF-8 byte paths as-is
#endif
Roland, posted September 18, 2017:
Why not just use wstring, wchar_t, wfstream, etc., and that's it? A search and replace, maybe some modification here and there, and it's done.
Roland Strålberg
Website: https://rstralberg.com
Josh, posted September 19, 2017:
Unicode sucks because its encodings use a variable character size. This makes search and replace operations very difficult. On top of that, Linux does not accept wstrings in functions like fopen. At this point I am thinking we will store strings as wstrings and then convert them to UTF-8 std::strings when calling Linux system functions.
Why is everything in Linux designed as if computers have 1 KB of memory? The whole Unicode design is idiotic. They made a very complicated system when all they had to do was use 2 bytes per character and have one number for every character. I guess making something that actually works would be "boring".
Yes, I know there are ancient Vietnamese characters that are no longer in use that push the character count past 65,536, but who cares about that? Why should we handicap modern computing for a bunch of Vietnamese people who died three centuries ago? They're dead so they don't care, and if they had anything interesting to say it would have been made into a movie already.
Roland, posted September 19, 2017:
Aaah. I see. Thanks (y)
Josh, posted September 19, 2017:
I got a window created with Chinese characters, but I can't print them to the console:
wprintf(L"%ls \n", L"A wide string");
wprintf(L"%ls \n", L"勝遂記暮恐村日性周報著身催");
wprintf(L"Why? 为什么?\n");
Josh, posted September 19, 2017:
Also fails:
DWORD i;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"勝遂記暮恐村日性周報著身催\n", 14, &i, NULL);
Josh, posted September 19, 2017:
I'm thinking my console font probably just cannot display the characters. I tried to write a wstring to a text file, but that didn't work out too well either when I opened it in Notepad++.
Josh, posted September 19, 2017:
Okay, I discovered that if you want to write a wstring (UTF-16) text file, you first have to write the unsigned short integer 65279 (0xFEFF, the byte order mark) to the file.
Josh, posted September 19, 2017:
Sadly, Leadwerks 5 will not support 15th century Vietnamese computers.
Einlander, posted September 19, 2017:
I found this blog post through Reddit a few days ago, and it was insightful: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
It made me realize that everything other than UTF-16 is basically a beautiful hack. It also talks about the wcs functions in C++.
Josh, posted September 19, 2017, replying to Einlander:
Thanks for the info. You're right, everything but UCS-2 (two-byte) Unicode is a stupid idea because it means you are translating text through two layers of conversion. (The fact that some characters no one uses go beyond the 65,536-character limit does not matter.)
So in Leadwerks we will replace all strings with wstring, replace all Windows API calls with their wide (-W) versions, and for Lua or Linux system calls we will convert the wstring to UTF-8 (for opening files, etc.). Strings will be stored in files as UCS-2.
It is interesting to see that all the tech enthusiasts keep claiming UTF-8 is the best, but people who actually write software use UTF-16: https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default
AggrorJorn, posted September 19, 2017:
Quoting Josh: "Sadly, Leadwerks 5 will not support 15th century Vietnamese computers."
Dammit, there goes the vast majority of my target audience...
Josh, posted September 20, 2017:
This is getting very complicated and I am reconsidering it. Why do we need Russian and Chinese characters? For three things: loading or saving a file; drawing text on the screen or in a GUI element; and storing a variable for one of those two purposes.
Do we really need to change every other string in Leadwerks to accommodate these goals, or can we simply add overloads for a few commands and use std::wstring for internal file path values? Do we care if the user can name an entity "汽车" ("car") in the editor, or should they be expected to use Latin characters for something like this?
I don't know if Lua 5.3 will really support Unicode strings. I don't know if the Steamworks commands use Unicode at all; they all just accept a char* value. I don't know if these will be stored the same way on Windows and Linux.
I still have 2991 errors in the engine to fix. At first I thought we should change every single variable, but now I am not sure that is a good idea. I could just add a few commands like this and be done with it:
Widget::SetText(const std::string& text)
Widget::SetText(const std::wstring& text)
Context::DrawText(const std::string& text)
Context::DrawText(const std::wstring& text)
shared_ptr<Model> LoadModel(const std::string& path)
shared_ptr<Model> LoadModel(const std::wstring& path)
However, this means a mix of std::string and std::wstring values could end up in the engine.
Einlander, posted September 20, 2017 (edited):
I would make sure that the rest of the engine works with UTF-8 and let Lua itself fail with the encoding. Lua 5.3 has UTF-8 support (https://www.lua.org/manual/5.3/manual.html#6.5), but it's not very robust.
Since it is still early, you do have the option to choose a Lua-derived language or a completely different language not based on Lua. As distasteful as it may be, the API is changing and the scripts will need to be updated anyway, so it might be simpler to start over early with something else. Who knows.
Josh, posted September 21, 2017, replying to Einlander's suggestion to standardize on UTF-8:
Then say goodbye to String::Split(), Lower(), Upper(), Mid(), and all the other string manipulation commands, and your file paths will have to be 100% exact or files will fail to load. UTF-8 is a fraud and its proponents should be imprisoned for crimes against humanity.
Einlander, posted September 21, 2017:
Hey now, I like UTF-8; I have just never had to deal with coding anything Unicode on Linux. All OSes could have conflicting implementations. Is there a BSD-licensed or public-domain library that handles Unicode?
Josh, posted September 21, 2017, replying to Einlander:
Haha, yeah, that is the catch. It's basically a compressed format, so you can't jump straight to the nth character; you have to walk the string from the start.