std::filesystem::create_directories returns false, without setting an error code, if the path already exists.
As a result, saving stopped working as soon as the path existed.
The error message is now only shown if the error code (ec) is non-zero.
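A minimal sketch of the check described above, assuming a hypothetical helper named ensure_directory (the name and the wording of the message are illustrative and not taken from the Descent 3 sources). It treats the call as failed only when the error code is set, and ignores the boolean return value:

```cpp
#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns true if no error was reported while making
// sure that `dir` exists.
bool ensure_directory(const fs::path &dir) {
  std::error_code ec;
  // create_directories() returns false both when it fails and when there was
  // nothing to create, so the return value alone cannot signal an error.
  fs::create_directories(dir, ec);
  if (ec) {
    std::cerr << "Failed to create " << dir << ": " << ec.message() << '\n';
    return false;
  }
  return true;
}
```

Calling ensure_directory twice on the same path should succeed both times, which is the case that used to be reported as an error.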
Consider this program:
    #include <cstdio>
    int main(void) {
        const char *filename = u8"ディセント3.txt";
        auto fp = std::fopen(filename, "r");
        if (fp) {
            std::fclose(fp);
            return 0;
        } else {
            return 1;
        }
    }
If a file named ディセント3.txt exists, will that program successfully
open it? The answer is: it depends.
filename is going to point to these bytes:
Raw bytes: e3 83 87 e3 82 a3 e3 82 bb e3 83 b3 e3 83 88 33 2e 74 78 74 00
Characters: ディセント3.txt␀
Internally, Windows uses UTF-16. When you call fopen(), Windows will
convert the filename parameter into UTF-16 [1]. If the program is run
with a UTF-8 Windows code page, then the above bytes will be correctly
interpreted as UTF-8 when being converted into UTF-16 [2]. The final
UTF-16 string will be this*:
Raw bytes: ff fe c7 30 a3 30 bb 30 f3 30 c8 30 33 00 2e 00 74 00 78 00 74 00
Characters: ディセント3.txt
On the other hand, if the program is run with code page 932, then the
original bytes will be incorrectly interpreted as code page 932 when
being converted into UTF-16. The final UTF-16 string will be this*:
Raw bytes: ff fe 5d 7e fd ff 67 7e 63 ff 67 7e 7b ff 5d 7e 73 ff 5d 7e fd ff 33 00 2e 00 74 00 78 00 74 00
Characters: 繝�繧」繧サ繝ウ繝�3.txt
In other words, if that program gets compiled on Windows with a UTF-8
execution character set, then it needs to be run with a UTF-8 Windows
code page. Otherwise, mojibake might happen.
*Unlike the original UTF-8 string, the UTF-16 strings shown here do not
have a null terminator. This is because the Windows kernel doesn’t use
null terminated strings for paths [3][4].
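To make the UTF-8 case concrete, here is a rough sketch of a UTF-8 to UTF-16 conversion. The helper name utf8_to_utf16 is hypothetical, and the code assumes well-formed input and does no validation. Windows performs its own conversion internally, so this only illustrates the mapping and is not what fopen() actually runs:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes well-formed UTF-8 into UTF-16 code units.
// No byte order mark is emitted and malformed input is not detected.
std::vector<std::uint16_t> utf8_to_utf16(const std::string &utf8) {
  std::vector<std::uint16_t> out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    std::uint32_t code_point = 0;
    std::size_t length = 0;
    // The lead byte tells us how many bytes the sequence has.
    if (lead < 0x80) {
      code_point = lead;
      length = 1;
    } else if ((lead & 0xE0) == 0xC0) {
      code_point = lead & 0x1F;
      length = 2;
    } else if ((lead & 0xF0) == 0xE0) {
      code_point = lead & 0x0F;
      length = 3;
    } else {
      code_point = lead & 0x07;
      length = 4;
    }
    // Each continuation byte contributes its low six bits.
    for (std::size_t j = 1; j < length && i + j < utf8.size(); ++j) {
      code_point = (code_point << 6) |
                   (static_cast<unsigned char>(utf8[i + j]) & 0x3F);
    }
    i += length;
    if (code_point < 0x10000) {
      out.push_back(static_cast<std::uint16_t>(code_point));
    } else {
      // Code points above the Basic Multilingual Plane need a surrogate pair.
      code_point -= 0x10000;
      out.push_back(static_cast<std::uint16_t>(0xD800 | (code_point >> 10)));
      out.push_back(static_cast<std::uint16_t>(0xDC00 | (code_point & 0x3FF)));
    }
  }
  return out;
}
```

Feeding it the UTF-8 bytes listed earlier (without the terminator) should yield the code units 30c7 30a3 30bb 30f3 30c8 0033 002e 0074 0078 0074, which are the little-endian byte pairs shown in the first UTF-16 dump after the ff fe byte order mark.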
---
Before this commit, Descent 3 would pass UTF-8 to fopen(), even if
Descent 3 is run with a non-UTF-8 Windows code page [5]. This commit
makes sure that Descent 3 gets run with a UTF-8 Windows code page.
The Windows code page isn’t just used by fopen(). It also gets used by
many other functions in the Windows API [6]. I don’t know if Descent 3
uses any of those other functions, but if it does, then this commit will
also help make sure that those functions receive strings with the
correct character encoding. Descent 3 uses UTF-8 for strings by default
[7]. Making sure that Descent 3 uses UTF-8 everywhere will make
encoding-related mistakes less likely in the future.
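One way to verify at runtime that the process really has a UTF-8 code page is sketched below. This is an illustration only and is not necessarily how this commit enforces the code page. The helper name active_code_page_is_utf8 is hypothetical, and returning true on non-Windows platforms is an assumption made so that the sketch compiles everywhere:

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: reports whether the active Windows code page is UTF-8.
bool active_code_page_is_utf8() {
#ifdef _WIN32
  // GetACP() returns the active ANSI code page; CP_UTF8 is 65001.
  return GetACP() == CP_UTF8;
#else
  // Assumption for this sketch: there is no Windows code page to check here.
  return true;
#endif
}
```

A program could call this early in startup and log a warning if it returns false.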
Fixes #483.
[1]: <https://stackoverflow.com/a/7950569/7593853>
[2]: <https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170#remarks>
[3]: <https://stackoverflow.com/a/52372115/7593853>
[4]: <https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html>
[5]: <https://github.com/DescentDevelopers/Descent3/pull/475#discussion_r1664888080>
[6]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis>
[7]: adf58eca (Explicitly declare execution character set, 2024-07-07)
Consider this program:
    #include <cstddef>
    #include <cstdio>
    int main(void) {
        const char *example_string = "ディセント3";
        for (std::size_t i = 0; example_string[i] != '\0'; ++i) {
            std::printf("%02hhx ", example_string[i]);
        }
        std::puts("");
        return 0;
    }
What will that program output? The answer is: it depends. If that
program is compiled with a UTF-8 execution character set, then it will
print this:
e3 83 87 e3 82 a3 e3 82 bb e3 83 b3 e3 83 88 33
If that program is compiled with a Shift JIS execution character set,
then it will print this:
83 66 83 42 83 5a 83 93 83 67 33
This is especially a problem when using MSVC. MSVC doesn’t necessarily
default to using UTF-8 as a program’s execution character set [1].
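The difference can also be detected at compile time. The following is a sketch, assuming C++14 or later, and the function name execution_charset_is_utf8 is hypothetical. It writes U+30C7 (デ) as a universal character name so that the result depends only on the execution character set and not on how the compiler decodes the source file:

```cpp
// Hypothetical compile-time check of the execution character set.
constexpr bool execution_charset_is_utf8() {
  // "\u30C7" is encoded using the execution character set. In UTF-8 it is
  // the three bytes e3 83 87 followed by the null terminator.
  constexpr char sample[] = "\u30C7";
  return sizeof(sample) == 4 &&
         static_cast<unsigned char>(sample[0]) == 0xE3 &&
         static_cast<unsigned char>(sample[1]) == 0x83 &&
         static_cast<unsigned char>(sample[2]) == 0x87;
}

// Fails the build if narrow string literals are not encoded as UTF-8.
static_assert(execution_charset_is_utf8(),
              "The execution character set is expected to be UTF-8");
```

With a Shift JIS execution character set the same literal would be encoded as 83 66, so the static_assert would stop the build instead of letting the mismatch surface at runtime.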
---
Before this change, Descent 3 would use whatever the default execution
character set was. This commit ensures that the execution character set
is UTF-8 as long as Descent 3 gets compiled with MSVC, GCC or Clang. If
Descent 3 is compiled with a different compiler, then a different
execution character set might get used, but as far as I know, we only
support MSVC, GCC and Clang.
I’m not sure whether or not this change has any noticeable effects. If
using different execution character sets does have noticeable effects,
then this change will hopefully ensure that those effects are the same
for everyone.
[1]: <https://learn.microsoft.com/en-us/answers/questions/1805730/what-is-msvc-s-default-execution-character-set>
Before this change, there was a chance that a text editor or compiler
would use the wrong character encoding. For text editors, this commit
adds “charset = utf-8” to .editorconfig. That will cause editors that
support EditorConfig files [1] to automatically use UTF-8 when reading
and writing files. For compilers, this commit adds compiler options that
guarantee that compilers will decode source code files using UTF-8. The
compiler options are known to work on MSVC, GCC and Clang. If we ever
want to support additional compilers, then we might have to edit the if
statement that this commit adds.
This commit does not eliminate the chance that a wrong character
encoding will be used. Someone could always use a text editor that
doesn’t support EditorConfig files or a compiler that doesn’t support
the compiler options that we use. This commit does, however, make an
encoding mismatch much less likely.
[1]: <https://editorconfig.org/#pre-installed>
Inline the only usage of `loki_getprefpath()`, and use `Base_directory` (controlled by `-setdir`) instead of `loki_getdatapath()`.
Inline or eliminate some other code that became empty/unused with loki removal.