Yeah, well, that's the issue here. Example: pluginst.inf contains translations to several languages, Chinese among them. Notepad++ automatically detects GB2312 for this file when opening it. Assume I add some strings. Aren't these saved with this specific character set/encoding? Also, I send the file to the translators, and they add new translations or correct existing ones.
You should keep in mind that such multi-language files require switching the encoding several times to edit several languages: in any one encoding your editor will display only some of the lines correctly, and on saving it converts characters back into bytes according to the selected encoding.
Rephrased: What happens to the strings already in the file when it's opened and saved with a different character set on a different system? Is it possible for them to get broken? My thought was to avoid that by using Unicode files.
If you edit only some lines, the other lines remain unchanged, provided of course that your editor does not damage them on saving because of characters that are invalid in the selected encoding.
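To illustrate when untouched lines survive and when they don't, here is a rough Python sketch that imitates an editor: it decodes the whole file with one encoding and encodes it again on save. The function name resave and the sample strings are made up for this example, Python's latin-1 stands in for any single-byte code page that defines all 256 byte values, and real editors may handle invalid bytes differently.

```python
def resave(raw: bytes, codec: str) -> bytes:
    """Imitate an editor: decode the whole file with one codec, encode it again on save."""
    text = raw.decode(codec, errors="replace")    # invalid bytes become U+FFFD
    return text.encode(codec, errors="replace")   # unencodable characters become "?"


# A Russian line stored in the Windows Cyrillic code page (sample data, not from a real file).
russian = "No=Нет".encode("cp1251") + b"\r\n"

# Every byte maps to some character, so the bytes come back unchanged,
# even though the editor would display the Cyrillic text as garbage.
print(resave(russian, "latin-1") == russian)

# Opened as GB2312, the last Cyrillic byte has no valid trail byte after it,
# so it cannot be decoded and is lost when the file is saved.
print(resave(russian, "gb2312") == russian)
```

So a line you never touched can still get broken if its bytes don't form valid characters in the encoding the editor picked for the whole file.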
That doesn't work for all encodings. I have Cyrillic available, but not Chinese, so there's no way for me to copy'n'paste these translations.
We don't need to check for all encodings. INI files may only be ANSI (local Windows codepage) or Unicode (UTF-16LE with or without BOM). A zero second byte (when the BOM is missing) means that the file is Unicode, so it may safely be read as Unicode using GetPrivateProfileStringW; otherwise it should be read using GetPrivateProfileStringA and converted into Unicode using different codepage numbers for strings in different languages, as TC currently does.
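The check described above could look roughly like this in Python. This is only a sketch of the detection logic, not what TC or the Windows API actually does; the function name decode_ini and its parameter ansi_codepage are invented for the example, and the caller has to supply the right codepage for the ANSI case.

```python
def decode_ini(raw: bytes, ansi_codepage: str) -> str:
    """Decode INI file contents as UTF-16LE or as ANSI in the given codepage."""
    # UTF-16LE byte order mark: FF FE
    if raw.startswith(b"\xff\xfe"):
        return raw[2:].decode("utf-16-le")
    # No BOM: INI files normally start with an ASCII character such as '[' or ';',
    # which in UTF-16LE is followed by a zero byte.
    if len(raw) >= 2 and raw[1] == 0:
        return raw.decode("utf-16-le")
    # Otherwise treat the file as ANSI and convert with the caller's codepage.
    return raw.decode(ansi_codepage)


# Sample data (made up): the same section stored in three different ways.
text = "[RUS]\r\nName=Файл\r\n"
print(decode_ini(b"\xff\xfe" + text.encode("utf-16-le"), "cp1251") == text)
print(decode_ini(text.encode("utf-16-le"), "cp1251") == text)
print(decode_ini(text.encode("cp1251"), "cp1251") == text)
```

For a multi-language ANSI file the last step would have to be done per string, with a different codepage for each language, since no single codepage decodes all of them correctly.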
This seems to have worked quite well, although I can't be sure because I neither know enough about encoding nor the languages ...
I usually keep the codepage numbers of plugin translations in comments, so I don't need to remember which codepage should be used for a translation; the codepage only has to be found out once, when a new language is added. Happily, I have no multi-codepage WDX LNG files yet, so I didn't even notice that TC still reads Unicode INI files using the ANSI function and may lose Unicode characters.