9.0b9 x64 - wincmd.ini encoding issues

mag · Post by *mag » 2016-08-11, 15:32 UTC

I have the following in wincmd.ini:

[Configuration]
...
DrivesExportUpcase=1
DrivesShowUpcase=1
...

Since at least 9.0b9 this no longer works: shown and copied drive letters are always lowercase.

EDIT: Turned out to be caused by the wincmd.ini being UTF-8 encoded, see my posts below. If the wincmd.ini is UTF-16 LE encoded then it's possible to avoid the whole issue.

So the question is now about the wincmd.ini encoding: does the file have to be UTF-16 LE if it contains national characters, or is it a bug that tcmd has issues with a UTF-8 encoded file?

Lefteous · Post by *Lefteous » 2016-08-11, 15:44 UTC

Not confirmed

Dalai · Post by *Dalai » 2016-08-11, 15:48 UTC

Not confirmed either. Are you sure you checked the correct wincmd.ini used by TC?

Regards
Dalai

Horst.Epp · Post by *Horst.Epp » 2016-08-11, 15:54 UTC

Not confirmed for TC 9.0b9 x64 and x86 under Windows 10

mag · Post by *mag » 2016-08-11, 16:27 UTC

It behaves really strangely here.
I actually wanted to test:

ShowHiddenSystemOverlay
ShowHiddenDimmed

So I used "Configuration / Change Settings Files Directly" to edit the wincmd.ini and add these options there.
After relaunching tcmd I noticed that several display settings had been reset (for example, showing hidden files, which I had enabled before that change, was now disabled), which was strange. I re-enabled those settings again.

However, it seems that besides the originally reported issue, the two options ShowHiddenSystemOverlay and ShowHiddenDimmed don't have any effect at all either, so there must be some more general problem.

If I change some setting via GUI and then check the wincmd.ini via "Configuration / Change Settings Files Directly" the related configuration change is visible there, so it definitely seems to use that file. Also I haven't found another one anywhere on the system disk.

Note: I've got the ini file location set to "Application data" and its actual location is C:\Users\<username>\AppData\Roaming\GHISLER\wincmd.ini

After deleting the whole wincmd.ini and thus resetting the whole configuration it started to behave properly - all those options work if I add them there again. It seems that something that was already there was causing issues, I'll try to find out more.

mag · Post by *mag » 2016-08-11, 17:37 UTC

After thoroughly checking the wincmd.ini I found several duplicate sections and entries in there. That has actually happened to me before, and it was due to some special character being present somewhere... I guess the reason is similar this time as well, but for now I'm unable to find the exact cause.

So I manually cleaned up wincmd.ini and now everything works as expected.

mag · Post by *mag » 2016-08-11, 18:12 UTC

Alright I've found the culprit.

in wincmd.ini I've got the following text in Cyrillic:

[RenameSearchFind]
0=скан

The wincmd.ini is stored in UTF-8 and that screws things up when I change anything in the configuration (either via GUI or via "Configuration / Change Settings Files Directly"). Tcmd will choke on the above text when it's in UTF-8, will write a 2nd [Configuration] section into the wincmd.ini and will start to use that one (though not fully: some options still seem to be taken from the 1st [Configuration] section).

Note that tcmd will still keep the file in UTF-8 so even after adding the 2nd [Configuration] section the file will still contain the cause of the problem.

If I convert the wincmd.ini to UTF-16 LE then the problem doesn't occur.
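A minimal Python sketch of such a conversion, working on bytes in memory. It assumes the whole file really is valid UTF-8 (strict decoding fails otherwise), and the function name utf8_to_utf16le and the sample content are illustrative only, not anything tcmd provides:

```python
import codecs


def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (with or without a leading BOM) as UTF-16 LE with BOM."""
    # "utf-8-sig" drops one leading BOM if present. Decoding is strict, so
    # bytes that are not valid UTF-8 (e.g. ANSI characters > 0x7f) raise
    # UnicodeDecodeError instead of being silently mangled.
    text = data.decode("utf-8-sig")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Hypothetical file content: a leading UTF-8 BOM followed by the entry above.
sample = codecs.BOM_UTF8 + "[RenameSearchFind]\r\n0=скан\r\n".encode("utf-8")
converted = utf8_to_utf16le(sample)
print(converted[:4].hex(" "))
```

To apply it to a real file, read and write in binary mode so that line endings are left untouched, and keep a backup of the original.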

Dalai · Post by *Dalai » 2016-08-11, 18:59 UTC

TC uses WinAPI functions to read and write wincmd.ini. These API functions support ANSI and Unicode (UTF-16) only. UTF-8 is not supported! Where required, TC encodes individual items within the sections separately and prefixes them with a BOM (Byte Order Mark).

There have been several reports of wincmd.ini suddenly becoming UTF-8 encoded. I don't know if the reason/cause for this has been detected yet.

Regards
Dalai

milo1012 · Post by *milo1012 » 2016-08-11, 19:34 UTC

Dalai wrote:There have been several reports of wincmd.ini suddenly becoming UTF-8 encoded. I don't know if the reason/cause for this has been detected yet.

I can't remember any report about the ini being converted to UTF-8 automatically, i.e. w/o user interaction. All reports were due to users manually converting the ini to UTF-8.
My guess is:
Some text editors seem to detect the ini file as UTF-8 when opening it*; and when users save them after doing some changes, it might therefore be recoded in the wrong way.

So I think it's time to make some sticky thread in this forum, or an explicit warning in the TC help file, to inform users that they should not recode the file to UTF-8 under any circumstances.

* This is the case when the ini doesn't contain any standalone ANSI characters > 0x7f, but at least one non-codepage (Unicode) entry encoded as UTF-8 byte sequence with a prefixed BOM

Dalai · Post by *Dalai » 2016-08-11, 19:54 UTC

2milo1012
Ah, yes, that could be the case. Hadn't thought of the text editors.

mag · Post by *mag » 2016-08-11, 23:41 UTC

I've done some more tests:

- delete wincmd.ini, start tcmd, search (Alt+F7) for "скан" so that it's stored into the new wincmd.ini, check the result in wincmd.ini: UTF-8 without BOM and tcmd doesn't have any problem with that file

- edit it in Windows (10 Anniversary) Notepad ("Configuration / Change Settings Files Directly" opens it in Notepad by default) and save it (with "Save As" you can verify that it is saved as UTF-8), check the result: UTF-8 with BOM (EF BB BF hex) and tcmd has the above-mentioned issues with that file

- convert it to UTF-16 LE (either via Windows Notepad "Save As" with "Encoding: Unicode" selected, or any other way), check the result: UTF-16 LE with BOM (FF FE 5B 00 hex) and tcmd has no problem with that file

So:
- tcmd itself uses UTF-8, but without BOM

- if we edit it in an external editor that adds a BOM to the UTF-8 file, tcmd will choke on that

- Windows Notepad (which is used for "Configuration / Change Settings Files Directly" by default) does exactly that. The easiest workaround is to convert the file to UTF-16 LE (maybe BE would work as well, I haven't bothered with that), since then almost all text editors should save it properly: a BOM is usually used in UTF-16 encoded files but only rarely in UTF-8 ones.
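The byte checks from the tests above can be reproduced with a short Python sketch. The function name leading_bom is made up for illustration; the BOM constants come from the standard codecs module:

```python
import codecs


def leading_bom(data: bytes) -> str:
    """Name the byte order mark, if any, at the very start of the data."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "UTF-8 BOM"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE (UTF-32 is not considered here)
        return "UTF-16 LE BOM"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "UTF-16 BE BOM"
    return "no BOM"


# The three cases from the tests, as hand-built byte strings.
print(leading_bom(b"[Configuration]\r\n"))
print(leading_bom(b"\xef\xbb\xbf[Configuration]\r\n"))
print(leading_bom(b"\xff\xfe[\x00C\x00"))
```

For a real file, passing the first few bytes read in binary mode is enough, e.g. open(path, "rb").read(4), where path is whatever location the ini file has on your system.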

Fixing the tcmd so that it would work with wincmd.ini encoded in UTF-8 with BOM would be welcome, though I don't know how difficult it would be and whether it's possible.

milo1012 · Post by *milo1012 » 2016-08-12, 00:46 UTC

2mag
No, it is just as Dalai explained: TC requests the ini keys from the Win API, and this only supports ANSI or UTF-16 (LE). The API functions detect whether the file encoding is UTF-16; otherwise they treat the file as ANSI - nothing else is possible.
This means that TC does not use UTF-8 at any place for the overall file encoding, but:
It will encode individual key values to UTF-8 with a prefixed BOM, because otherwise this information would be lost when dealing with an ANSI encoded ini file (a plain ANSI code page is limited - you wouldn't be able to store characters outside your local code page when the ini file is ANSI). These byte sequences are the reason that text editors might detect the file as UTF-8 when not much else is in the ini, but this is not intended.
This explains all your findings.

You can cross check it by doing the following:
Start a search (Alt+F7) with a string that consists of characters from your local (system) code page with values above 127 (above the ASCII characters) only, but nothing else (no characters outside your code page). So e.g. on a system with page 1252 (Western European) you could use some Umlauts (öäü). TC will save this string to the ini file in ANSI byte encoding, not UTF-8. Now do another search with a string consisting of characters outside your local page, like your former example: TC will save that string to the ini file in UTF-8 byte encoding and prefixed BOM, but only this string, the 1st string is untouched! When you now open the ini file in a decent text editor it will detect it as ANSI, because the 1st (ANSI) string will form an invalid byte sequence for a UTF-8 encoding - just like intended.
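The cross-check can also be sketched in Python without touching a real ini file. The [Example] section name and the key numbers below are placeholders rather than what TC actually writes, and code page 1252 is assumed as the local ANSI code page:

```python
import codecs


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A value from the local code page, stored as plain ANSI bytes.
ansi_value = "öäü".encode("cp1252")  # b'\xf6\xe4\xfc'
# A value outside the code page, stored as UTF-8 with its own prefixed BOM.
unicode_value = codecs.BOM_UTF8 + "скан".encode("utf-8")

only_unicode = b"[Example]\r\n0=" + unicode_value + b"\r\n"
mixed = only_unicode + b"1=" + ansi_value + b"\r\n"

print(is_valid_utf8(only_unicode))  # every byte sequence is valid UTF-8
print(is_valid_utf8(mixed))         # the ANSI bytes break UTF-8 validity
```

An editor that guesses the encoding from the content has no reason to reject UTF-8 for the first file, while the second one cannot be UTF-8 as a whole.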

mag · Post by *mag » 2016-08-12, 13:38 UTC

Yes, it's more or less like that, but that's actually even worse. It means you can easily end up with a file that uses multiple character encodings at the same time, and probably no editor will handle that properly. In such a case tcmd should really try to enforce that the whole file is encoded in UTF-16 LE.

MVV · Post by *MVV » 2016-08-12, 14:41 UTC

It is correct that TC uses Windows API functions that only support ANSI and UTF-16, and UTF-8 is not supported. However, TC may store some strings in UTF-8 with their own BOMs (these BOMs may tell editors that the file is UTF-8, but you will not see them because a BOM has no visible representation).

I can add that the second [Configuration] section may appear because of the BOM at the beginning of the file. The Windows API expects section names to be at the beginning of lines, but in a UTF-8 file with BOM the first line starts with the BOM, i.e. it is <BOM>[Configuration] instead of just [Configuration], so the API doesn't detect the section.
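This effect can be illustrated with a deliberately simplified line-based section finder in Python. It is not the Windows API and only mimics the "section name must start the line" rule described above; both function names are made up for this sketch:

```python
import codecs


def section_names(data: bytes) -> list:
    """Collect names from lines that start with '[' and end with ']'."""
    names = []
    for line in data.splitlines():
        if line.startswith(b"[") and line.endswith(b"]"):
            names.append(line[1:-1])
    return names


def strip_leading_utf8_bom(data: bytes) -> bytes:
    """Remove a UTF-8 BOM only if it is the very first thing in the data."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


plain = b"[Configuration]\r\nDrivesShowUpcase=1\r\n[RenameSearchFind]\r\n0=x\r\n"
with_bom = codecs.BOM_UTF8 + plain

print(section_names(with_bom))                          # first section is missed
print(section_names(strip_leading_utf8_bom(with_bom)))  # both sections are found
```

Since the first section is effectively invisible in the file with the BOM, a writer that cannot find [Configuration] would plausibly append a new one, which matches the duplicate section reported earlier in the thread.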

I think it is a bad idea nowadays to open files in Windows Notepad by default, for the reasons mentioned (it doesn't allow selecting the input encoding, and it adds a BOM at the beginning of the file if it detects the encoding as UTF-8). It seems that only editors that allow selecting the input encoding should be used for editing configuration files...

Horst.Epp · Post by *Horst.Epp » 2016-08-12, 15:46 UTC

MVV wrote:It is correct that TC uses Windows API functions that only support ANSI and UTF-16, and UTF-8 is not supported. However, TC may store some strings in UTF-8 with their own BOMs (these BOMs may tell editors that the file is UTF-8, but you will not see them because a BOM has no visible representation).

I can add that the second [Configuration] section may appear because of the BOM at the beginning of the file. The Windows API expects section names to be at the beginning of lines, but in a UTF-8 file with BOM the first line starts with the BOM, i.e. it is <BOM>[Configuration] instead of just [Configuration], so the API doesn't detect the section.

I think it is a bad idea nowadays to open files in Windows Notepad by default, for the reasons mentioned (it doesn't allow selecting the input encoding, and it adds a BOM at the beginning of the file if it detects the encoding as UTF-8). It seems that only editors that allow selecting the input encoding should be used for editing configuration files...

That's the reason why I use NotepadReplacer to get the Syn2 editor instead.
This works even for hardcoded Notepad calls.
https://www.binaryfortress.com/NotepadReplacer/