Unicode comments via [=tc.comment] displays empty symbol

Bug reports will be moved here when the described bug has been fixed

D1P
Senior Member
Posts: 233
Joined: 2005-02-28, 18:29 UTC
Location: Moscow

Post by *D1P » 2016-09-30, 13:53 UTC

TC 9.0pb16 (also reproduced in pb15, at least).

First, create a custom columns view whose field contents include [=tc.comment], then switch to this view.
In the configuration options, choose "Unicode UTF8" as the preferred type for file comments (descript.ion).
Add a description to any file in a directory that doesn't contain any descriptions yet.
The comment will contain the BOM marker, which TC displays as an empty symbol: screenshot

This behavior does not occur with the standard comments view.
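
For readers unfamiliar with how a BOM can end up inside displayed text, here is a minimal Python sketch of the general mechanism. It is only an illustration and says nothing about TC's internal code; the sample entry and the helper name `first_char_is_bom` are made up for the example.

```python
import codecs

# Hypothetical descript.ion content: a UTF-8 BOM followed by one entry.
raw = codecs.BOM_UTF8 + b"test.txt first test\r\n"


def first_char_is_bom(text: str) -> bool:
    """Return True if the decoded text still starts with U+FEFF."""
    return text.startswith("\ufeff")


# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
kept = raw.decode("utf-8")
# The "utf-8-sig" codec drops a leading BOM while decoding.
stripped = raw.decode("utf-8-sig")

print(first_char_is_bom(kept))
print(first_char_is_bom(stripped))
```

If a program decodes the file the first way and then treats the first line as ordinary text, U+FEFF becomes part of the first file name or comment, and some fonts draw it as an empty box.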

ghisler(Author)
Site Admin
Posts: 38145
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) » 2016-10-04, 17:04 UTC

Unfortunately I couldn't reproduce it, not even with Russian locale settings. I found a few other bugs with comments in Russian locale, though.

Does it help to press '7' in lister?
Author of Total Commander
http://www.ghisler.com

Post by *D1P » 2016-10-05, 14:33 UTC

Hi, Christian!

That seems strange to me, because I can reproduce this quite easily. I just run TC with an empty ini file and repeat the steps I described before. I tried TC pb16 x32 and x64 on Win10 and Win7, with the same result.
Maybe you need to change the file list font to one where the Unicode null symbol is not fully transparent (like @Arial Unicode MS)?

I also found other bugs with CommentPreferredFormat=3:
1. New comments are sometimes saved without the last symbol, e.g. "commen" instead of "comment" (but I don't know how to reproduce this).
2. A comment ending with a new line (comment text + Enter + F2 in the file comment dialog) is displayed as "comment text\" (also in the standard comments view).

See the screenshot.
"Does it help to press '7' in lister?"
The problem is not in Lister; it works just as expected.

MVV
Power Member
Posts: 8395
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV » 2016-10-05, 17:12 UTC

Confirmed!


1. I deleted descript.ion, pressed Ctrl+Z, typed this comment and pressed Enter (descript.ion.1 from the archive):

Code:

first test

I lost the last character of the comment; TC created the following descript.ion:

Code:

п»ї
test.txt first tes

2. I deleted descript.ion, pressed Ctrl+Z, typed this comment and pressed Enter (descript.ion.2 from the archive):

Code:

one
two
three

I lost the last character of the comment; TC created the following descript.ion:

Code:

п»ї
test.txt one\ntwo\nthreГ‚
Note the three unexpected bytes 04 C3 82 at the end of the comment!


3. If I switch to a column set with [tc.comment], an extra character is displayed in front of the file comment. It is easy to see using the cm_CopyFileDetailsToClip command (test.txt.details.txt from the archive):

Code:

test.txt	one  two  thre
Note the unexpected byte 01 in front of the comment!
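
A stray control byte like this is invisible in most editors, so a small script can help locate it. The following is a rough sketch, not a TC tool: the function name `find_control_chars` and the sample string (modeled on the copied details above, with byte 01 placed before the comment) are assumptions for illustration.

```python
def find_control_chars(text: str) -> list[tuple[int, int]]:
    """Return (index, code point) pairs for control characters below 0x20,
    ignoring tab, carriage return and line feed."""
    allowed = {"\t", "\r", "\n"}
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if ord(ch) < 0x20 and ch not in allowed
    ]


# Hypothetical clipboard text: file name, tab, byte 01, then the comment.
sample = "test.txt\t\x01one  two  thre"
print(find_control_chars(sample))
```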


All mentioned files are here:

Code:

MIME-Version: 1.0
Content-Type: application/octet-stream; name="test.7z"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.7z"

N3q8ryccAAOS/Iqm1QAAAAAAAAAiAAAAAAAAAM2rGFUAOhlKzh2IoES7Lhxb/lHfCZL4+lOR+BZP
4rbN9jLByfL6Xqs3j+O4IbBDvBg6ivKVaa3HqXFYlr8pUgAAAIEzB64P0bZyEKCQoHew/pOH4awj
8Rsu0N/fIPka465fB11NgwLCGUqzGlc3KCY1SfvEmdOIdPCk2/2+nlQ1BtA+HXDVn+MLbAocezrF
UeijJWU/xch2ob83dZV/TFXr01M+E99tnRKQi/zmxOwX4yxmsQpsdNvJc5lEqaF/rMraVEwZt8Sw
IAfKg/+KQ9nagPBj2JZ1AAAXBjwBCYCZAAcLAQABIwMBAQVdABAAAAyA7QoBbaRtQAAA

Post by *ghisler(Author) » 2016-10-05, 19:33 UTC

The problems with the lost last character and the trailing line break are the ones I found; they will be fixed in beta 17.

But I still can't reproduce your other problem. Maybe I missed a step or two. :(

Post by *MVV » 2016-10-06, 07:42 UTC

I confirm that the last-character bug is fixed in 9.0b17!

But the unexpected bytes 04 C3 82 are still there for a multiline comment, and the unexpected byte 01 is still there if I save the text copied by cm_CopyFileDetailsToClip to a text file.

I can also confirm these bugs with a clean INI on English and German Windows 8 when I set the comment encoding to Unicode UTF8 and switch to a custom column set with a [tc.comment] column, so it is not a locale-specific bug.

Exact steps:
1. I use a portable copy of TC 9.0b17 32-bit and a wincmd.ini containing just UseIniInProgramDir=7.
2. I set the comment encoding to Unicode UTF8.
3. I switch to a new custom column set with a [tc.comment] column.
4. I create an empty text file test.txt.
5. I set the 3-line comment mentioned above using Ctrl+Z.
6. I open descript.ion via F3 (I have to enable hidden/system files first). I see the three extra bytes 04 C3 82 mentioned above.
7. I copy the test.txt details via cm_CopyFileDetailsToClip and paste them into a new file 1.txt via Shift+F4 and Windows Notepad. I see the extra byte 01 mentioned above.

Post by *D1P » 2016-10-06, 12:02 UTC

In addition to the above: this behaviour is reproduced only with the following preferred types for file comments, and no others:
- Unicode UTF16
- Unicode UTF16 Mac
- Unicode UTF8

Also, if you delete the BOM signature from a UTF8 descript.ion file (via a hex editor or by converting it in any appropriate tool), there will be no empty characters in the comments.
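
For anyone who wants to try the BOM removal without a hex editor, here is a minimal Python sketch. The function name `strip_utf8_bom` is hypothetical, and the script only handles the UTF-8 signature (EF BB BF), not the UTF16 variants listed above.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file.

    Returns True if a BOM was found and removed, False if the file
    was left unchanged.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the first three bytes.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Typical use would be `strip_utf8_bom(Path("descript.ion"))` in the affected directory. Since descript.ion is normally a hidden file, the hidden attribute may need to be cleared on Windows before the file can be rewritten.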

Post by *ghisler(Author) » 2016-10-07, 10:23 UTC

"unexpected byte 01"
I can confirm that, thanks. It's used internally to mark UTF-8 encoded comments. I will remove it.
"unexpected bytes 04 C3 82"
These are not a bug. They are an extension of the descript.ion format: normally the descript.ion format does not support line breaks. I have therefore registered an extension with the creators of descript.ion which supports line breaks.

It works like this: descript.ion extensions are at the end of each line, separated by character 04. My extension has code C2. If this extension 04 C2 is present, all character pairs \n will be replaced by <CR><LF> line breaks.

Now C2 is the code used in ANSI, OEM, and UTF-16 encodings. However, a single byte C2 is not a valid character in UTF-8 (it can only start a two-byte sequence). Therefore I convert the UTF-16 code C2 to UTF-8, which gives the code pair C3 82.
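
The rule described above can be sketched in a few lines of Python. This is not TC's implementation, only a rough model of the description: the names `EXT_SEP`, `LINEBREAK_EXT`, `marker_bytes` and `decode_comment` are made up, and the sketch assumes the line-break extension is the only extension and sits at the very end of the comment.

```python
EXT_SEP = "\x04"          # separator in front of descript.ion extensions
LINEBREAK_EXT = "\u00c2"  # extension code C2, treated as a character


def marker_bytes(encoding: str) -> bytes:
    """Return the 04 C2 marker as it is stored in the given encoding."""
    return (EXT_SEP + LINEBREAK_EXT).encode(encoding)


def decode_comment(comment: str) -> str:
    """If the comment ends with the 04 C2 marker, drop the marker and
    turn each literal backslash-n pair into a CR LF line break."""
    marker = EXT_SEP + LINEBREAK_EXT
    if comment.endswith(marker):
        return comment[: -len(marker)].replace("\\n", "\r\n")
    return comment


# Comment bytes modeled on a UTF-8 descript.ion with a 3-line comment.
raw = b"one\\ntwo\\nthree\x04\xc3\x82"

print(marker_bytes("cp1252").hex(" "))  # single-byte ANSI code page
print(marker_bytes("utf-8").hex(" "))   # UTF-8
print(repr(decode_comment(raw.decode("utf-8"))))
```

In a single-byte code page such as cp1252 the marker stays two bytes long, while in UTF-8 the character C2 becomes the pair C3 82, which matches the three bytes seen in the file.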

Post by *MVV » 2016-10-07, 11:59 UTC

Ah, that's interesting information, thanks. :)

Post by *D1P » 2016-10-13, 02:21 UTC

Seems to be fixed in pb1, thank you.

Post by *MVV » 2016-10-13, 07:29 UTC

Yeah, fixed!
