Option to generate checksum files as UTF-8 without BOM

JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Option to generate checksum files as UTF-8 without BOM

Post by *JardaSX »

I've noticed that even when "Unix format: line breaks, '/' in paths" is selected, TC still writes the checksum file as UTF-8 with BOM, which doesn't work with "md5sum -c sums.md5". TC only changes the line endings, not the encoding, and both are required. I would like an additional option for that, so I don't have to run dos2unix or Notepad++ on checksum files afterwards to change the encoding.
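
Until such an option exists, the manual clean-up step can be scripted. The following is only a minimal sketch, assuming a Unix-like shell with `head`, `od`, `tail` and `mktemp` available; `strip_utf8_bom` is a hypothetical helper name, not part of TC or md5sum.

```shell
#!/usr/bin/env bash
# strip_utf8_bom (hypothetical helper): remove a leading UTF-8 BOM in place.
# Files that do not start with EF BB BF are left untouched.
strip_utf8_bom() {
    local file="$1" tmp status
    # Dump the first three bytes as hex, e.g. "efbbbf" for a UTF-8 BOM.
    if [ "$(head -c 3 -- "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 prints everything from the fourth byte onward.
        tail -c +4 -- "$file" > "$tmp" && cat -- "$tmp" > "$file"
        status=$?
        rm -f -- "$tmp"
        return "$status"
    fi
}
```

With that in place, something like `strip_utf8_bom sums.md5 && md5sum -c sums.md5` should behave the same as converting the file by hand first.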
ghisler(Author)
Site Admin
Posts: 48083
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Option to generate checksum files as UTF-8 without BOM

Post by *ghisler(Author) »

I cannot reproduce that, sorry.
Author of Total Commander
https://www.ghisler.com
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Option to generate checksum files as UTF-8 without BOM

Post by *JardaSX »

ghisler(Author) wrote: 2020-04-06, 15:11 UTC
I cannot reproduce that, sorry.

Here is how to reproduce it:
Select any file -> Files -> Create checksum file -> save in Unix format -> checksum.md5 -> OK
Now open checksum.md5 with Notepad++ -> go to Encoding -> it displays UTF-8-BOM, which is incompatible with md5sum on Linux.

You can check it with GNU Bash also:

Code:

head -c3 checksum.md5 | grep -q $'\xef\xbb\xbf' && echo yes || echo no
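
A slightly more defensive variant of the check above compares the first three bytes as hex text, so the result does not depend on how `grep` treats the current locale. This is just a sketch; `has_utf8_bom` is a made-up helper name, and `od` and `tr` are assumed to be available.

```shell
#!/usr/bin/env bash
# has_utf8_bom (hypothetical helper): succeed if the file starts with EF BB BF.
has_utf8_bom() {
    local first3
    # od prints the bytes as " ef bb bf"; tr strips blanks and the newline.
    first3=$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}
```

It can be used the same way as the one-liner: `has_utf8_bom checksum.md5 && echo yes || echo no`.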
Horst.Epp
Power Member
Posts: 6489
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: Option to generate checksum files as UTF-8 without BOM

Post by *Horst.Epp »

Not confirmed.
The created md5 file is definitely not UTF-8 and has no BOM.
It's a standard Unix file.
No special tool is necessary to check the file format;
any good editor shows it and also lets you view the file in hex.
Dalai
Power Member
Posts: 9388
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: Option to generate checksum files as UTF-8 without BOM

Post by *Dalai »

I can only reproduce that in TC 9.50: If the characters in the filenames are outside of the current (ANSI) codepage, TC apparently adds a UTF-8 BOM to the checksum file. TC 9.51 doesn't add a BOM, and it doesn't even add it when the option "Always use UTF-8 in names" is also selected.

That is apparently the relevant change:
TC's history.txt wrote:
12.02.20 Release Total Commander 9.50a release candidate 1 (RC1)
[...]
09.02.20 Fixed: Create CRC checksums: Do not add UTF-8 byte order marker to beginning of checksum file when using "Unix format" (32/64)
[...]
05.02.20 Release Total Commander 9.50 final (32/64)
2JardaSX
Which TC version exactly did you test with? Please also keep in mind that the filenames and the current OS codepage may be important for such tests as well.

Regards
Dalai
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Option to generate checksum files as UTF-8 without BOM

Post by *JardaSX »

OK, I've found where the problem is. It's caused by Total Commander 9.50 (2020-02-05), with both TOTALCMD.EXE and TOTALCMD64.EXE. However, I've been using TC for a long time, and it was probably some incremental update that caused the issue. Updating again to the latest version without a clean install fixed the issue; 9.51 generates checksums as UTF-8 without BOM. I still have the backup of C:\totalcmd\ in case the developer wants to take a look at it.
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Option to generate checksum files as UTF-8 without BOM

Post by *JardaSX »

OK, I have even more information. I performed a clean install of Total Commander 9.50. The issue is caused by selecting "Always use UTF-8 in names", regardless of the value of "Unix format: line breaks, '/' in paths".

Installing tcmd951x32_64.exe as an update (no clean install) fixes the issue. Evidently it was a bug that has been fixed, intentionally or not.

In any case, I find it odd that the Microsoft documentation states that UTF-8 uses the byte order mark "EF BB BF", which implies that for them UTF-8 simply means UTF-8 with BOM (without saying so explicitly): Using Byte Order Marks

Those who have answered, please re-test whether you can reproduce the issue.
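
To make re-test results easier to compare, both properties discussed in this thread (BOM and line breaks) can be reported in one go. This is a rough sketch only; `describe_checksum_file` is a hypothetical helper name, and it treats any carriage return in the file as a sign of Windows line breaks.

```shell
#!/usr/bin/env bash
# describe_checksum_file (hypothetical helper): report BOM and line-break style.
describe_checksum_file() {
    local file="$1" bom="no" eol="LF" cr_count
    # A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
    if [ "$(head -c 3 -- "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        bom="yes"
    fi
    # Count carriage returns; assumption: any CR means CRLF line breaks.
    cr_count=$(tr -cd '\r' < "$file" | wc -c)
    if [ "$cr_count" -gt 0 ]; then
        eol="CRLF"
    fi
    printf 'BOM: %s, line breaks: %s\n' "$bom" "$eol"
}
```

Running `describe_checksum_file checksum.md5` on a file created with each TC version and option combination gives a one-line summary that can be pasted into a reply.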