[9.2x] Wrong Character Display in ZIP file

Please report only one bug per message!

Moderators: white, Hacker, petermad, Stefan2

thomasmo
Junior Member
Posts: 81
Joined: 2013-11-04, 01:31 UTC

Post by *thomasmo »

https://imgur.com/a/XY0MaMf

When I open a ZIP file that was packed on Linux (or some other system?), the file names are probably shown incorrectly, as in the right-hand picture.

The ZIP with the WRONG file names is below:

https://drive.google.com/open?id=1E113PIdKbe_j-BBcAHB2ofWLeviMAojC

If I drag the folder out within TC to extract it, the file names are shown INcorrectly.

But if I extract it with 7z or WinRAR, the file names are shown correctly.

If I pack it to ZIP with TC (internal Info-ZIP packer), the file names are still shown correctly.

The ZIP with the RIGHT file names is below:

https://drive.google.com/open?id=1UZEKlI6pjI4bM_fqO2Le50J7bva4oHb0
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Post by *Usher »

Did you open your zip files in your browser? Google Drive displays the right characters in "charater wrong.zip" and incorrect characters in "charater right.zip". To be exact, it displays the following character:
U+FFFD, REPLACEMENT CHARACTER, �: Replaces an invalid or unrecognizable character. Indicates a Unicode error.
As you can see, Google Drive uses Unicode UTF-8 encoding, which is now the default on the Internet. This means that the original file names on websites are saved as UTF-8, and you can easily check that the text in those files is also encoded in UTF-8.

So why do you insist on ignoring UTF-8 and forcing your local encoding? Why did you create another topic about the same problem?

If you save file names using your local encoding, they will be displayed properly ONLY locally, on systems that have the same default encoding as yours.
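
A small Python sketch of that effect. The file name and the two code pages (cp1250 and cp1252) are only illustrative assumptions, they are not taken from the archives in this topic:

```python
# Illustrative only: name and code pages are assumptions, not from the thread.
name = "łódź.txt"

# Bytes as a system with default code page 1250 would store the name.
stored = name.encode("cp1250")

# A reader with the same default encoding recovers the name.
same_system = stored.decode("cp1250")

# A reader with a different default encoding gets different characters.
other_system = stored.decode("cp1252")

print(same_system)
print(other_system)
print(same_system == name, other_system == name)
```

The bytes on disk are identical in both cases; only the code page used to interpret them changes.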

If you want to keep non-Latin characters intact and displayed properly all over the world, you MUST use Unicode UTF-8 to save file names in zip archives and to view file names in any software dealing with your files. You MUST pack Unicode file names, and TC can do it. But automatic encoding recognition doesn't work properly in some cases, as @ghisler(Author) has already explained, so if you aren't sure that everything is OK, you can change TC's zip packer settings to always ask about Unicode.
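
To show what "packing Unicode file names" means at the ZIP format level, here is a minimal sketch using Python's standard `zipfile` module (not TC's packer). It assumes the usual meaning of general purpose flag bit 11 (0x800) as the "file name is UTF-8" marker; the helper name `utf8_flags` and the sample names are made up for the example:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def utf8_flags(names):
    """Write empty entries to an in-memory zip, then report the UTF-8 flag per entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for n in names:
            zf.writestr(n, b"")
    # Re-open the archive so the flags are read back from the stored headers.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}


flags = utf8_flags(["readme.txt", "文件.txt"])
for filename, has_flag in flags.items():
    print(filename, "UTF-8 flag set" if has_flag else "UTF-8 flag not set")
```

Python sets the flag only for names that are not pure ASCII. An unpacker that sees the flag does not have to guess the encoding.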

What about other software?
* You should not use old software that doesn't support Unicode. It will always display mojibake.
* Some software may be Unicode-aware but still use the local encoding by default. You should change its configuration if possible, or remember to use the proper settings (command line switches) when needed.
* The best choice is to use modern archive formats, update your software to be fully Unicode-aware, or look for other Unicode-aware tools.

Example: 7-Zip
* 7-Zip uses Unicode for the 7z format by default, and likewise for other modern formats where it applies.
* For the zip format, by default (if the cl and cu switches are not specified), 7-Zip uses UTF-8 encoding only for file names that contain symbols unsupported by the local code page. This means that you should write `cu` in the `Parameters` field of the "Add to archive" dialog window when you create a zip archive and want to always save Unicode file names. With the command line tool you must specify the -mcu switch.
* For list files, 7-Zip uses UTF-8 encoding by default. You can change the encoding using the -scs switch.
* When using SFX modules for installers, the config file must be written in UTF-8 encoding.
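
For scripted use, the command line case could be sketched like this in Python. It assumes the 7-Zip console executable is called `7z` and is on the PATH, and that the switch is written as `-mcu=on`; the helper name and the file names are hypothetical. The sketch only builds and prints the command, it does not run it:

```python
import shutil


def build_7z_zip_command(archive, paths, exe="7z"):
    """Return an argument list that asks 7-Zip for a zip archive with UTF-8 names.

    Assumptions: the executable name and the "-mcu=on" spelling of the switch.
    """
    return [exe, "a", "-tzip", "-mcu=on", archive, *paths]


cmd = build_7z_zip_command("photos.zip", ["文件.txt"])
print(" ".join(cmd))

if shutil.which(cmd[0]) is None:
    print("7z was not found on PATH, so the command was only printed")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where 7-Zip is installed.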
Andrzej P. Wozniak
Polish subforum moderator
ghisler(Author)
Site Admin
Posts: 48079
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

The problem with your archive is that it uses UTF-8 for names but does not set the UTF-8 bit. TC's unpacker does automatic UTF-8 detection, but it is disabled when the user uses a DBCS (double-byte) encoding. Why? In this case, there are names which are valid in both UTF-8 and the DBCS encoding.
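
The ambiguity can be reproduced in a few lines of Python. This is only an illustration: GBK is picked as an example DBCS code page and the two bytes are chosen by hand, nothing here comes from the archive in question:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are a valid sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Two bytes without any marker saying which encoding they are in.
raw = b"\xc3\xa9"

as_utf8 = raw.decode("utf-8")  # one character when read as UTF-8
as_gbk = raw.decode("gbk")     # a different single character when read as GBK

print(decodes_as(raw, "utf-8"), decodes_as(raw, "gbk"))
print(as_utf8, as_gbk)
```

Both decodings succeed, so validity alone cannot tell which one was intended.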
Author of Total Commander
https://www.ghisler.com
thomasmo
Junior Member
Junior Member
Posts: 81
Joined: 2013-11-04, 01:31 UTC

Post by *thomasmo »

But 7z and WinRAR can open it correctly. I wish TC could read it correctly too.
DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

Post by *DrShark »

BTW, there was also a suggestion to (temporarily) set the font charset for the file list in a panel. If implemented, it could help in cases like this, when auto-detection fails.
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708
ghisler(Author)
Site Admin
Posts: 48079
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

2thomasmo
These programs seem to ignore the current system encoding and just do a UTF-8 detection, ignoring the case where the names are also valid DBCS names. This way these invalid ZIP files will be detected, but at the same time, valid ZIP files with DBCS characters will be shown incorrectly. I prefer to correctly show those files which actually use the correct ZIP format.
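
The trade-off between the two strategies can be sketched in Python as follows. This is a hypothetical model of the behavior described above, not the actual code of TC, 7-Zip or WinRAR; the function names and the list of DBCS code pages are assumptions made for the example:

```python
# Assumed set of double-byte code pages, for illustration only.
DBCS_CODE_PAGES = {"gbk", "big5", "cp932", "cp949"}


def decode_utf8_first(raw: bytes, system_encoding: str) -> str:
    """Strategy A: ignore the system encoding and accept anything that is valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(system_encoding, errors="replace")


def decode_respecting_system(raw: bytes, utf8_flag: bool, system_encoding: str) -> str:
    """Strategy B: trust the UTF-8 flag, and skip UTF-8 guessing on DBCS systems."""
    if utf8_flag:
        return raw.decode("utf-8", errors="replace")
    if system_encoding in DBCS_CODE_PAGES:
        return raw.decode(system_encoding, errors="replace")
    return decode_utf8_first(raw, system_encoding)


# Bytes that are valid both as UTF-8 and as GBK, stored without the UTF-8 flag.
raw = b"\xc3\xa9"
guess_a = decode_utf8_first(raw, "gbk")
guess_b = decode_respecting_system(raw, False, "gbk")
print(guess_a, guess_b)
```

With the flag missing, strategy A shows a correctly made GBK archive wrongly, while strategy B shows a flagless UTF-8 archive wrongly. Only the flag removes the guess.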

2DrShark
Changing the encoding manually would indeed help, but I couldn't find a good solution WHERE to show the encoding and let the user change it. :(
DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

Post by *DrShark »

ghisler(Author) wrote: 2019-03-11, 11:45 UTC
I couldn't find a good solution WHERE to show the encoding and let the user change it. :(

I posted a new possible UI solution in the suggestion topic.