utf-8 filenames in zip archive

Support for Android version of Total Commander

ccaid
Junior Member
Posts: 4
Joined: 2014-02-10, 15:05 UTC

Post by *ccaid »

TC decides automatically when to use UTF-8 for filenames when it creates zip archives, but this automatic choice has shortcomings.
1) Many Unicode characters cannot be stored without UTF-8, e.g. the ellipsis U+2026, yet TC still uses the OEM code page for filenames containing such characters. This leads to distorted filenames.
2) Files with Russian names zipped under a Russian locale cannot be unzipped correctly under other locales, e.g. under the English (US) locale.
3) Files with Russian names cannot be zipped correctly under non-Russian locales, e.g. under the English (US) locale.

These problems could be prevented by an option such as "Always use UTF-8 for zipping".
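
The three problems listed above can be reproduced outside of TC with a few lines of Python. This is only a sketch: it assumes that the Russian OEM code page is cp866 and the English (US) OEM code page is cp437, and the helper name `can_encode` is made up for this example.

```python
# Assumption: cp866 is the Russian OEM code page, cp437 the English (US) one.
RUSSIAN_OEM = "cp866"
US_OEM = "cp437"

def can_encode(name: str, codepage: str) -> bool:
    """Return True if every character of the name exists in the code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

# 1) The ellipsis U+2026 has no slot in either OEM code page.
print(can_encode("notes\u2026.txt", RUSSIAN_OEM))  # False
print(can_encode("notes\u2026.txt", US_OEM))       # False

# 2) Bytes written under a Russian locale, interpreted under English (US):
#    decoding succeeds, but the result is no longer the original name.
stored = "Привет.txt".encode(RUSSIAN_OEM)
print(stored.decode(US_OEM) == "Привет.txt")       # False

# 3) Cyrillic letters cannot be stored at all in the US OEM code page.
print(can_encode("Привет.txt", US_OEM))            # False

# UTF-8 round-trips all of these names, whatever the locale.
name = "Привет\u2026.txt"
print(name.encode("utf-8").decode("utf-8") == name)  # True
```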
ghisler(Author)
Site Admin
Posts: 48079
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Thanks for your suggestion. The UTF-8 names could be stored in extra fields, which both TC for Windows and WinZip can handle. I don't know about other packers, though. TC for Windows already has this option.
Author of Total Commander
https://www.ghisler.com
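
For readers unfamiliar with the extra-field approach: one documented layout for it is the Info-ZIP Unicode Path Extra Field (header ID 0x7075) from the ZIP application note. Whether TC writes exactly this field is an assumption here, not something stated in the thread. The sketch below builds and parses such a field; the function names and the cp866 default are hypothetical.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field header ID

def build_unicode_path_field(name: str, oem: str = "cp866") -> bytes:
    """Build a 0x7075 extra field for a name whose main copy uses `oem`."""
    # Legacy main name: unencodable characters become "?" here purely for
    # illustration; a real packer may substitute differently.
    legacy = name.encode(oem, errors="replace")
    utf8 = name.encode("utf-8")
    # Payload: version (1 byte), CRC-32 of the legacy name (4 bytes), UTF-8 name.
    payload = struct.pack("<BI", 1, zlib.crc32(legacy)) + utf8
    return struct.pack("<HH", UNICODE_PATH_ID, len(payload)) + payload

def read_unicode_path_field(extra: bytes, legacy: bytes):
    """Return the UTF-8 name from the extra data, or None if absent or stale."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        if header_id == UNICODE_PATH_ID and len(body) >= 5:
            version, crc = struct.unpack_from("<BI", body, 0)
            # The CRC ties the field to the legacy name it was written for.
            if version == 1 and crc == zlib.crc32(legacy):
                return body[5:].decode("utf-8")
        pos += 4 + size
    return None
```

An unpacker that does not know this header ID simply skips the field and shows the legacy name, which is why support varies between packers.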

Post by *ccaid »

7-Zip does NOT handle the extra field.
I have downloaded TC for Windows to try its zip packer. The best result is given by the option "All as UTF-8 if at least one contains characters>127". Zip archives made with this option are read successfully by TC for Windows, 7-Zip and TC for Android under both Russian and English locales.
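
Storing UTF-8 as the main name is signalled in the ZIP format by bit 11 (0x800) of the general purpose flags. As a rough illustration, Python's `zipfile` module sets this bit for any name that is not plain ASCII. Note that it decides per file, so it is similar to, but not the same as, the TC option quoted above. The helper names `zip_names` and `read_flags` are invented for this sketch.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8

def zip_names(names):
    """Write empty entries with the given names into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()

def read_flags(data):
    """Map each stored name to whether its UTF-8 flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }

flags = read_flags(zip_names(["plain.txt", "Привет\u2026.txt"]))
print(flags["plain.txt"])           # False: ASCII name, flag not needed
print(flags["Привет\u2026.txt"])    # True: stored as UTF-8 with bit 11 set
```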

Post by *ghisler(Author) »

Unfortunately using UTF-8 as the main name doesn't work with Windows Explorer. :(

Post by *ccaid »

Explorer cannot handle Unicode names at all, as far as I understand. If one wishes to store Unicode names in archives and still use Explorer's zip folders, he/she is asking for the impossible.

And I have to repeat (just in case): some Unicode characters are currently transformed incorrectly (both in TC for Android and in TC for Windows with the non-UTF options). E.g. the ellipsis (…) is transformed to a colon (:). Names with a colon are not permitted on FAT/NTFS, so most packers can't unpack such files, and Explorer can't even "see" such files in the zip.
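
Whatever causes the substitution, a packer can avoid it by encoding names strictly and falling back to UTF-8 whenever a character has no exact equivalent in the OEM code page. The following is a minimal sketch of that idea, not TC's actual logic; the function names and the cp866 default are assumptions for illustration.

```python
# Characters that Windows rejects in file names (besides "/" and control characters).
WINDOWS_FORBIDDEN = set('<>:"\\|?*')

def encode_entry_name(name: str, oem: str = "cp866"):
    """Return (raw_bytes, use_utf8_flag) without ever altering a character."""
    try:
        # Strict encoding: raises instead of substituting a look-alike.
        return name.encode(oem), False
    except UnicodeEncodeError:
        return name.encode("utf-8"), True

def has_forbidden_chars(name: str) -> bool:
    """Return True if the name could not be created on FAT/NTFS."""
    return any(ch in WINDOWS_FORBIDDEN for ch in name)

raw, use_utf8 = encode_entry_name("отчёт\u2026.txt")
print(use_utf8)                              # True: the ellipsis forces UTF-8
print(has_forbidden_chars("отчёт:.txt"))     # True: what the lossy name would hit
```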

Post by *ghisler(Author) »

The advantage of Unicode names in extra fields is that the archive can still be handled by text-only unpackers. Of course the names will be handled correctly only when using the same encoding, but usually the PC and the phone of a specific user use the same encoding (e.g. Cyrillic for Russian users).

Post by *ccaid »

Yes, I see. But… for text-only unpackers there is already a solution (I mean the current behaviour of TC for Android), while unpacking with 7-Zip needs an option in TC like "Add UTF-8 always".