Lister may incorrectly show UTF-16 HTML/XML as UTF-8 in menu

The behaviour described in the bug report is either by design, or would be far too complex/time-consuming to be changed


Slavic
Senior Member
Posts: 297
Joined: 2006-02-26, 15:41 UTC
Location: Montenegro


Post by Slavic

The option to view HTML or XML without tags is quite useful, and Lister shows HTML files this way by default. However, files in UTF-16 encoding (rare for HTML but typical for XML) are displayed correctly, yet the Options menu marks them as UTF-8, which is wrong. Apparently the menu items "HTML text (strip tags)" and "Unicode" cannot be checked at the same time. This happens with both HTML and XML files.

Save this example in Notepad as test16.htm with encoding UTF-16 LE:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />
</head>

<body>
<p>This is a test example in UTF-16 encoding.</p>
</body>
</html>
```

1. Lister opens this file in HTML mode (no tags), but shows UTF-8 in the Options menu.
2. Selecting Unicode (i.e. UTF-16) in the menu shows the full text with all tags.
3. Selecting HTML shows the example without tags, but the menu again shows UTF-8.
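
For anyone reproducing this without Notepad, the test file can be generated with a short Python sketch. It assumes that Notepad's "UTF-16 LE" option writes a little-endian byte order mark (FF FE) followed by the text with Windows CRLF line endings; the function name to_notepad_utf16le is hypothetical.

```python
import codecs

# The test page from the post above, verbatim.
TEST_HTML = """<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />
</head>

<body>
<p>This is a test example in UTF-16 encoding.</p>
</body>
</html>
"""

def to_notepad_utf16le(text: str) -> bytes:
    """Encode text the way Notepad's "UTF-16 LE" option is assumed to:
    a little-endian BOM (FF FE), CRLF line endings, then UTF-16 LE code units."""
    crlf_text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # The "utf-16-le" codec writes no BOM of its own, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + crlf_text.encode("utf-16-le")

data = to_notepad_utf16le(TEST_HTML)
print(data[:6])  # b'\xff\xfe<\x00!\x00'
```

Saving it is then `Path("test16.htm").write_bytes(data)` (with `from pathlib import Path`).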

Suggestion: UTF-16 files should be shown as Unicode in the menu, with or without the HTML/XML tags.
Desktop: Windows 11 Pro 23H2, TC 11.55. Mobile: Pixel 5a, Android 14, TC 3.60b4
ghisler(Author)
Site Admin
Posts: 50817
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Lister may incorrectly show UTF-16 HTML/XML as UTF-8 in menu

Post by ghisler(Author)

I can confirm this, but it's not actually a bug. What happens is that Lister first converts the file from HTML to plain UTF-8 text, and then uses the UTF-8 viewer to display it. You can verify this by adding some accented characters to your sample, e.g. in French:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />
</head>

<body>
<p>Ceci est un exemple de test codé en UTF-16.</p>
</body>
</html>
```

When you first view it in Lister, it shows up as
Ceci est un exemple de test codé en UTF-16.
with HTML and UTF-8 checked in the menu. When you then click on "HTML" in the menu, it shows up as
Ceci est un exemple de test codÃ© en UTF-16.
with UTF-8 unchecked. As you can see, the é is now split into the two bytes that make up its UTF-8 encoding (0xC3 0xA9), each displayed as a separate character.

That's why I would prefer to keep UTF-8 checked rather than "Unicode".
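
The conversion described above can be sketched in Python. This is a hypothetical stand-in, not Lister's code: the function name html_to_utf8_text and the regex-based tag stripping are assumptions for illustration, and the "ANSI" view assumes a Western European code page (cp1252), under which the UTF-8 bytes of é appear as Ã©.

```python
import codecs
import re

def html_to_utf8_text(raw: bytes) -> bytes:
    """Hypothetical stand-in for the internal step: decode the UTF-16 file,
    strip tags with a naive regex (not a real HTML parser), re-encode as UTF-8."""
    html = raw.decode("utf-16")          # the "utf-16" codec reads and drops the BOM
    text = re.sub(r"<[^>]*>", "", html)  # crude tag removal, for illustration only
    return text.encode("utf-8")

sample = "<p>Ceci est un exemple de test codé en UTF-16.</p>"
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")  # like the saved test file
utf8 = html_to_utf8_text(raw)

print(utf8.decode("utf-8"))   # shown as UTF-8: é stays one character
print("é".encode("utf-8"))    # b'\xc3\xa9': one character, two bytes
print(utf8.decode("cp1252"))  # shown as ANSI (cp1252): codÃ© instead of codé
```

This is why the checked menu item reflects the internal UTF-8 text: reading those same bytes with any other viewer setting splits multi-byte characters apart.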
Author of Total Commander
https://www.ghisler.com
Slavic

Re: Lister may incorrectly show UTF-16 HTML/XML as UTF-8 in menu

Post by Slavic

I understand the logic: in HTML mode, the menu shows the encoding of Lister's internal conversion instead of the file's original encoding. But this is not intuitive and should be documented in some way. Currently the conversion is not explained, and both HTML encodings are mentioned as equally possible:
5. HTML-Text (strip tags)
...
Also supports UTF-8- and UTF-16-encoded HTML, if the file contains the appropriate META HTTP-EQUIV header.
With some explanation, users would not be misled about which encoding the viewed document actually has.
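
To make the distinction concrete, here is a hypothetical Python sketch (not Lister's actual detection logic) of how a viewer could determine a file's original encoding from a byte order mark or from the META HTTP-EQUIV header mentioned in the quoted documentation, independently of any internal conversion. The regex assumes http-equiv appears before content, and the "ANSI" fallback is an assumption.

```python
import codecs
import re

# Simplified pattern for <meta http-equiv="Content-Type" content="...; charset=X">;
# it assumes http-equiv appears before content, which real pages need not follow.
META_CHARSET = re.compile(
    r'<meta\s+http-equiv=["\']?content-type["\']?\s+content=["\'][^"\']*charset=([\w-]+)',
    re.IGNORECASE,
)

def detect_original_encoding(raw: bytes) -> str:
    """Hypothetical sketch: name the file's own encoding, not an internal one."""
    if raw.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # No BOM: look for the META header in the ASCII-compatible bytes.
    # latin-1 maps every byte to a character, so this decode never fails.
    # UTF-16 without a BOM is not handled here.
    match = META_CHARSET.search(raw.decode("latin-1"))
    return match.group(1).upper() if match else "ANSI"

page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />'
print(detect_original_encoding(codecs.BOM_UTF16_LE + page.encode("utf-16-le")))  # UTF-16
print(detect_original_encoding(page.replace("UTF-16", "utf-8").encode("ascii")))  # UTF-8
```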
browny
Senior Member
Posts: 370
Joined: 2007-09-10, 13:19 UTC

Re: Lister may incorrectly show UTF-16 HTML/XML as UTF-8 in menu

Post by browny

Tags are mostly expected to be in UTF-8 or ANSI, while UTF-16LE makes the file an odd creation.
It's hard to find good logic in a broken file.
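
A small Python illustration of why byte-oriented tag handling tends to assume UTF-8 or ANSI: in UTF-16LE every ASCII character of a tag is followed by a zero byte, so a search for the plain bytes of `<p>` finds nothing.

```python
# Why byte-level tag handling tends to assume UTF-8 or ANSI: in UTF-16LE,
# every ASCII character of a tag is followed by a zero byte.
tag = "<p>"
utf8_bytes = tag.encode("utf-8")        # b'<p>'
utf16_bytes = tag.encode("utf-16-le")   # b'<\x00p\x00>\x00'
page16 = "<body><p>Hi</p></body>".encode("utf-16-le")

print(utf8_bytes in page16)   # False: a search for the bytes b'<p>' misses the tag
print(utf16_bytes in page16)  # True: only the UTF-16LE byte pattern matches
```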
ghisler(Author)

Re: Lister may incorrectly show UTF-16 HTML/XML as UTF-8 in menu

Post by ghisler(Author)

For the above reasons I will not change this, sorry.

Moderator message from: ghisler(Author) » 2025-06-02, 09:38 UTC

Moved to "Will not be changed"
AntonyD
Power Member
Posts: 1660
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: Lister may incorrectly show UTF-16 HTML/XML as UTF-8 in menu

Post by AntonyD

To ghisler(Author):
Why not at least add a note to the menu item title saying that this is the encoding used by an INTERNAL tool?
Do you really think people will remember forever a fact that has only just been made public?
And won't a similar question come up again in a couple of months?
#146217 personal license