the search returns unmatched characters

The behaviour described in the bug report is either by design, or would be far too complex/time-consuming to be changed

Moderators: Hacker, petermad, Stefan2, white

ter
Junior Member
Posts: 23
Joined: 2024-07-22, 16:33 UTC

the search returns unmatched characters

Post by *ter »

1. search for "MZ└" (4D 5A C0) in a text file in Lister
2. it triggers on text that contains "mzL" (6D 7A 4C)

MIME-Version: 1.0
Content-Type: application/octet-stream; name="mzl.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="mzl.txt"

TVrAIHNlYXJjaCBtZQ0KLg0KDQouDQptekwgbm8=
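
To check the raw bytes of this attachment, the base64 payload can be decoded. A minimal Python sketch, standard library only (the variable names are illustrative and not part of the report):

```python
import base64

# Base64 payload copied from the attachment above.
payload = "TVrAIHNlYXJjaCBtZQ0KLg0KDQouDQptekwgbm8="
data = base64.b64decode(payload)

# The file starts with the search pattern 4D 5A C0 ("MZ" followed by byte 0xC0).
print(data[:3].hex(" "))

# The last line of the file holds the text "mzL" (6D 7A 4C).
print(data[-6:])
```

The file therefore contains both byte sequences, so which one a search stops on depends only on how the search string is converted to bytes.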
yefkov
Junior Member
Posts: 4
Joined: 2019-03-15, 17:41 UTC

Re: the search returns unmatched characters

Post by *yefkov »

ter wrote: it triggers on text that contains "mzL" (6D 7A 4C)
I couldn't reproduce (TC 11.03). Only the text "MZ└" was found.
UPD: You forgot to mention that you are using a "wrong" code page. "└" can be assumed to be from cp850. So I searched using this code page. If I switch to cp1250 I get the result you describe.
This behavior is probably caused by the conversion of the search text from Unicode (TC search text box) to cp1250.
I don't know which conversion function TC uses, but the documentation of WideCharToMultiByte, for example, has the following note on its WC_NO_BEST_FIT_CHARS flag:
"This flag prevents the function from mapping characters to characters that appear similar but have very different semantics. In some cases, the semantic change can be extreme. For example, the symbol for "∞" (infinity) maps to 8 (eight) in some code pages."
https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte
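
The code page dependence can be checked outside TC. A minimal Python sketch, assuming Python's built-in codecs as a stand-in for the Windows code pages (Python encodes strictly and never substitutes a similar-looking character, so it only shows where "└" exists, not what Windows maps it to):

```python
SEARCH = "MZ└"

def encode_strict(text, codepage):
    """Return the bytes of text in codepage, or None if a character is missing."""
    try:
        return text.encode(codepage)
    except UnicodeEncodeError:
        return None

# DOS/OEM code pages contain the box-drawing character, the ANSI ones do not.
results = {cp: encode_strict(SEARCH, cp) for cp in ("cp850", "cp866", "cp1250", "cp1251")}

for cp, raw in results.items():
    print(cp, raw.hex(" ") if raw is not None else "not representable")
```

In cp850 and cp866 the search string becomes 4D 5A C0. In cp1250 and cp1251 there is no exact equivalent, which is the case where a best-fit conversion would have to pick a substitute.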
Last edited by yefkov on 2024-07-22, 19:44 UTC, edited 2 times in total.
Hacker
Moderator
Posts: 13144
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: the search returns unmatched characters

Post by *Hacker »

Not confirmed, either.

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect, you drop dead.
ter
Junior Member
Posts: 23
Joined: 2024-07-22, 16:33 UTC

Re: the search returns unmatched characters

Post by *ter »

My Windows code page is Cyrillic, and "└" is visible in the OEM/DOS codepage (hotkey S).
petermad
Power Member
Posts: 16099
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Re: the search returns unmatched characters

Post by *petermad »

I assume this is about searching in Lister, not about searching for text in files with the Find Files dialog.

It seems to happen when Lister is in ANSI mode (hotkey A) and set to use one of the following codepages: 1250, 1251, 1253 or 1254. Hence it also happens if "Encoding" is set to "As configured for current Font" and the system codepage for the font is one of these.

I think yefkov is right that https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte gives the explanation.
License #524 (1994)
Danish Total Commander Translator
TC 11.55rc4 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1393a
TC 3.60b4 on Android 6, 13, 14
TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
ghisler(Author)
Site Admin
Posts: 50824
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: the search returns unmatched characters

Post by *ghisler(Author) »

Not a bug. This happens when "└" isn't part of the current code page and you search in ANSI text. TC converts the search string from Unicode to ANSI with WideCharToMultiByte, and this function has the strange habit of converting characters to similar-looking characters, e.g. "└" to "L", if there is no equivalent in the current code page.
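
The effect described here can be reproduced with a small simulation. This is a sketch under stated assumptions, not TC's actual code: the `BEST_FIT` table is hypothetical and holds only the single "└" to "L" mapping reported in this thread, `to_ansi_best_fit` and `find_ignore_case` are illustrative helpers, and the search is assumed to be case-insensitive.

```python
# Hypothetical best-fit table; only the mapping reported in this thread.
BEST_FIT = {"└": "L"}

def to_ansi_best_fit(text, codepage):
    """Convert text to bytes, substituting a look-alike when a character is missing."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            out += BEST_FIT.get(ch, "?").encode(codepage)
    return bytes(out)

def find_ignore_case(haystack, needle):
    """Byte search that ignores ASCII case; returns the offset or -1."""
    return haystack.lower().find(needle.lower())

# Contents of the sample file from the first post.
data = b"MZ\xc0 search me\r\n.\r\n\r\n.\r\nmzL no"

needle_ansi = to_ansi_best_fit("MZ└", "cp1251")  # "└" is missing, so it becomes "L"
needle_dos = to_ansi_best_fit("MZ└", "cp866")    # "└" exists as byte 0xC0

hit_ansi = find_ignore_case(data, needle_ansi)
hit_dos = find_ignore_case(data, needle_dos)
print(needle_ansi, hit_ansi)
print(needle_dos, hit_dos)
```

With the Cyrillic ANSI code page the search string degrades to "MZL" and the only hit is the "mzL" on the last line. With the DOS code page the bytes 4D 5A C0 at the start of the file are found.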
Author of Total Commander
https://www.ghisler.com
ter
Junior Member
Posts: 23
Joined: 2024-07-22, 16:33 UTC

Re: the search returns unmatched characters

Post by *ter »

Why does "Find text" in Find Files warn about invalid characters in the encoding, while the search in Lister does not?

I'm not searching in ANSI text, but in ASCII text.

Why, in the ASCII/DOS codepage, does it search for mzL and not MZ└? Which one is the "current" codepage? If it's displayed, then it's current.
AntonyD
Power Member
Posts: 1662
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: the search returns unmatched characters

Post by *AntonyD »

As I see it: IF I set Lister's render option to `ANSI (Window charset)` and set the "Encoding" codepage to ASCII/DOS (866), then
I am able to find exactly this `MZ└`, and of course the phrase is displayed on the screen correctly.
But IF I set the codepage to "As configured for current Font" or `ANSI local code page`, I am not able to find this string.
Instead, the search finds "mzL". And the rendering of the phrase is obviously broken: `MZА` is rendered.
So far, this is understandable and can be accepted as a fact.
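
The two renderings of the same bytes can be checked with a short Python sketch, assuming Python's cp866, cp850 and cp1251 codecs match the Windows code pages for byte 0xC0:

```python
raw = bytes([0x4D, 0x5A, 0xC0])  # the bytes 4D 5A C0 from the sample file

# One byte sequence, three interpretations.
as_cp866 = raw.decode("cp866")    # DOS Cyrillic
as_cp850 = raw.decode("cp850")    # DOS Latin-1
as_cp1251 = raw.decode("cp1251")  # Windows Cyrillic (ANSI)

print(as_cp866, as_cp850, as_cp1251)
```

Both DOS code pages show the box-drawing character, while Windows Cyrillic shows the Cyrillic capital letter А (U+0410) for byte 0xC0.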

IF I set Lister's render option to `ASCII (DOS charset)`, the phrase is displayed on the screen correctly.
BUT the much-needed "Encoding" menu item is disabled, so I cannot know for sure WHICH codepage is actually used for the visual
interpretation of this text. I can only guess that it should of course be 866, but who knows?
Given that assumption, I would still expect the repeated search to succeed.
BUT the search now does NOT find `MZ└`; it finds "mzL" instead, despite the fact
that the screen IS displaying the searched phrase `MZ└` correctly... Strange logic...
#146217 personal license
petermad
Power Member
Posts: 16099
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Re: the search returns unmatched characters

Post by *petermad »

AntonyD wrote: and WHICH indeed the codepage will be chosen for the visual
interpretation of this text - I will not know for sure. I will only guess that it should be of course 866
I tested it a little.

If, in the "Find Files" dialog, you select only "ASCII charset (DOS)", then codepage 850 (OEM) seems to be used for searching text in files, at least in Windows with a Danish locale (codepage 865 is the Danish/Norwegian DOS codepage)...
AntonyD
Power Member
Posts: 1662
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: the search returns unmatched characters

Post by *AntonyD »

petermad wrote: seems to be used for
This is the trouble: we seem to understand and seem to be right, but we are only guessing WHAT is happening at this moment and HOW.
The most logical thing in this case would be to display the factors on which
the rendering and search processes are based.
That is, simply do not disable the Encoding menu item. For this mode, only "As configured for current Font"
would need to remain in this menu as a sub-item. It seems to me that this is how it works.
petermad
Power Member
Posts: 16099
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Re: the search returns unmatched characters

Post by *petermad »

2AntonyD
I tested it this way:
I made a txt file with these characters: µ°Õ. In ASCII mode (S) they are displayed as Á░ı in Lister.
If I view the file in ANSI mode (A) in Lister, the characters are displayed as Á░ı only when choosing codepage DOS-LATIN1 (850).
And the file is found only when I use Á░ı in the "Find text" field of "Find Files" with only "ASCII charset (DOS)" enabled, not if I use µ°Õ.

To me, that indicates that ASCII mode in Lister uses codepage 850, and that the "ASCII charset (DOS)" option in "Find Files" also uses codepage 850.
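
This deduction can be cross-checked with a small Python sketch. It assumes the test file was saved in Windows-1252 (the usual ANSI code page for a Danish locale, which the post does not state) and that Python's codecs match the Windows code pages for these three bytes:

```python
# Assumption: the file with "µ°Õ" was saved as Windows-1252.
raw = "µ°Õ".encode("cp1252")
print(raw.hex(" "))

# The same bytes read with the two candidate DOS code pages.
as_cp850 = raw.decode("cp850")  # DOS Latin-1 (OEM)
as_cp865 = raw.decode("cp865")  # DOS Nordic (Danish/Norwegian)

print("cp850:", as_cp850)
print("cp865:", as_cp865)
```

Only codepage 850 turns the bytes B5 B0 D5 into Á░ı, which agrees with the conclusion that ASCII mode uses 850 rather than 865 here.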
AntonyD
Power Member
Posts: 1662
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: the search returns unmatched characters

Post by *AntonyD »

2petermad
I fully agree that you, or anyone else, can DEDUCE this information in the logical way you described in your post,
by doing the simple things you listed...
BUT the essence of the problem is not whether people know WHICH steps to perform (and you seem to have described them here
in the form of a help paragraph), but that when using Lister there should be no situation where it is necessary to GUESS
on the basis of what data (codepages, fonts; what else affects this process, by the way?) the search is carried out.
For that matter, there is still no option in the search dialog to choose in which code page the user's search text is provided,
so that the bytes of the search string from the input field could be interpreted more correctly.
ghisler(Author)
Site Admin
Posts: 50824
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: the search returns unmatched characters

Post by *ghisler(Author) »

Moderator message

Moved to "will not be changed"
ter
Junior Member
Posts: 23
Joined: 2024-07-22, 16:33 UTC

Re: the search returns unmatched characters

Post by *ter »

Why will it not be changed? In other editors, it does work.
Why not use WC_NO_BEST_FIT_CHARS? No explanation.
Why not warn about a different codepage? No explanation.
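
A warning of the kind asked for here could be based on a strict conversion check. A minimal Python sketch of the idea, not of TC's behaviour (`unrepresentable` is a hypothetical helper, and Python codec names stand in for the Windows code pages):

```python
def unrepresentable(text, codepage):
    """List the characters of text that have no exact equivalent in codepage."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

# Characters a search dialog could warn about before converting the search string.
warn_ansi = unrepresentable("MZ└", "cp1251")
warn_dos = unrepresentable("MZ└", "cp866")

if warn_ansi:
    print("Not in cp1251:", " ".join(warn_ansi))
if not warn_dos:
    print("cp866 can represent the whole search string")
```

A non-empty list means the search string cannot be converted exactly, so the search could warn or refuse instead of silently substituting a similar character.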
JOUBE
Power Member
Posts: 1685
Joined: 2004-07-08, 08:58 UTC

Re: the search returns unmatched characters

Post by *JOUBE »

ter wrote: 2024-08-08, 12:18 UTC
in other editors, it does work.
So, simply use these instead.

HTH

Joube