Lister - Searching for hex string returns incorrect result

stifani
Junior Member
Posts: 19
Joined: 2004-05-13, 02:32 UTC

Post by *stifani »

Hello,

Using TC 9.21a on Win7 x86, when I search for a hex string in a file (for instance 8C) and then search again for the same string, Lister also picks up 9C. Likewise, searching for 8C finds 9C even when no 8C is present in the file.

Image: https://i.ibb.co/8rGT0wr/wrongpick.png


Can anyone confirm?
Thanks
sqa_wizard
Power Member
Posts: 3862
Joined: 2003-02-06, 11:41 UTC
Location: Germany

Re: Lister - Searching for hex string returns incorrect result

Post by *sqa_wizard »

In fact, TC does not really search for the byte 8C, but for the character represented by hex 8C.
This means it will find 8C as well as the lowercase form of that character, which is 9C ;-)

With this knowledge, you just have to enable the "Case sensitive" option as well.
Then it will find 8C only.
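
To illustrate the effect, here is a minimal Python sketch. It is not TC's actual code. It assumes the ANSI code page is Windows-1252 (where 8C is Œ and 9C is œ), and `find_byte` is a hypothetical helper made up for this example.

```python
# Assumption: ANSI code page is Windows-1252; TC's real search code is not shown in this thread.
CODEPAGE = "cp1252"

def find_byte(data, needle, case_sensitive):
    """Return offsets where `needle` matches, optionally folding case via the code page."""
    wanted = {needle}
    if not case_sensitive:
        try:
            ch = bytes([needle]).decode(CODEPAGE)
        except UnicodeDecodeError:
            ch = None  # byte has no character in this code page
        if ch is not None:
            for variant in (ch.lower(), ch.upper()):
                try:
                    encoded = variant.encode(CODEPAGE)
                except UnicodeEncodeError:
                    continue
                # Only keep case variants that are still a single byte.
                if len(encoded) == 1:
                    wanted.add(encoded[0])
    return [i for i, b in enumerate(data) if b in wanted]

sample = bytes([0x41, 0x9C, 0x00, 0x8C])
print(find_byte(sample, 0x8C, case_sensitive=False))  # [1, 3]
print(find_byte(sample, 0x8C, case_sensitive=True))   # [3]
```

Without case sensitivity, the 9C at offset 1 is reported as a hit for 8C; with it, only the real 8C at offset 3 is found.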
#5767 Personal license
stifani
Junior Member
Posts: 19
Joined: 2004-05-13, 02:32 UTC

Re: Lister - Searching for hex string returns incorrect result

Post by *stifani »

I thought that "Case sensitive" applied only to normal characters and not to hex string searches.
Indeed, with "Case sensitive" enabled, only the exact string is matched.

Thanks :D
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: Lister - Searching for hex string returns incorrect result

Post by *Usher »

By default TC uses the "Hex or ANSI" search, and you cannot turn off the encoding for a hex search; you can only select other encoding(s).
That may be fine for US-ASCII codes (less than 0x80), but it should not work that way for other codepages, otherwise it may give unpredictable results when searching in Unicode text.

Let's have a look at another code: 0xC0. In Windows-1250 (and ISO-8859-2) it's Ŕ, LATIN CAPITAL LETTER R WITH ACUTE. The small letter ŕ, LATIN SMALL LETTER R WITH ACUTE, has code 0xE0. In Unicode they have different codepoints, U+0154 and U+0155, and their UTF-8 representations are 0xC594 and 0xC595.

In Windows-1252 (and ISO-8859-1) 0xC0 stands for À, LATIN CAPITAL LETTER A WITH GRAVE, and 0xE0 is à, LATIN SMALL LETTER A WITH GRAVE. They have the same numbers as their Unicode codepoints, U+00C0 and U+00E0, because the ISO-8859-1 codes were reused for Unicode codepoints.
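
The mappings above can be checked with a few lines of Python. This is only a sketch using Python's built-in `cp1250` and `cp1252` codecs; `describe` is a hypothetical helper name and has nothing to do with how TC works internally.

```python
def describe(codepage, byte):
    """Decode one byte in `codepage`; return (character, codepoint, UTF-8 bytes as hex)."""
    ch = bytes([byte]).decode(codepage)
    return ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex().upper()

for codepage in ("cp1250", "cp1252"):
    for byte in (0xC0, 0xE0):
        print(codepage, f"0x{byte:02X}", *describe(codepage, byte))
```

The same byte 0xC0 gives Ŕ (U+0154, UTF-8 C594) in Windows-1250 but À (U+00C0, UTF-8 C380) in Windows-1252, so a search by byte value depends on which codepage is assumed.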

Now guess what TC finds in Unicode text? Nothing you would expect. It finds the code for _underscore_, 0x5F, in both Unicode encodings, even if only one of the "Unicode UTF-16" and "UTF-8" options is checked.
Andrzej P. Wozniak
Polish subforum moderator