3.x+. Search text files by content: cannot find Cyrillic

Support for Android version of Total Commander

Moderators: white, Hacker, petermad, Stefan2

Skif_off
Member
Posts: 132
Joined: 2013-09-30, 13:13 UTC

3.x+. Search text files by content: cannot find Cyrillic

Post by *Skif_off »

Searching text files by content does not find Cyrillic words in 3.x, but in version 2.91 it works fine: I checked several files encoded in UTF-8 without BOM (including HTML files with charset=utf-8) and in cp1251.

Tested on Android 4.4 and Android 8, if it matters.
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *ghisler(Author) »

Does the file contain any UTF-8 characters in the first 4 kBytes? If not, TC searches only with the default encoding.
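The detection rule described here can be sketched as follows. This is a hypothetical Python illustration, not Total Commander's code: the 4096-byte window, cp1251 as the "default encoding", and the names `looks_like_utf8` and `pick_encoding` are all assumptions.

```python
import codecs

SNIFF_SIZE = 4096  # assumed size of the "first 4 kBytes" window


def looks_like_utf8(head: bytes) -> bool:
    """True if `head` is valid UTF-8 and contains at least one multi-byte character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: a multi-byte sequence cut off at the window's end is
        # buffered by the decoder instead of being reported as an error.
        text = decoder.decode(head[:SNIFF_SIZE], final=False)
    except UnicodeDecodeError:
        return False  # invalid UTF-8 byte sequence
    # Every code point above U+007F is encoded as a multi-byte sequence in UTF-8.
    return any(ord(ch) > 0x7F for ch in text)


def pick_encoding(head: bytes, ansi: str = "cp1251") -> str:
    """Choose UTF-8 only when the window shows multi-byte UTF-8, else the ANSI code page."""
    return "utf-8" if looks_like_utf8(head) else ansi
```

With this rule, a file whose first 4096 bytes are pure ASCII falls back to cp1251 even if Cyrillic UTF-8 text follows later, so a search word encoded for cp1251 would not match that later text.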
Author of Total Commander
https://www.ghisler.com
Skif_off
Member
Posts: 132
Joined: 2013-09-30, 13:13 UTC

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *Skif_off »

ghisler(Author) wrote: 2021-01-27, 14:56 UTC
Does the file contain any UTF-8 in the first 4 kBytes?
Yes.
Same files, same directory, same search word: 2.91 finds the files, but 3.00-3.20 find nothing. It looks like a TC problem.
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *ghisler(Author) »

It works fine with my files, so I need a test file to check it. Please do the following:
1. Make a copy of one of your files
2. Edit it and remove any sensitive data
3. Check whether the error still occurs
4. If yes, please send me the sample and your search word to cghisler at gmail dot com.
Skif_off
Member
Posts: 132
Joined: 2013-09-30, 13:13 UTC

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *Skif_off »

I did some more checks (2.91 vs. 3.20) and I think I found the problem. Try to find any non-US-ASCII word located at an offset beyond the first 4 kBytes: 3.20 searches only the first 4 kBytes.
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *ghisler(Author) »

Yes, that's what I wrote above: If the first 4k do NOT contain any UTF-8 multi-byte characters at all, then TC assumes that the file is in ANSI format and not UTF-8. Maybe I should just search the buffer twice, once in ANSI and once in UTF-8 mode...
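Searching "twice" can be sketched as converting the search word to bytes in both encodings and looking for either byte sequence, so no guess about the file's encoding is needed. A minimal Python sketch, assuming cp1251 as the ANSI code page and a case-sensitive search; `contains_word` is a hypothetical name, not TC's API.

```python
def contains_word(data: bytes, word: str, ansi: str = "cp1251") -> bool:
    """Search raw bytes for `word` in both its UTF-8 and its ANSI byte form."""
    candidates = {word.encode("utf-8")}
    try:
        candidates.add(word.encode(ansi))
    except UnicodeEncodeError:
        pass  # the word cannot be represented in this ANSI code page
    # For pure-ASCII words both forms are identical and the set keeps one copy.
    return any(candidate in data for candidate in candidates)
```

Case-insensitive matching, which a file search usually offers, would need extra work on top of this byte-level comparison.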
Skif_off
Member
Posts: 132
Joined: 2013-09-30, 13:13 UTC

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *Skif_off »

https://www.upload.ee/files/12836392/testutf8.zip.html
Both files are UTF-8 encoded from the very first bytes, and both contain the word "плагин" ("plugin"), but at different offsets:
- utf8-1.txt: at offset 0x0463 (1123 bytes, inside the first 4 kBytes)
- utf8-2.txt: at offset 0x2245 (8773 bytes, beyond the first 4 kBytes)
Search for files containing "плагин": TC 2.91 finds both files, but TC 3.20 finds only the first one.

It looks like TC 3.x searches for the text only inside the first buffer (the first 4 kBytes), not in the whole file content.
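A whole-file search normally reads the file buffer by buffer and must also catch a match that straddles two buffers. A minimal Python sketch, assuming 4096-byte reads and a byte-level needle such as the UTF-8 bytes of "плагин"; `stream_contains` is a hypothetical helper, not the change shipped in 3.21beta1.

```python
import io
from typing import BinaryIO

BUF_SIZE = 4096  # assumed read size, matching the 4 kByte buffer discussed above


def stream_contains(stream: BinaryIO, needle: bytes, buf_size: int = BUF_SIZE) -> bool:
    """Scan the whole stream, keeping len(needle) - 1 bytes of overlap between
    reads so that a match split across two buffers is still found."""
    if not needle:
        return True
    tail = b""
    while True:
        chunk = stream.read(buf_size)
        if not chunk:
            return False  # end of file reached without a match
        window = tail + chunk
        if needle in window:
            return True
        # Only the last len(needle) - 1 bytes can begin a match that ends in the next chunk.
        tail = window[-(len(needle) - 1):] if len(needle) > 1 else b""


if __name__ == "__main__":
    word = "плагин".encode("utf-8")
    data = b"a" * 0x2245 + word  # word placed beyond the first 4 kBytes
    print(stream_contains(io.BytesIO(data), word))  # True
```

Keeping `len(needle) - 1` bytes is enough: any match that is not fully inside the new data must start within those trailing bytes of the previous window.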
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *ghisler(Author) »

Could you please try it with v3.21beta1? You can get the beta either from the Play Store:
https://play.google.com/apps/testing/com.ghisler.android.TotalCommander
or download it directly here:
https://www.ghisler.com/android.htm#download
Skif_off
Member
Posts: 132
Joined: 2013-09-30, 13:13 UTC

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *Skif_off »

Seems to work now, thank you!
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: 3.x+. Search text files by content: cannot find Cyrillic

Post by *ghisler(Author) »

Great, thanks for your quick reply!