Req: text search both normal and unicode

Peter K
Junior Member
Posts: 6
Joined: 2007-11-25, 08:28 UTC

Post by *Peter K »

It would be much better to improve encoding autodetection in TC, so that it detects UTF-8/UTF-16 automatically. Then no options in the search box would be needed.
gigaman
Member
Posts: 134
Joined: 2003-02-14, 11:28 UTC

Post by *gigaman »

Peter K: Autodetection won't help you if you (knowingly) search for a text in a completely different format (e.g. you are looking for a text in a folder of EXE files and you want to know which one contains it).

d: Well, there are algorithms for searching multiple strings in parallel. I'm not sure whether they apply to regular expressions, but an "ordinary" search for an ANSI/UTF-8/Unicode string (or any other exact block of data) can certainly be done in one pass, if needed.
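
One well-known algorithm of the kind described above is Aho-Corasick, which matches several fixed byte strings in a single scan of the data. The following is a minimal Python sketch, not a description of how TC works. The function names and the choice of cp1252, UTF-8 and UTF-16 LE as stand-ins for "ANSI/UTF-8/Unicode" are assumptions made for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick tables for a list of non-empty byte strings."""
    goto = [{}]      # goto[state][byte] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # indexes of the patterns that end at each state
    for index, pattern in enumerate(patterns):
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(index)
    # Breadth-first pass: children of the root keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(data, patterns):
    """Scan data once; return (start_offset, pattern_index) for every hit."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for pos, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for index in sorted(out[state]):
            hits.append((pos - len(patterns[index]) + 1, index))
    return hits


if __name__ == "__main__":
    # Assumed encodings; a real tool would let the user choose them.
    encodings = ("cp1252", "utf-8", "utf-16-le")
    needles = [("Gr\u00fc\u00dfe").encode(enc) for enc in encodings]
    sample = b"xx" + needles[1] + b"yy" + needles[2]
    for offset, index in find_all(sample, needles):
        print(offset, encodings[index])
```

The scan touches each byte of the data once, whatever the number of encodings, at the cost of building the automaton up front.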
Sosna
Member
Posts: 143
Joined: 2006-10-24, 10:52 UTC

Post by *Sosna »

Support++. I was just about to write this request, and found it's already here!

I think Flint's idea is ok - to check encodings I want to check :)
Ave Caesar Imperator,
moritari te salutant!
m^2
Power Member
Posts: 1413
Joined: 2006-07-12, 10:02 UTC
Location: Poland

Post by *m^2 »

Support++
gigaman wrote:
Peter K: Autodetection won't help you if you (knowingly) search for a text in a completely different format (e.g. you are looking for a text in a folder of EXE files and you want to know which one contains it).
Agree.
Lefteous
Power Member
Posts: 9537
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

completely different format
These cases should be handled by full-text content plug-in fields, which know the file formats they handle.
I made a suggestion on how to simplify their use:
http://www.ghisler.ch/wiki/index.php/Integrate_fulltext_content_plugin_fields_into_find_text
Peter K
Junior Member
Posts: 6
Joined: 2007-11-25, 08:28 UTC

Post by *Peter K »

Autodetection won't help you if you (knowingly) search for a text in a completely different format
It's a rare situation. How often do you search in exe files? And what text are you looking for in them?

A content plug-in for exe may provide faster search. For example, it makes no sense to search for text in machine code (in code segment of exe file).

Searching in parallel is a good idea, too. Anyway, it would be better than the current interface where you have to remember the encoding of the file (e.g., was it UTF-8? or WinLatin-1?).
gigaman
Member
Posts: 134
Joined: 2003-02-14, 11:28 UTC

Post by *gigaman »

Peter K wrote:
Autodetection won't help you if you (knowingly) search for a text in a completely different format
It's a rare situation. How often do you search in exe files? And what text are you looking for in them?
Personally - pretty often. Though yes, I admit it's probably not a very common case for most of the users.
What text? Well, it depends... e.g. some kind of a message that some program (or Windows) gave me and I want to know what module it's connected with, to find out more about it?
Peter K wrote:
A content plug-in for exe may provide faster search. For example, it makes no sense to search for text in machine code (in code segment of exe file).
Not true - some compilers (Delphi, for example) store the strings inside the code sections, right before the function they are used in. So yes, you have to search the code section as well. Besides, I didn't say I know in advance that the string is stored in an executable... it could be a text file in the program folder, or something else...
Peter K wrote:
Searching in parallel is a good idea, too. Anyway, it would be better than the current interface where you have to remember the encoding of the file (e.g., was it UTF-8? or WinLatin-1?).
What I am not so sure about (regarding the parallel search algorithm) is the speed. The algorithm is certainly more complex... advantageous if you are searching for many strings simultaneously, but for a small number of strings (like here... where it would be something like 3), it might actually be slower than an ordinary single-string search repeated multiple times... it would have to be tested, I guess.
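
A rough way to test this is to time both approaches on the same buffer. The sketch below is only an illustration in Python: it compares repeated built-in substring searches against one scan with a combined `re` alternation, and it says nothing about TC's own code. The helper names and the three encodings are assumptions, and the search text must be representable in cp1252.

```python
import re
import timeit

# Assumed stand-ins for "ANSI", UTF-8 and "Unicode" (UTF-16 LE).
ENCODINGS = ("cp1252", "utf-8", "utf-16-le")


def contains_repeated(data, patterns):
    """Run one ordinary substring search per encoded pattern."""
    return any(pattern in data for pattern in patterns)


def contains_combined(data, combined):
    """Run a single scan with all patterns joined into one alternation."""
    return combined.search(data) is not None


def compare(data, text, repeats=20):
    """Return the total seconds each approach needs for `repeats` runs."""
    patterns = [text.encode(enc) for enc in ENCODINGS]
    combined = re.compile(b"|".join(re.escape(p) for p in patterns))
    return {
        "repeated": timeit.timeit(
            lambda: contains_repeated(data, patterns), number=repeats),
        "combined": timeit.timeit(
            lambda: contains_combined(data, combined), number=repeats),
    }


if __name__ == "__main__":
    import os
    # Worst case for both: random data that almost surely lacks the needle.
    print(compare(os.urandom(4 * 1024 * 1024), "needle"))
```

The timings depend on the machine, the data and the search library, so the numbers only indicate which approach wins in one particular setup.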
TLis
Member
Posts: 111
Joined: 2004-06-02, 16:48 UTC
Location: Szczecin, Poland

Re: If possible---

Post by *TLis »

Sheepdog wrote:
I think TC should search internally 2 times but remember the result of the first search and present both.
sheepdog
That's a great idea! A 'Keep search results' checkbox would not only serve the purpose of searching both Unicode and non-Unicode files, but would also allow building a result set that is the union of many separate searches of any kind.

Then we could launch the first search, check the option 'Keep search results' and use the New search button to perform the second search, while keeping previously found results. The final set of files would contain the files meeting either the first OR the second search criteria.

What do you think about this proposal?
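
The proposed behaviour amounts to a set union over successive result lists. A small Python sketch of that logic follows; the class name `KeptResults` and the `keep_previous` flag are hypothetical names used only to illustrate the checkbox.

```python
class KeptResults:
    """Accumulate file paths across searches, without duplicates."""

    def __init__(self):
        # A dict keeps insertion order, so results stay in discovery order.
        self._paths = {}

    def add_search(self, paths, keep_previous):
        """Store a new result list; merge it if 'Keep search results' is on."""
        if not keep_previous:
            self._paths.clear()
        for path in paths:
            self._paths.setdefault(path, None)

    def paths(self):
        """Return the files matching the first OR any later kept search."""
        return list(self._paths)


if __name__ == "__main__":
    results = KeptResults()
    results.add_search(["a.txt", "b.txt"], keep_previous=False)  # ANSI pass
    results.add_search(["b.txt", "c.doc"], keep_previous=True)   # Unicode pass
    print(results.paths())
```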
m^2
Power Member
Posts: 1413
Joined: 2006-07-12, 10:02 UTC
Location: Poland

Post by *m^2 »

Peter K wrote:
Autodetection won't help you if you (knowingly) search for a text in a completely different format
It's a rare situation. How often do you search in exe files? And what text are you looking for in them?
Often.
Flint
Power Member
Posts: 3511
Joined: 2003-10-27, 09:25 UTC
Location: Belgrade, Serbia

Post by *Flint »

Bump.

Any chance of seeing this idea implemented in a future version?
Flint's Homepage: Full TC Russification Package, VirtualDisk, NTFS Links, NoClose Replacer, and other stuff!
 
Using TC 11.03 / Win10 x64
Ralph
Member
Posts: 100
Joined: 2004-08-21, 14:58 UTC
Location: USA

Post by *Ralph »

I too would like the option of submitting one search, and automatically finding a string regardless of how it is encoded. Currently, to be sure that I haven't missed any encodings, I have to perform multiple searches by hand, using computer time plus operator time and effort, and possibly missing an encoding. Even if the search simply would read a file multiple times, the file would be in memory after the first read. Therefore, if TC offered the option of automatic use of all encodings, I could save time.

I search in files of all types.
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

Support++;

It would be a very handy feature.
ghisler(Author)
Site Admin
Posts: 50844
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Good news: The next version will support the following text searches at the same time:
ANSI
ASCII
Unicode UTF-16
Unicode UTF-8
Office-XML and EPUB (XML/HTML within ZIP)
Author of Total Commander
https://www.ghisler.com
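
The announcement does not say how the ZIP-based formats are searched. One plausible reading of "XML/HTML within ZIP" is sketched below in Python, purely as an assumption: open the archive, decode each XML/HTML member as UTF-8, drop the markup, and search the remaining text. The function name and the list of member extensions are illustrative.

```python
import re
import zipfile

# Assumed set of member types worth searching inside the archive.
TEXT_MEMBERS = (".xml", ".html", ".xhtml", ".htm")
TAG = re.compile(r"<[^>]+>")


def zip_xml_contains(archive_file, text):
    """Return True if any XML/HTML member of the ZIP contains `text`.

    `archive_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(TEXT_MEMBERS):
                continue
            content = archive.read(name).decode("utf-8", errors="replace")
            # Removing tags rejoins words that the format split across
            # several elements, e.g. <w:t>Hel</w:t><w:t>lo</w:t>.
            if text in TAG.sub("", content):
                return True
    return False
```

Stripping tags is a simplification: it also joins text across paragraph boundaries and ignores character entities, so a real implementation would need a proper parser for each format.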
Flint
Power Member
Posts: 3511
Joined: 2003-10-27, 09:25 UTC
Location: Belgrade, Serbia

Post by *Flint »

ghisler(Author)
Great! Can't wait to test it. :D
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

Great!