Req: text search both normal and unicode

Peter K · Post by *Peter K » 2007-11-26, 01:56 UTC

It would be much better to improve encoding autodetection in TC so that it detects UTF-8/UTF-16 automatically. Then no options would be needed in the search box.

gigaman · Post by *gigaman » 2007-11-26, 10:49 UTC

Peter K: Autodetection won't help you if you (knowingly) search for a text in a completely different format (e.g. you are looking for a text in a folder of EXE files and you want to know which one contains it).

d: Well, there are algorithms for searching for multiple strings in parallel. I'm not sure whether they apply to regular expressions, but an "ordinary" search for an ANSI/UTF-8/Unicode string (or any other precise block of data) can certainly be done in one pass, if needed.
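A minimal sketch of the one-pass idea, assuming Python and a hypothetical encoding list in which `cp1252` stands in for "ANSI" and `utf-16-le` for "Unicode". The names `build_pattern` and `find_in_bytes` are made up for illustration and say nothing about how TC works internally. Note that Python's `re` engine tries each alternative at every offset, so this is a single search call over the data rather than a true multi-pattern automaton such as Aho-Corasick.

```python
import re

# Assumption: these three encodings approximate "ANSI", UTF-8 and "Unicode".
ENCODINGS = ("cp1252", "utf-8", "utf-16-le")

def build_pattern(text, encodings=ENCODINGS):
    """Compile one byte-level regex that matches `text` in any encoding."""
    variants = {}
    for enc in encodings:
        try:
            # Identical byte sequences (pure ASCII text) are stored only once.
            variants.setdefault(text.encode(enc), enc)
        except UnicodeEncodeError:
            pass  # the text cannot be represented in this encoding
    # Longest variant first, so a shorter one never shadows a longer one.
    ordered = sorted(variants, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(v) for v in ordered))
    return pattern, variants

def find_in_bytes(data, text):
    """Return (offset, encoding) of the first match in `data`, or None."""
    pattern, variants = build_pattern(text)
    m = pattern.search(data)
    return (m.start(), variants[m.group()]) if m else None
```

Because the search works on raw bytes, it does not care whether the file is a text file or an EXE. For pure ASCII text the ANSI and UTF-8 byte sequences are identical, so the sketch reports the first encoding in the list.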

Sosna · Post by *Sosna » 2007-11-26, 13:32 UTC

Support++. I was just about to write this request and found it's already here!

I think Flint's idea is OK: to select the encodings I want to check.

m^2 · Post by *m^2 » 2007-11-26, 14:45 UTC

Support++

gigaman wrote: Peter K: Autodetection won't help you if you (knowingly) search for a text in a completely different format (e.g. you are looking for a text in a folder of EXE files and you want to know which one contains it).

Agree.

Lefteous · Post by *Lefteous » 2007-11-26, 14:49 UTC

completely different format

These cases should be handled by full-text content plug-in fields, which know about the file formats they handle.
I made a suggestion on how to simplify their use:
http://www.ghisler.ch/wiki/index.php/Integrate_fulltext_content_plugin_fields_into_find_text

Peter K · Post by *Peter K » 2007-11-27, 01:34 UTC

Autodetection won't help you if you (knowingly) search for a text in a completely different format

It's a rare situation. How often do you search in exe files? And what text are you looking for in them?

A content plug-in for exe files may provide a faster search. For example, it makes no sense to search for text in machine code (in the code segment of an exe file).

Searching in parallel is a good idea, too. Anyway, it would be better than the current interface where you have to remember the encoding of the file (e.g., was it UTF-8? or WinLatin-1?).

gigaman · Post by *gigaman » 2007-11-27, 07:34 UTC

Peter K wrote:
Autodetection won't help you if you (knowingly) search for a text in a completely different format
It's a rare situation. How often do you search in exe files? And what text are you looking for in them?

Personally - pretty often. Though yes, I admit it's probably not a very common case for most users.
What text? Well, it depends... e.g. some kind of message that some program (or Windows) gave me, when I want to know which module it's connected with, to find out more about it.

Peter K wrote: A content plug-in for exe files may provide a faster search. For example, it makes no sense to search for text in machine code (in the code segment of an exe file).

Not true - some compilers (Delphi, for example) store strings inside the code sections, right before the function they are used in. So yes, you have to search the code section as well. Besides, I didn't say I know in advance that the string is stored in an executable... it could be a text file in the program folder, or something else...

Peter K wrote: Searching in parallel is a good idea, too. Anyway, it would be better than the current interface where you have to remember the encoding of the file (e.g., was it UTF-8? or WinLatin-1?).

What I am not so sure about (regarding the parallel search algorithm) is the speed. The algorithm is certainly more complex... advantageous if you are searching for many strings simultaneously, but for a small number of strings (like here... where it would be something like 3), it might actually be slower than an ordinary single-string search repeated multiple times... it would have to be tested, I guess.
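One way such a test could look, as a rough Python sketch. It compares one ordinary substring search per encoding against a single combined regex search. The function names and the three-encoding list are assumptions for illustration, the text is assumed to be encodable in all three encodings, and the timings say nothing about TC's own implementation.

```python
import re
import timeit

def repeated_find(data, needles):
    """Run one ordinary substring search per needle; return the earliest hit."""
    hits = [pos for pos in (data.find(n) for n in needles) if pos != -1]
    return min(hits) if hits else -1

def combined_find(data, pattern):
    """Run a single search with all needles joined into one alternation."""
    m = pattern.search(data)
    return m.start() if m else -1

def compare(data, text, number=20):
    """Return (seconds for repeated search, seconds for combined search)."""
    # Assumption: cp1252 stands in for "ANSI", utf-16-le for "Unicode".
    needles = sorted({text.encode(e) for e in ("cp1252", "utf-8", "utf-16-le")},
                     key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(n) for n in needles))
    # Both strategies must agree on the result before timing them.
    assert repeated_find(data, needles) == combined_find(data, pattern)
    t_rep = timeit.timeit(lambda: repeated_find(data, needles), number=number)
    t_comb = timeit.timeit(lambda: combined_find(data, pattern), number=number)
    return t_rep, t_comb
```

Calling `compare(open(path, "rb").read(), "some message")` on a few representative files would show which strategy wins for only three needles. The outcome depends on the search engine and the data, so no result is claimed here.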

TLis · Post by *TLis » 2007-11-27, 10:26 UTC

Sheepdog wrote: I think TC should search internally 2 times but remember the result of the first search and present both.
sheepdog

That's a great idea! A 'Keep search results' checkbox would not only serve the purpose of searching both Unicode and non-Unicode files, but would also allow building a result set that is the union of many separate searches of any kind.

Then we could launch the first search, check the option 'Keep search results' and use the New search button to perform the second search, while keeping previously found results. The final set of files would contain the files meeting either the first OR the second search criteria.
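The intended behaviour can be sketched in a few lines of Python. `ResultList` and `run_search` are hypothetical names used only to illustrate the union semantics; they are not part of TC.

```python
class ResultList:
    """Hypothetical model of the search result list."""

    def __init__(self):
        self.files = []  # ordered, without duplicates

    def run_search(self, matches, keep_previous=False):
        """Store the matches of a new search, optionally keeping old results."""
        if not keep_previous:
            self.files = []  # default: a new search replaces the old results
        seen = set(self.files)
        for path in matches:
            if path not in seen:  # a file found by both searches is listed once
                seen.add(path)
                self.files.append(path)
        return self.files
```

For example, a first search returning `a.txt` and `b.txt`, followed by a second search returning `b.txt` and `c.txt` with `keep_previous=True`, leaves `a.txt`, `b.txt` and `c.txt` in the list.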

What do you think about this proposal?

m^2 · Post by *m^2 » 2007-11-27, 11:49 UTC

Peter K wrote:
Autodetection won't help you if you (knowingly) search for a text in a completely different format
It's a rare situation. How often do you search in exe files? And what text are you looking for in them?

Often.

Flint · Post by *Flint » 2013-01-18, 14:42 UTC

Bump.

Any chance of seeing this idea implemented in a future version?

Ralph · Post by *Ralph » 2013-01-26, 01:26 UTC

I too would like the option of submitting one search and automatically finding a string regardless of how it is encoded. Currently, to be sure that I haven't missed any encodings, I have to perform multiple searches by hand, which costs computer time plus operator time and effort, and I may still miss an encoding. Even if the search simply read a file multiple times, the file would be in memory after the first read. Therefore, if TC offered the option of automatically using all encodings, I could save time.

I search in files of all types.

Samuel · Post by *Samuel » 2013-01-26, 06:56 UTC

Support++;

It would be a very handy feature.

ghisler(Author) · Post by *ghisler(Author) » 2013-01-28, 13:54 UTC

Good news: The next version will support the following text searches at the same time:
ANSI
ASCII
Unicode UTF-16
Unicode UTF-8
Office-XML and EPUB (XML/HTML within ZIP)

Flint · Post by *Flint » 2013-01-28, 14:53 UTC

ghisler(Author)
Great! Can't wait to test it.

Samuel · Post by *Samuel » 2013-01-28, 23:01 UTC

Great!
