Req: text search both normal and unicode
Peter K: Autodetection won't help you if you (knowingly) search for a text in a completely different format (e.g. you are looking for a text in a folder of EXE files and you want to know which one contains it).
d: Well, there are algorithms for searching multiple strings in parallel. I'm not sure if they are applicable to regular expressions, but an "ordinary" search for an ANSI/UTF-8/Unicode string (or any other precise block of data) can certainly be done in one pass, if needed.
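As a rough illustration of the one-pass idea, here is a minimal sketch that encodes the search text in several encodings and scans the data once for any of the resulting byte strings. The encoding list and the function names are assumptions made for this example. Python's `re` alternation also stands in for a dedicated multi-pattern algorithm: it tries each alternative at every position rather than building a real multi-string automaton.

```python
import re

# Assumed set of encodings; the thread mentions ANSI, UTF-8 and Unicode (UTF-16).
ENCODINGS = ("cp1252", "utf-8", "utf-16-le")


def encode_variants(text, encodings=ENCODINGS):
    """Map each distinct byte form of `text` to the first encoding that yields it."""
    variants = {}
    for name in encodings:
        try:
            variants.setdefault(text.encode(name), name)
        except UnicodeEncodeError:
            continue  # the text cannot be represented in this encoding
    return variants


def find_any_encoding(data, text):
    """Scan `data` once and return (offset, encoding) for every match."""
    if not text:
        return []
    variants = encode_variants(text)
    if not variants:
        return []
    # Longest alternative first, so a short byte form never shadows a longer one.
    ordered = sorted(variants, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(p) for p in ordered))
    return [(m.start(), variants[m.group()]) for m in pattern.finditer(data)]
```

For pure ASCII text the ANSI and UTF-8 byte forms are identical, so the sketch reports such a match under whichever encoding comes first in the list.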
Quote: completely different format
These cases should be handled by fulltext content plug-in fields, which know about the file format they handle.
I made a suggestion on how to simplify their use:
http://www.ghisler.ch/wiki/index.php/Integrate_fulltext_content_plugin_fields_into_find_text
Quote: Autodetection won't help you if you (knowingly) search for a text in a completely different format
It's a rare situation. How often do you search in exe files? And what text are you looking for in them?
A content plug-in for exe files may provide a faster search. For example, it makes no sense to search for text in machine code (in the code segment of an exe file).
Searching in parallel is a good idea, too. Anyway, it would be better than the current interface where you have to remember the encoding of the file (e.g., was it UTF-8? or WinLatin-1?).
Peter K wrote: It's a rare situation. How often do you search in exe files? And what text are you looking for in them?
Personally, pretty often. Though yes, I admit it's probably not a very common case for most users.
What text? Well, it depends... e.g. some kind of a message that some program (or Windows) gave me and I want to know what module it's connected with, to find out more about it?
Peter K wrote: A content plug-in for exe may provide faster search. For example, it makes no sense to search for text in machine code (in code segment of exe file).
Not true: some compilers (Delphi, for example) store strings inside the code sections, right before the function they are used in. So yes, you have to search the code section as well. Besides, I didn't say I know in advance that the string is stored in an executable... it could be a text file in the program folder, or something else...
Peter K wrote: Searching in parallel is a good idea, too. Anyway, it would be better than the current interface where you have to remember the encoding of the file (e.g., was it UTF-8? or WinLatin-1?).
What I am not so sure about (regarding the parallel search algorithm) is the speed. The algorithm is certainly more complex. It is advantageous if you are searching for many strings simultaneously, but for a small number of strings (like here, where it would be something like 3), it might actually be slower than an ordinary single-string search repeated multiple times. It would have to be tested, I guess.
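One way to test this is a small timing harness that runs both strategies over the same data. The sketch below is only an assumption about how such a test could look: the function names are made up, and a regular-expression alternation stands in for a real parallel multi-string algorithm, so the numbers say nothing about any particular implementation in TC.

```python
import re
import timeit


def search_repeated(data, patterns):
    """Ordinary single-string search, repeated once per pattern."""
    return any(p in data for p in patterns)


def search_combined(data, compiled):
    """One scan with all patterns combined into a single alternation."""
    return compiled.search(data) is not None


def compare(data, patterns, repeat=5):
    """Return the best-of-`repeat` time in seconds for (repeated, combined)."""
    compiled = re.compile(b"|".join(re.escape(p) for p in patterns))
    t_repeated = min(timeit.repeat(
        lambda: search_repeated(data, patterns), number=1, repeat=repeat))
    t_combined = min(timeit.repeat(
        lambda: search_combined(data, compiled), number=1, repeat=repeat))
    return t_repeated, t_combined
```

Calling `compare(data, patterns)` on a large buffer that contains none of the patterns gives the worst case for both strategies, since neither can stop early.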
Re: If possible---
Sheepdog wrote: I think TC should search internally 2 times but remember the result of the first search and present both.
That's a great idea! Implementing a 'Keep search results' checkbox would not only serve the purpose of searching both Unicode and non-Unicode files, but would also allow building a result set that is a union of many separate searches of any kind.
Then we could launch the first search, check the option 'Keep search results' and use the New search button to perform the second search, while keeping previously found results. The final set of files would contain the files meeting either the first OR the second search criteria.
What do you think about this proposal?
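The proposed behaviour amounts to a set union over successive searches. A minimal sketch, assuming a hypothetical `SearchResults` class whose `keep_previous` flag plays the role of the 'Keep search results' checkbox:

```python
class SearchResults:
    """Hypothetical stand-in for the search results list."""

    def __init__(self):
        # A dict keeps insertion order and silently drops duplicate paths.
        self._paths = {}

    def new_search(self, found, keep_previous=False):
        """Store the paths of a new search, optionally keeping earlier ones."""
        if not keep_previous:
            self._paths.clear()
        for path in found:
            self._paths.setdefault(path, None)
        return list(self._paths)
```

With `keep_previous=True` the list ends up holding every file that met the first OR the second search criteria, each file listed once.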
Flint's Homepage: Full TC Russification Package, VirtualDisk, NTFS Links, NoClose Replacer, and other stuff!
Using TC 11.03 / Win10 x64
I too would like the option of submitting one search, and automatically finding a string regardless of how it is encoded. Currently, to be sure that I haven't missed any encodings, I have to perform multiple searches by hand, using computer time plus operator time and effort, and possibly missing an encoding. Even if the search simply would read a file multiple times, the file would be in memory after the first read. Therefore, if TC offered the option of automatic use of all encodings, I could save time.
I search in files of all types.
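The "read once, search several times" variant described above could look roughly like the sketch below. It is not how TC works; the encoding list and the function name are assumptions, and reading each file fully into memory is a simplification that would not suit very large files.

```python
import os

# Assumed set of encodings to try for every search.
ENCODINGS = ("cp1252", "utf-8", "utf-16-le")


def files_containing(root, text, encodings=ENCODINGS):
    """Return paths under `root` that contain `text` in any of the encodings."""
    if not text:
        return []
    needles = set()
    for name in encodings:
        try:
            needles.add(text.encode(name))
        except UnicodeEncodeError:
            pass  # the text cannot be represented in this encoding
    hits = []
    for folder, _dirs, files in os.walk(root):
        for fname in sorted(files):
            path = os.path.join(folder, fname)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()  # read once, then scan in memory
            except OSError:
                continue  # unreadable file: skip it
            if any(needle in data for needle in needles):
                hits.append(path)
    return hits
```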
- ghisler(Author)
Good news: The next version will support the following text searches at the same time:
ANSI
ASCII
Unicode UTF-16
Unicode UTF-8
Office-XML and EPUB (XML/HTML within ZIP)
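For the last item, a minimal sketch of what searching XML/HTML inside a ZIP container can involve. This is not TC's implementation: the function name, the list of member suffixes, and the assumption that the members are UTF-8 encoded are all made up for this example. Tags are stripped before matching because document formats often split a phrase across several elements.

```python
import html
import re
import zipfile

TAG = re.compile(r"<[^>]+>")
# Assumed member types worth scanning inside the archive.
SUFFIXES = (".xml", ".html", ".xhtml", ".htm")


def zip_xml_contains(path_or_file, text, suffixes=SUFFIXES):
    """Return True if `text` occurs in the visible text of any XML/HTML member."""
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(suffixes):
                continue
            # Assumption: members are UTF-8; undecodable bytes are replaced.
            raw = archive.read(name).decode("utf-8", errors="replace")
            plain = html.unescape(TAG.sub("", raw))
            if text in plain:
                return True
    return False
```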
Author of Total Commander
https://www.ghisler.com
Great! Can't wait to test it.