Plain text search fails but RegEx succeeds



Horst.Epp
Power Member
Posts: 6429
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Post by Horst.Epp

MarkFilipak wrote: 2019-01-13, 21:08 UTC
As a topic for general discussion... (I think poking brains is fun)

In the world of GUI, why are we still dragging around '\n' & '\t'? Why can't we simply feed text -- any text -- into a text-box and click "Find"? By "any text" I include new-lines and tabs and control chars and... anything. The current search input methods are CLI relics that can be abandoned.

So, what would submit the search string? Not '\n' -- that's so 'CLI'. What would submit the search string would be [ Find ].
I don't agree with this general assumption.
That has nothing to do with CLI or any other environment.
It's a major difference whether I search for text inside lines or across line boundaries.
Nevertheless, it would be helpful to have a search mode that works regardless of any special chars.
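A minimal Python sketch of the distinction Horst draws between searching inside lines and searching across line boundaries. The function names are hypothetical and this is not meant to reflect how Total Commander implements its search; it only shows that a line-by-line search can never match a needle that itself contains a line break.

```python
# Hypothetical helpers contrasting the two search modes.

def find_in_lines(text: str, needle: str) -> list[int]:
    """Line-by-line search: 0-based numbers of lines that contain needle."""
    return [i for i, line in enumerate(text.splitlines()) if needle in line]


def find_across_lines(text: str, needle: str) -> list[int]:
    """Whole-text search: character offsets of every match, line breaks included."""
    hits = []
    pos = text.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = text.find(needle, pos + 1)  # step by one so overlapping matches are kept
    return hits


sample = "alpha beta\ngamma delta\n"
print(find_in_lines(sample, "delta"))            # [1]
print(find_in_lines(sample, "beta\ngamma"))      # []  (no single line holds a line break)
print(find_across_lines(sample, "beta\ngamma"))  # [6]
```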
Windows 11 Home x64 Version 23H2 (OS Build 22631.3296)
TC 11.03 x64 / x86
Everything 1.5.0.1371a (x64), Everything Toolbar 1.3.2, Listary Pro 6.3.0.69
QAP 11.6.3.2 x64
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Post by Usher

Horst.Epp wrote: 2019-01-14, 09:27 UTC
Nevertheless it would be helpful to have a search mode regardless of any special chars.
In general you are right; it would be something more like a smart web search. For a start, I would like to see the following features:
  • fold all white space characters (replace multiple spaces, tabs and EOLs with a single space) to eliminate the need for regex syntax;
  • ignore accents, umlauts, ogonki etc.;
  • ignore punctuation marks.
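The three folds in the list above can be sketched in Python with the standard `unicodedata` and `re` modules. This is an illustration under stated assumptions, not a proposed implementation: `fold_for_search` and `smart_contains` are hypothetical names, matching stays case-sensitive, and two limitations are visible right away. Letters such as Polish "ł" or German "ß" have no Unicode decomposition, so stripping combining marks leaves them unchanged, and dropping punctuation merges words like "e-mail" into "email".

```python
import re
import unicodedata


def fold_for_search(s: str) -> str:
    """Apply the three folds: accents, punctuation, whitespace (hypothetical helper)."""
    # NFD splits a precomposed letter such as "é" into "e" + a combining accent.
    s = unicodedata.normalize("NFD", s)
    # Drop non-spacing combining marks (category Mn): accents, umlauts, ogonki.
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Mn")
    # Drop punctuation (general categories Pc, Pd, Ps, Pe, Pi, Pf, Po).
    s = "".join(ch for ch in s if not unicodedata.category(ch).startswith("P"))
    # Fold each run of spaces, tabs and line breaks into a single space.
    return re.sub(r"\s+", " ", s).strip()


def smart_contains(haystack: str, needle: str) -> bool:
    """True if the folded needle occurs in the folded haystack."""
    return fold_for_search(needle) in fold_for_search(haystack)


print(fold_for_search("Café,\t  crème\nbrûlée!"))                    # Cafe creme brulee
print(fold_for_search("łódź"))                                        # łodz
print(smart_contains("Order:\tcrème\r\nbrûlée, 2x", "creme brulee"))  # True
```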
There are many kinds of special chars, so they should be grouped somehow (by Unicode range?), and I have no idea which group should be ignored in the first place (emoji?).
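One possible way to group special chars is by Unicode general category rather than by code-point range. The Python sketch below assumes that approach; `strip_categories` and the `IGNORE_SYMBOLS` group are hypothetical. Category `So` (Symbol, other) covers many emoji, but emoji sequences can also contain U+FE0F VARIATION SELECTOR-16 (category `Mn`) or U+200D ZERO WIDTH JOINER (category `Cf`), so a real "ignore emoji" group would need more than one category.

```python
import unicodedata


def category_of_each(s: str) -> list[tuple[str, str]]:
    """Pair every character with its Unicode general category (e.g. Ll, Po, So)."""
    return [(ch, unicodedata.category(ch)) for ch in s]


def strip_categories(s: str, ignored: frozenset[str]) -> str:
    """Remove every character whose general category is in `ignored`."""
    return "".join(ch for ch in s if unicodedata.category(ch) not in ignored)


print(category_of_each("a,1 ✅"))
# [('a', 'Ll'), (',', 'Po'), ('1', 'Nd'), (' ', 'Zs'), ('✅', 'So')]

# Hypothetical "ignore symbols" group.
IGNORE_SYMBOLS = frozenset({"So"})
print(strip_categories("Done ✅", IGNORE_SYMBOLS))  # prints "Done" followed by one space
```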

Finally, it's almost impossible to provide a truly smart search: because of (backward) compatibility issues, there may be problems with duplicate characters, homographs, some accented characters, other special characters, etc. in Unicode. See these Wikipedia articles:
https://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode
https://en.wikipedia.org/wiki/Unicode_equivalence
https://en.wikipedia.org/wiki/Unicode_compatibility_characters
https://en.wikipedia.org/wiki/Homoglyph
https://en.wikipedia.org/wiki/IDN_homograph_attack
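The difference between canonical and compatibility equivalence, and why homoglyphs still slip through, can be shown with Python's `unicodedata.normalize`. The two helper names are hypothetical; the normalization forms NFC and NFKC are the standard ones. Canonical normalization unifies different encodings of the same accented letter, compatibility normalization additionally unfolds things like ligatures, and neither treats a Cyrillic "а" as a Latin "a".

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Same text under canonical equivalence (NFC): only the encoding differs."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def compatibly_equal(a: str, b: str) -> bool:
    """Same text under compatibility equivalence (NFKC): looser, may lose formatting."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
cyrillic_a = "\u0430"    # CYRILLIC SMALL LETTER A, a homoglyph of Latin "a"

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True
print(canonically_equal(ligature, "fi"))           # False: NFC keeps the ligature
print(compatibly_equal(ligature, "fi"))            # True
print(compatibly_equal(cyrillic_a, "a"))           # False: homoglyphs are not unified
```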
Andrzej P. Wozniak
Polish subforum moderator
MarkFilipak
Member
Posts: 164
Joined: 2008-09-28, 01:00 UTC
Location: Mansfield, Ohio

Post by MarkFilipak

Hi Horst! Thanks for participating.
Horst.Epp wrote: 2019-01-14, 09:27 UTC
MarkFilipak wrote: 2019-01-13, 21:08 UTC
As a topic for general discussion... (I think poking brains is fun)

In the world of GUI, why are we still dragging around '\n' & '\t'? Why can't we simply feed text -- any text -- into a text-box and click "Find"? By "any text" I include new-lines and tabs and control chars and... anything. The current search input methods are CLI relics that can be abandoned.

So, what would submit the search string? Not '\n' -- that's so 'CLI'. What would submit the search string would be [ Find ].
I don't agree with this general assumption.
That has nothing to do with CLI or any other environment conditions. ...
Oh? Doesn't it? Well, let me ask: Why does '\n' even exist? I think it's because '\n' is the only way to include end-of-line in a search string. And that's only because, for a command line, an actual end-of-line ('Enter' key) terminates the command line! But what if we stop insisting that CLI commands consist solely of printable characters? What if we include non-printable characters?
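To make the point concrete: when the search box can only hold one line, the typed escape sequences have to be translated back into the characters they stand for before the search runs. A minimal Python sketch of that translation step, assuming a hypothetical escape table with only `\n`, `\t` and `\\`; unknown escapes are passed through unchanged.

```python
# Hypothetical escape table: the minimum a one-line search box needs.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def unescape(pattern: str) -> str:
    """Turn typed escape sequences into the literal characters they stand for."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == "\\" and nxt in ESCAPES:
            out.append(ESCAPES[nxt])  # e.g. backslash + "n" becomes a real line break
            i += 2
        else:
            out.append(ch)            # everything else, including unknown escapes, is kept
            i += 1
    return "".join(out)


print(repr(unescape(r"end\nstart")))  # 'end\nstart'
print(unescape(r"C:\\temp"))          # C:\temp
```

The extra rule for `\\` is exactly the kind of baggage the escape syntax drags along: once a backslash is special, a literal backslash needs its own escape.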

What about this: Type in the first 'line' of target text, but for end-of-line, press Ctrl+Enter; then continue typing in the next 'line' of target text, etc. When all the 'lines' of target text have been entered, press Enter (without Ctrl) to terminate the command line. IMHO, that's the way Perl RegEx should have been from the get-go.
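The Ctrl+Enter idea can be modelled as a tiny input loop. This Python sketch assumes a hypothetical `Key` event type (a printable character, or Enter with or without Ctrl); Ctrl+Enter puts a real line break into the search string and plain Enter submits it, so no `\n` escape is ever typed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Key:
    """Hypothetical key event: a printable char, or Enter with/without Ctrl."""
    char: str = ""
    enter: bool = False
    ctrl: bool = False


def read_search_string(keys: Iterable[Key]) -> Optional[str]:
    """Ctrl+Enter inserts a line break; plain Enter submits the search string."""
    buf = []
    for key in keys:
        if key.enter and key.ctrl:
            buf.append("\n")      # literal end-of-line, no escape needed
        elif key.enter:
            return "".join(buf)   # submit
        else:
            buf.append(key.char)
    return None                   # keys ran out before Enter was pressed


typed = [Key("e"), Key("n"), Key("d"), Key(enter=True, ctrl=True),
         Key("g"), Key("o"), Key(enter=True)]
print(repr(read_search_string(typed)))  # 'end\ngo'
```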

Next conceptual step: Type in the first 'line' of target text including Enter; then continue typing in the next 'line' of target text, etc. -- just as you would enter text into a text editor. When all the 'lines' of target text have been entered, then press the [Find] button to submit the search command.
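Once the target text is entered literally, the search itself is a plain substring search. One practical detail is that a Windows file usually ends lines with CRLF while a typed line break is a single LF, so both sides need the same line endings before comparing. A minimal Python sketch, with hypothetical helper names; the returned offset refers to the normalized text, not the raw file.

```python
def normalize_eol(s: str) -> str:
    """Map Windows (CRLF) and old Mac (CR) line endings to LF."""
    return s.replace("\r\n", "\n").replace("\r", "\n")


def find_literal(haystack: str, needle: str) -> int:
    """Plain-text search: offset of needle in the EOL-normalized haystack, or -1.

    Nothing in needle is special: '.', '*', tabs and line breaks all match literally.
    """
    return normalize_eol(haystack).find(normalize_eol(needle))


file_text = "line one\r\nline two\r\n"  # raw CRLF text, e.g. read with newline=""
typed = "one\nline two"                  # as typed into a multi-line Find box
print(typed in file_text)                # False: CRLF vs LF
print(find_literal(file_text, typed))    # 5
print(find_literal("a.b*c", ".b*"))      # 1
```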

'\n' to insert end-of-line into a search string is a relic of a bygone era.
Hi Christian! Delighted customer since 1999. License #37627