Plain text search fails but RegEx succeeds

Horst.Epp
Power Member
Posts: 3471
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: Plain text search fails but RegEx succeeds

Post by Horst.Epp » 2019-01-14, 09:27 UTC

MarkFilipak wrote:
2019-01-13, 21:08 UTC
As a topic for general discussion... (I think poking brains is fun)

In the world of GUI, why are we still dragging around '\n' & '\t'? Why can't we simply feed text -- any text -- into a text-box and click "Find"? By "any text" I include new-lines and tabs and control chars and... anything. The current search input methods are CLI relics that can be abandoned.

So, what would submit the search string? Not '\n' -- that's so 'CLI'. What would submit the search string would be [ Find ].
I don't agree with this general assumption.
That has nothing to do with CLI or any other environment conditions.
It's a major difference whether I search for text within lines or across line boundaries.
Nevertheless it would be helpful to have a search mode regardless of any special chars.
Windows 10 Home x64 November 2019 Update, Version 1909 (OS Build 18363.476)
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16GB RAM
TC 9.50ß8 x64 / x86, Everything 1.4.1.958 (x64)

Usher
Power Member
Posts: 623
Joined: 2011-03-11, 10:11 UTC

Re: Plain text search fails but RegEx succeeds

Post by Usher » 2019-01-14, 15:01 UTC

Horst.Epp wrote:
2019-01-14, 09:27 UTC
Nevertheless it would be helpful to have a search mode regardless of any special chars.
In general you are right; it would be something more like a smart web search. For a start, I would like to see the following features:
  • fold all whitespace characters (replace runs of spaces, tabs and EOLs with a single space), eliminating the need for regex syntax;
  • ignore accents, umlauts, ogonki etc.;
  • ignore punctuation marks.
There are many kinds of special chars, so they should be grouped somehow (by Unicode range?), and I have no idea which group should be ignored first (emoji?).
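A minimal sketch of the three features listed above, in Python using only the standard `unicodedata` and `re` modules. The function names `normalize_for_search` and `fuzzy_contains` are hypothetical; this is not how Total Commander searches, just one way the idea could work. Accents are removed by decomposing to NFD and dropping combining marks, which is also why some letters survive: Polish "ł" has no Unicode decomposition, so it is not folded to "l", one of the Unicode pitfalls mentioned in the next paragraph.

```python
import re
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical normalizer: strip accents, drop punctuation, fold whitespace."""
    # 1. Ignore accents, umlauts, ogonki: NFD splits e.g. "é" into "e" + U+0301
    #    and "ą" into "a" + U+0328; drop the combining marks (category Mn).
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # 2. Ignore punctuation: every Unicode category starting with "P"
    #    (Po, Pd, Ps, Pe, Pc, ...) is removed.
    no_punct = "".join(
        ch for ch in no_marks if not unicodedata.category(ch).startswith("P")
    )
    # 3. Fold any run of spaces, tabs, CR and LF into a single space.
    return re.sub(r"\s+", " ", no_punct).strip()


def fuzzy_contains(haystack: str, needle: str) -> bool:
    """Plain substring test after normalizing both sides the same way."""
    return normalize_for_search(needle) in normalize_for_search(haystack)


if __name__ == "__main__":
    print(normalize_for_search("Café,\tcrème\r\n brûlée!"))
    print(fuzzy_contains("Plain text\tsearch,\nfails", "text search fails"))
```

Deleting punctuation outright turns "end-of-line" into "endofline"; replacing it with a space instead would give "end of line" but also "don t". Either choice is a trade-off, which illustrates why such a mode would need settings.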

Finally, it's almost impossible to provide a truly smart search: because of (backward) compatibility issues, there may be problems with duplicates, homographs, some accented characters, other special characters, etc. in Unicode. See these Wikipedia articles:
https://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode
https://en.wikipedia.org/wiki/Unicode_equivalence
https://en.wikipedia.org/wiki/Unicode_compatibility_characters
https://en.wikipedia.org/wiki/Homoglyph
https://en.wikipedia.org/wiki/IDN_homograph_attack
Regards from Poland
Andrzej P. Wozniak

MarkFilipak
Member
Posts: 106
Joined: 2008-09-28, 01:00 UTC
Location: Mansfield, Ohio

Re: Plain text search fails but RegEx succeeds

Post by MarkFilipak » 2019-01-14, 17:36 UTC

Hi Horst! Thanks for participating.
Horst.Epp wrote:
2019-01-14, 09:27 UTC
MarkFilipak wrote:
2019-01-13, 21:08 UTC
As a topic for general discussion... (I think poking brains is fun)

In the world of GUI, why are we still dragging around '\n' & '\t'? Why can't we simply feed text -- any text -- into a text-box and click "Find"? By "any text" I include new-lines and tabs and control chars and... anything. The current search input methods are CLI relics that can be abandoned.

So, what would submit the search string? Not '\n' -- that's so 'CLI'. What would submit the search string would be [ Find ].
I don't agree with this general assumption.
That has nothing to do with CLI or any other environment conditions. ...
Oh? Doesn't it? Well, let me ask: Why does '\n' even exist? I think it's because '\n' is the only way to include end-of-line in a search string. And that's only because, for a command line, an actual end-of-line ('Enter' key) terminates the command line! But what if we stop insisting that CLI commands consist solely of printable characters? What if we include non-printable characters?

What about this: type in the first 'line' of target text, but for end-of-line press Ctrl+Enter; then continue typing the next 'line' of target text, etc. When all the 'lines' of target text have been entered, press Enter (without Ctrl) to terminate the command line. IMHO, that's how Perl RegEx should have worked from the get-go.

Next conceptual step: Type in the first 'line' of target text including Enter; then continue typing in the next 'line' of target text, etc. -- just as you would enter text into a text editor. When all the 'lines' of target text have been entered, then press the [Find] button to submit the search command.
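A rough Python sketch of what searching for such a literal, multi-line needle could look like. The helper names `normalize_eol`, `find_literal` and `literal_to_regex` are hypothetical, not part of any existing tool. The needle is matched character for character, real line breaks included, so no '\n' escape is ever typed; line endings are normalized first so a needle entered with Enter also matches files that use CRLF. If the engine underneath only understands regex, the standard `re.escape()` can convert the literal text into an equivalent pattern.

```python
import re

# Any of the three common line-ending conventions: CRLF, lone CR, LF.
EOL_PATTERN = r"(?:\r\n|\r|\n)"


def normalize_eol(text: str) -> str:
    """Hypothetical helper: map CRLF and lone CR to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def find_literal(haystack: str, needle: str) -> int:
    """Plain substring search; tabs and line breaks in the needle match themselves.

    Returns the index in the EOL-normalized haystack, or -1 if not found.
    """
    return normalize_eol(haystack).find(normalize_eol(needle))


def literal_to_regex(needle: str) -> re.Pattern:
    """Turn literal text into a regex that accepts any line-ending style."""
    lines = normalize_eol(needle).split("\n")
    # re.escape() neutralizes '.', '(', '*', etc., so the user never
    # has to know regex syntax.
    return re.compile(EOL_PATTERN.join(re.escape(line) for line in lines))


if __name__ == "__main__":
    text = "first line\r\nsecond line\r\n"
    needle = "line\nsecond"  # as if typed with a real Enter between the words
    print(find_literal(text, needle))
    match = literal_to_regex(needle).search(text)
    print(repr(match.group()) if match else None)
```

Note that the index returned by `find_literal` refers to the normalized text; the regex route avoids that by matching the original text directly.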

'\n' to insert end-of-line into a search string is a relic of a bygone era.
Hi Christian! Delighted customer since 1999. License #37627