[WCX] RegXtract - String Extractor with RegEx

trevor12
Junior Member
Posts: 65
Joined: 2012-12-06, 15:16 UTC
Location: Czech republic

Post by *trevor12 »

Thanks for the solutions. I forgot to say that my file does not contain only URL strings: it is a long text with words, and among them there are many URLs.

I will try your solutions.
Peter
Power Member
Posts: 2064
Joined: 2003-11-13, 13:40 UTC
Location: Schweiz

Post by *Peter »

Hi

I need a solution for this job:

- use the (text-)files that are selected in TC
- test whether the 5th line contains the string "hello"
- if the string is found, write lines 1 - 6 to "result.txt" (appending the results)

Can it be done with this Plugin?

Thanks and regards

Peter
TC 10.xx / #266191
Win 10 x64
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

Peter wrote:
- use the (text-)files that are selected in TC
- test if in the 5. line there is the string "hello"
- if string is found, write lines 1 - 6 to "result.txt" (append the results)
I think this would work:

^(.*\R)(?1){3}(.*hello.*\R?).*

Replace:
$0
Use the default options, and make sure the read memory setting is large enough in case your text files exceed the default 10 MiB.


If you want to match "hello" case-sensitively, either use

(?-i)^(.*\R)(?1){3}(.*hello.*\R?).*
or check the "Case sensitive" option.
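For comparison, here is a hypothetical Python equivalent of the line-5 test. Python's `re` module has neither the PCRE subroutine call `(?1)` nor `\R`, so this sketch writes the four leading lines as a counted group and spells out the line breaks; it is not how RegXtract works internally. The breaks after lines 1 to 4 are mandatory here, so a "hello" on an earlier line is not picked up. The function name `first_six_lines_if_hello` is made up for illustration, and `re.IGNORECASE` mirrors the plug-in's apparent case-insensitive default.

```python
import re
from typing import Optional

# Python's re has neither the PCRE subroutine call (?1) nor \R, so the four
# leading lines are written as a counted group and the line break is spelled out.
# "\r(?!\n)" accepts a lone CR but never splits a CRLF pair into two breaks.
BR = r'(?:\r\n|\n|\r(?!\n))'
LINE = r'[^\r\n]*'  # one line's content, never crossing a line break

LINE5_HELLO = re.compile(
    rf'\A(?:{LINE}{BR}){{4}}{LINE}hello{LINE}(?:{BR}{LINE})?',
    re.IGNORECASE,  # assumption: plug-in default; drop this for case-sensitive
)


def first_six_lines_if_hello(text: str) -> Optional[str]:
    """Return lines 1-6 (or 1-5 if there is no 6th) when line 5 contains 'hello'."""
    m = LINE5_HELLO.match(text)
    return m.group(0) if m else None


print(first_six_lines_if_hello("a\nb\nc\nd\nsay hello\nf\ng\n"))
print(first_six_lines_if_hello("a\nb\nc\nhello on line 4\ne\nf\n"))
```

The second call returns `None`, because its "hello" sits on line 4.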


If your input files have mixed line endings (Unix LF, Windows CRLF, Mac CR),
you had better use:

^(.*)\R(.*)\R(.*)\R(.*)\R(.*hello.*)\R?(.*)

Replace:
$1
$2
$3
$4
$5
$6
This prevents mixed-style line endings in the output file.
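Outside the plug-in, the whole job (check line 5, append lines 1-6 to result.txt) can be sketched in plain Python. This is a minimal sketch, not part of RegXtract; the function name `collect_hello_blocks`, UTF-8 input and CRLF output endings are assumptions. Like the second expression above, it writes its own line breaks, so the output has one consistent line-ending style.

```python
from pathlib import Path
from typing import Iterable


def collect_hello_blocks(files: Iterable[Path], result: Path, needle: str = "hello") -> int:
    """Append lines 1-6 of each file whose 5th line contains `needle` to `result`.

    The check is case-insensitive. Returns how many files were appended.
    """
    appended = 0
    # newline="\r\n": every "\n" written below becomes CRLF, so the output
    # file gets one consistent line-ending style whatever the inputs used.
    with result.open("a", encoding="utf-8", newline="\r\n") as out:
        for path in files:
            # Text mode turns CRLF and lone CR into "\n"; splitlines() then
            # splits on those (and on a few rarer separators such as form feed).
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
            if len(lines) >= 5 and needle.lower() in lines[4].lower():
                out.write("\n".join(lines[:6]) + "\n")
                appended += 1
    return appended
```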
TC plugins: PCREsearch and RegXtract
Peter
Power Member
Posts: 2064
Joined: 2003-11-13, 13:40 UTC
Location: Schweiz

Post by *Peter »

Thanks @milo1012
looks great :D
Peter
Power Member
Posts: 2064
Joined: 2003-11-13, 13:40 UTC
Location: Schweiz

Post by *Peter »

Sorry, me again with a beginner's question.

I have tons of text files like this:

...lots of different text...
Signature\n -> first fixed text plus a blank at the end
\n -> second fixed text plus a blank at the end
(vlr-reactions reactor) -> variable Text which should be extracted
\n -> third fixed text plus a blank at the end
... continuing tons of text ...


Sorry, I have to revise this: I used a viewer that did not show the real content. Here is an example, with the part I want to extract marked in red. Note: after \n, for example, there are many blanks, but the forum collapses them to one when displaying.


lots of different text ...Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre> .. lots of text


Or, as a challenge, also remove the tags:

lots of different text ...Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre> .. lots of text



I'm sure it should be simple, but I don't see the solution. Who can help?

Thanks
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

2Peter

Well, it's not THAT easy.
A possible expression would be:

<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>
The first group would hold the "strcat" part, the 2nd the actual string array.
So a replacement string could be something like:

$1: $2
You should test this expression carefully, as it might not work in all cases: I'm not completely sure about the content of your input files (they seem to contain a wild mixture of HTML code and non-HTML content).
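To see what the two groups capture, here is a minimal Python sketch that runs the expression on Peter's sample line. Python's `re` accepts this particular pattern unchanged, though RegXtract's PCRE engine and options may behave differently on other inputs. Python writes group references as `\1`, `\2` where RegXtract uses `$1`, `$2`.

```python
import re

# Milo's expression as posted; Python's re accepts this syntax unchanged.
PATTERN = re.compile(r'<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>')

# Peter's sample line. The backslashes before n and before the quotes are
# literal characters in his files, hence the raw strings.
SAMPLE = (
    r'lots of different text ...Signature\n </h2> \n '
    r'<div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">'
    r'[string string_n ...]</em>)</pre> .. lots of text'
)

m = PATTERN.search(SAMPLE)
print(m.group(1))  # the "strcat" part
print(m.group(2))  # the string array
# Python's counterpart of the replacement string "$1: $2".
print(m.expand(r'\1: \2'))
```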
Peter
Power Member
Posts: 2064
Joined: 2003-11-13, 13:40 UTC
Location: Schweiz

Post by *Peter »

Milo, thank you, it works fine for my needs.

Some strings are not extracted correctly. For example, I want the parts marked in red, but I get the entire string:

Raw-Data:
...............</a>Signature\n </h2> \n <div class=\"codeBlock\"><pre>(ssnamex <em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> <a name=\"WS1A9193826455F5FF-1E1423D1125831BDA67-7B03\"></a><dl>\n <dt><a name=\"WS1A9193826455F5FF-7C08E89711EC57B47A8-7818\"></......................

Result:
ssnamex <em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> <a name=\"WS1A9193826455F5FF-1E1423D1125831BDA67-7B03\"></a><dl>\n.........

But that's OK for me; I can fix it by hand in the editor.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

2Peter
Well, it works on a standalone subject, but a file with multiple subjects concatenated is probably a different matter.
You could try making the quantifiers non-greedy (lazy):

<div[^>]*><pre[^>]*>\((.*?)\s<em\s[^>]*>(.*?)</em>\)</pre>
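To illustrate what the lazy quantifiers change on a line holding several code blocks, here is a minimal Python sketch comparing both expressions. The two-block subject is a made-up example modelled on Peter's raw data (the anchor name "X" is a placeholder); the plug-in's PCRE engine and its options may not behave identically on real files.

```python
import re

# Hypothetical one-line subject with two code blocks, modelled on Peter's
# raw data (the backslashes before the quotes are literal characters).
SUBJECT = (
    r'<div class=\"codeBlock\"><pre>(ssnamex <em class=\"codeEmphasisMild\">'
    r'ss [index]</em>)</pre></div> <a name=\"X\"></a> '
    r'<div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">'
    r'[string string_n ...]</em>)</pre></div>'
)

GREEDY = re.compile(r'<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>')
LAZY = re.compile(r'<div[^>]*><pre[^>]*>\((.*?)\s<em\s[^>]*>(.*?)</em>\)</pre>')

greedy = GREEDY.findall(SUBJECT)
lazy = LAZY.findall(SUBJECT)

# Greedy: a single match whose group 1 runs on into the second block.
print(len(greedy), greedy[0][0])
# Lazy: one match per block, formatted like the "$1: $2" replacement.
for name, args in lazy:
    print(f"{name}: {args}")
```

The greedy `.*` backtracks only as far as the last `<em` on the line, which is why group 1 swallows the first block; the lazy `.*?` stops at the first one.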
Peter
Power Member
Posts: 2064
Joined: 2003-11-13, 13:40 UTC
Location: Schweiz

Post by *Peter »

Thanks, Milo.
There is no difference from the former code, but it's a great basis for further tests.

Peter
LeoLUG
Junior Member
Posts: 17
Joined: 2020-05-22, 15:24 UTC

Post by *LeoLUG »

Thanks so much for this plugin.
As I understand it, it's not possible to replace and save in the same document, only to get a copy of it.
The problem I have with that: when I have a lot of files in different folders and view them together with Ctrl+B, I want to change them and have them back in the same folders. How can I do that?
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

LeoLUG wrote (2020-12-04, 03:28 UTC):
The problem i have with that: When i have a lot of files in different folders, and with control +B i see them together and want to change and have them back on the same folders, how can i do that?
The plug-in can create new files in the same location as the original ones, but with a new extension. By default, the plug-in names them <<original filename with extension>>.txt
So once you're sure the search and replace (s&r) was successful, you can delete the original files and rename the new output files, e.g. with TC's Multi-Rename Tool (MRT), removing the .txt extension.

So basically you need to do:
  • use ctrl+B in your target dir
  • mark all files that you want to s&r
  • now: either hold the ctrl key while clicking on the Pack files button in TC's (default) button bar, or manually change the target file mask in the Pack files dialog
  • make sure to check the "Create separate archives, one per selected file/dir" option in the Pack files dialog
  • the input box in the Pack files dialog should now look like this:

    RegXtract:*.*.RegXtract
  • now open the RegXtract config ("Configure..." button)
  • make sure that "Search and Replace" is checked
  • enter your desired RegEx and replace string
  • you may change the option in the "Outfile extension" dropdown menu to tell the plug-in which extension it should use (.txt is the default)
  • close dialog, start the pack operation
  • done: the output files should now reside in the original files' dir(s), with the same name plus the added extension
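For comparison, the same copy-check-rename workflow can be mimicked outside TC with a short Python sketch. It is not part of RegXtract: `search_replace_beside` and `replace_originals` are hypothetical names, files are assumed to be UTF-8 text, and Python writes group references in the replacement as `\1` instead of `$1`. `rglob` plays the role of the Ctrl+B flat view.

```python
import re
from pathlib import Path


def search_replace_beside(root: Path, pattern: str, repl: str,
                          glob: str = "*", out_ext: str = ".txt") -> list[Path]:
    """Write a search-and-replaced copy of each matching file under `root`.

    Each copy sits next to its original and keeps the full original name with
    `out_ext` appended (notes.cfg -> notes.cfg.txt). Originals are not touched.
    """
    regex = re.compile(pattern)
    # Collect the file list first (a flat, Ctrl+B-like view of the whole tree),
    # so copies created during the loop are not picked up again.
    sources = sorted(p for p in root.rglob(glob) if p.is_file())
    copies = []
    for src in sources:
        dst = src.with_name(src.name + out_ext)
        text = src.read_text(encoding="utf-8")
        dst.write_text(regex.sub(repl, text), encoding="utf-8")
        copies.append(dst)
    return copies


def replace_originals(copies: list[Path], out_ext: str = ".txt") -> None:
    """After checking the copies: drop `out_ext`, overwriting each original."""
    for dst in copies:
        dst.replace(dst.with_name(dst.name[: -len(out_ext)]))
```

Keeping the copies separate until `replace_originals` is called mirrors the advice above: only overwrite the originals once you're sure the s&r was successful.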
LeoLUG
Junior Member
Posts: 17
Joined: 2020-05-22, 15:24 UTC

Post by *LeoLUG »

Thanks for the so detailed post!