[WDX] PCREsearch

milo1012
Power Member
Posts: 1110
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-04-29, 12:22 UTC

2nsp
Like Skif_off said, the goal here is to sort files "on-the-fly" in a custom column view, i.e. without actually renaming them in MRT.
That's what regexp_wdx does, as far as I understand.
It actually might come in handy in certain situations.


2Skif_off
The filter will be optional of course (and you need to download it yourself, like in uLister, due to license restrictions), you can still use xdoc2txt as an alternative filter.
The filter is compatible at least down to Windows 2000 (32-bit) and has used the very same runtime libs since at least 2010 (when I first heard about them). I don't expect this to change in future versions, so there's no need to worry about compatibility issues for now (and there's also no big need for constant updates, as new office file formats are rare nowadays; my four-year-old DLL download still works for my set of office files).
TC plugins: PCREsearch and RegXtract

Skif_off
Member
Posts: 118
Joined: 2013-09-30, 13:13 UTC

Post by *Skif_off » 2016-05-10, 22:57 UTC

2milo1012
I ran into some problems when using xdoc2txt without installing the Visual C++ 2008 Redistributable (xdoc2txt is not working). It turned out that the problem was in Microsoft.VC90.CRT.manifest, which I replaced with:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (c) Microsoft Corporation.  All rights reserved. -->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
    <noInheritable/>
    <assemblyIdentity
        type="win32"
        name="Microsoft.VC90.CRT"
        version="9.0.21022.8"
        processorArchitecture="x86"
        publicKeyToken="1fc8b3b9a1e18e3b"
    />
    <file name="msvcr90.dll" />
    <file name="msvcp90.dll" />
</assembly>
xdoc2txt >= 2.12 requires Visual C++ 2010 Redistributable, and without installing it needs three files:
msvcp100.dll
msvcr100.dll
Microsoft.VC90.CRT.manifest O_o


Post by *milo1012 » 2016-05-11, 00:07 UTC

Skif_off wrote:xdoc2txt >= 2.12 requires Visual C++ 2010 Redistributable, and without installing it needs three files:
msvcp100.dll
msvcr100.dll
Microsoft.VC90.CRT.manifest O_o
Yes, I already saw that when I released PCREsearch 2.1, that's why I kept xdoc2txt 2.11 back then.
The cause of this is that the author of xdoc2txt switched to Visual Studio 2010 with version 2.12, but forgot to update the embedded manifest for the 2010 runtime DLLs.
So you end up with the xdoc2txt.exe actually needing the 2010 runtime DLLs, while the manifest says it still needs the 2008 runtime DLLs.
And so of course it won't work portable that way, as Windows doesn't find the fitting runtime DLLs, unless you install the 2010 redistributable package.

I will try to figure out how you can patch the manifest to use the 2010 DLLs portable, but to be honest: there is no real reason to update xdoc2txt from version 2.11, as there were only a few minor changes.

(see changelog in translation, e.g. https://translate.google.com/translate?sl=ja&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Febstudio.info%2Fhome%2Fxdoc2txt.html&edit-text=)




Update:

Turns out that it's actually quite easy.
Just download the Visual C++ 2010 Redistributable Package (x86)
https://www.microsoft.com/en-us/download/details.aspx?id=5555

extract
msvcp100.dll
msvcr100.dll

from it and copy them to the same dir as xdoc2txt.
Now use a hex editor to modify the manifest embedded in xdoc2txt.exe. You can find it at the very end of the file.
Overwrite

 <dependency>
            <dependentAssembly>
               <assemblyIdentity type="win32" name="Microsoft.VC90.CRT" version="9.0.21022.8" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b">
               </assemblyIdentity>
            </dependentAssembly>
         </dependency>
with spaces (hex 0x20).
Just make sure not to add or remove bytes from the exe file (the file size must stay the same).
xdoc2txt 2.12 and newer should now work portable, starting from Windows XP.
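
The same patch can be scripted instead of done by hand in a hex editor. What follows is only a rough sketch in Python, assuming the embedded manifest is stored as plain single-byte text (ASCII/UTF-8) and contains the dependency block quoted above. The function name and the file names are made up for illustration, and it is safer to work on a copy of the exe.

```python
import re

# Matches the stale VC90 dependency block. Whitespace between the tags may
# differ from the listing above, so it is matched loosely.
DEPENDENCY_RE = re.compile(
    rb'<dependency>\s*<dependentAssembly>\s*'
    rb'<assemblyIdentity[^>]*name="Microsoft\.VC90\.CRT"[^>]*>'
    rb'.*?</dependency>',
    re.DOTALL,
)


def blank_vc90_dependency(data: bytes) -> bytes:
    """Return a copy of `data` with the VC90 dependency block overwritten by spaces.

    Hypothetical helper: the block is replaced byte for byte with 0x20,
    so the total length never changes.
    """
    match = DEPENDENCY_RE.search(data)
    if match is None:
        raise ValueError("no Microsoft.VC90.CRT dependency block found")
    start, end = match.span()
    patched = data[:start] + b" " * (end - start) + data[end:]
    if len(patched) != len(data):
        raise RuntimeError("file size changed, refusing to continue")
    return patched


# Example usage (file names are placeholders):
#   from pathlib import Path
#   original = Path("xdoc2txt.exe").read_bytes()
#   Path("xdoc2txt_patched.exe").write_bytes(blank_vc90_dependency(original))
```

Because the block is overwritten in place rather than cut out, everything after it in the file keeps its offset, which is the reason for the "file size must stay the same" rule.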


Post by *Skif_off » 2016-05-12, 22:56 UTC

milo1012 wrote:there is no real reason to update xdoc2txt from version 2.11, as there were only a few minor changes.
And what about this:
"2.14 Fixed an issue where the abnormal termination in part of the PDF"?
The author even fixed the old 1.xx. (I have not seen such an error myself and have not updated, but I have rarely used it.)


Post by *milo1012 » 2016-05-13, 01:48 UTC

2Skif_off
Of course you should update every now and then, no doubt about that.
All I wanted to say is that you don't need to install every new version immediately, especially when you need to patch new versions (like in this case to be portable), plus we don't know the exact details of changelogs.
I had my share of experience with programs that were rock stable, but became flawed in newer versions, despite saying that things were fixed.
And that wrong embedded manifest in xdoc2txt shows exactly that: it's a new error source, despite fixing other things.

Concerning that fix you quoted:
I just did a quick comparison with xdoc2txt 2.16 VS 2.11, and didn't experience any difference, neither in output nor in stability.
I used a set of ~200 PDF files, which consists of all sorts of types (E-Books, technical documents, PDFs with input forms, with multimedia elements, security restrictions, etc...).
Stability means: a few PDFs in my collection crash xdoc2txt for unknown reasons, but they still crash in the newest 2.16, so the fix didn't affect the collection at all.

Anyway, the author should fix that manifest, otherwise I won't support the newer versions for the plug-in.


Post by *milo1012 » 2016-07-04, 03:29 UTC

New Version 2.5!
  • new major feature: optional Oracle Outside In Technology Content Access filters
    • works for nearly all file formats for: Word Processors, Spreadsheets, Presentation programs, XML based data, Database files; plus will search some embedded files
    • you may now choose between xdoc2txt and the OiT filters as a text filter for specific file extensions
    • when installed and working, will provide an additional powerful Unicode capable fulltext search for TC 9.0 and above (on top of the text filter capability for the normal plug-in operation)
    • option to exclude certain file formats for the fulltext search and to filter unknown files for text
    • path for the filter DLL files is freely configurable and has a separate configuration for the 32-bit and x64 filters
    • the filters need to be downloaded separately from the Oracle site (you need to register, which is free, though you might find a way to prevent it - hint: b**me*ot)
    • needs an additional runtime package in order to work (Visual C++ Redistributable)
    • can share nearly all files with the uLister plug-in Viewer package (when using the same versions)
  • new major feature: search in the filename only
    • you can use the same type of fields as for content search: boolean, count, string assembly, average length
    • may be useful to quickly preview purified filenames in TC's custom columns, or checking for names containing specific characters, e.g. to check for non-ASCII filenames and similar
    • will always use the name including the extension
    • can be used to check for otherwise identical files (same size and possibly date) whose filenames differ only in case but are still treated and seen as identical by TC and Windows
  • compare files in TC's 'Synchronize dirs' can now work with the OiT filter (still doesn't work with xdoc2txt), to compare e.g. the content of two office file versions with a custom RegEx
  • when comparing two files with the same encoding case-insensitively in TC's 'Synchronize dirs', the results are now allowed to differ in length, to take Unicode normalization into account
  • fixed a possible out-of-bounds memory access when comparing files case-sensitively in TC's 'Synchronize dirs'
  • a few code optimizations
  • updated to PCRE 8.39
Check the first post for the new file.
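
As an illustration of the "search in the filename only" use case from the list above (checking for non-ASCII filenames), here is a small sketch in Python. It uses Python's own `re` module rather than PCRE, the character class shown is one possible expression and not one shipped with the plug-in, and the helper name and sample names are made up.

```python
import re

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(filename: str) -> bool:
    """Return True if the name (including extension) contains a non-ASCII character."""
    return NON_ASCII.search(filename) is not None


for name in ["report.docx", "résumé.docx"]:
    print(name, has_non_ascii(name))
```

In the plug-in the same expression would presumably be entered as a Boolean field with the filename-only option, so that TC can show or filter on the result.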

Horst.Epp
Power Member
Posts: 3479
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Post by *Horst.Epp » 2016-07-04, 14:24 UTC

From your readme:
"In TC 9.0 and above you will now have an additional field Oracle Outside In fulltext search in the search dialog (Alt+F7)"
I don't see such a field in the plugins search dialog for pcresearch.

I have installed the content filters in the same dir as the uLister plugin files
and added the path to it in the ini file.
The required runtime libs are installed, because the uLister plugin uses them without any problems.

Environment is TC 9.0b3 x64 under Windows 10
Windows 10 Home x64 November 2019 Update, Version 1909 (OS Build 18363.535)
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16GB RAM
TC 9.50ß9 x64 / x86, Everything 1.4.1.960 (x64)

User avatar
Ovg
Power Member
Posts: 614
Joined: 2014-01-06, 16:26 UTC
Location: MOW

Post by *Ovg » 2016-07-04, 15:20 UTC

2Horst.Epp
Maybe pcresearch.Oracle Outside In fulltext search?
It's impossible to lead us astray for we don't care even to choose the way.
#259941, TC 9.5 β8 x64, Windows 7 SP1 x64


Post by *Horst.Epp » 2016-07-04, 15:57 UTC

Ovg wrote:2Horst.Epp
Maybe pcresearch.Oracle Outside In fulltext search?
After a TC restart the property is now available and it works :D

dindog
Senior Member
Posts: 246
Joined: 2010-10-18, 07:41 UTC

Post by *dindog » 2019-10-19, 04:32 UTC

This is a very powerful plugin, especially with the new OiT Content Access fulltext search, but the UI and the name give users the wrong impression that this is a geeky search plugin which is probably useless or too hard for them. I feel sorry about that.

Post by *milo1012 » 2019-10-20, 11:33 UTC

2dindog
I agree that the UI (the config tool) is not exactly optimal, but that's due to the constant addition of features in the past, and the plugin was originally planned w/o any UI whatsoever. I've been planning to make it more concise/accessible for a long time now, but as with my other plugins: I can't say when I'll have the time for it.
TC plugins: PCREsearch and RegXtract

marceepoonu
Junior Member
Posts: 5
Joined: 2009-08-19, 17:34 UTC
Location: Los Angeles, California

Post by *marceepoonu » 2019-11-12, 22:23 UTC

After I installed [WDX] PCREsearch, I opened the Find Files window in TC.
Then I opened the Plugins tab and, as this image https://tinyurl.com/yzc86tko shows, I went ahead and:
1. Checked the "Search in plugins" box;
2. Selected the "AND" radio button, which I do not understand; and
3. Selected "pcresearch" in the Plugin column.

Next, I formulated this regular expression search in RegexBuddy, using the PCRE 8.39 UTF-32 regular expression flavor.
This image https://tinyurl.com/yzc86tko shows that the regular expression matches the highlighted files located in the directory:
E:\Apps\UtilitiesByMarc

Lastly, I selected the General tab in TC, and then I clicked the "Start search" button.
But as this image https://tinyurl.com/yzc86tko reflects, TC's "Search results" window shows "[No files found]".

So, my question is what do I need to do differently, so that I can search for files in that directory (E:\Apps\UtilitiesByMarc) using the regular expression below?
((.+?)?(?=\.vbs) ) ( .vbs)$
Marc Hankin

Post by *milo1012 » 2019-11-13, 18:25 UTC

marceepoonu wrote:
2019-11-12, 22:23 UTC
((.+?)?(?=\.vbs) ) ( .vbs)$
I think you have a few extra spaces in the expression.
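
A quick way to see the effect of those spaces, sketched in Python. Python's `re` module is not PCRE, but as far as I can tell it treats these particular constructs the same way; the sample filenames are made up. Outside extended (x) mode a space in the pattern is a literal character, so directly after the lookahead `(?=\.vbs)` the posted expression demands a space at the very position where the lookahead has just required a dot, which can never be satisfied.

```python
import re

# The expression exactly as posted; the spaces are matched literally.
posted = r"((.+?)?(?=\.vbs) ) ( .vbs)$"
# The same expression with the spaces removed.
cleaned = r"((.+?)?(?=\.vbs))(.vbs)$"

names = ["backup.vbs", "Run Me.vbs", "notes.txt"]

posted_hits = [n for n in names if re.search(posted, n)]
cleaned_hits = [n for n in names if re.search(cleaned, n)]

print(posted_hits)   # []
print(cleaned_hits)  # ['backup.vbs', 'Run Me.vbs']
```

If only the extension matters, a plain `\.vbs$` should match the same names.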
Anyway, you're probably referring to this post:
viewtopic.php?t=54699

While PCREsearch may be used for searching in filenames only (it is primarily intended for file content), it can be quite clumsy, as for every configuration change you have to restart TC or use the TC command
cm_UnloadPlugins

In any case, for PCREsearch to work, do the following:
In the location where the plug-in is installed, there is a tool PCREsearchConfig.exe. Start it.
  • Now from the very left field list choose a field which you want to override, or increase the counter on the top left for adding additional field(s) (fields available to TC are identified by the prefixed "-->")
  • Mark your chosen field in the left list
  • Enter your RegEx in the "Regular Expression" box
  • In the "Field type" area, choose "Boolean"
  • In the "Field flags / options" area check "Search in filename only"
  • Finally name your field, so that you can later identify it in TC
  • Hit "Apply" and close the config tool
  • Now restart TC (or type cm_UnloadPlugins in TC's command line)
Using the field:
In TC's search dialog go to "Plugins" and choose:
pcresearch -> <your field name> -> = -> Yes
Start the search. TC should now find all files that match your expression.
TC plugins: PCREsearch and RegXtract

Post by *dindog » 2019-11-22, 06:43 UTC

2milo1012
I've found that PCREsearch doesn't read locked files... It basically reads a buffer and does a string search, so there's no danger in opening those files in read-only mode... or at least give the user an option to read the opened file.

Post by *milo1012 » 2019-11-25, 15:33 UTC

2dindog
I assume you mean that these "locked" files are opened with FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE and similar combinations.
In PCREsearch I open files with:

GENERIC_READ, FILE_SHARE_READ
This means that it won't open a file already opened by another process which requested file access with FILE_SHARE_WRITE or more flags. The reason is simple: a process that has a file open with write access might modify the file while I read it part by part in the plug-in, which might give an inconsistent result for whatever you're trying to do with the plug-in, e.g. a wrong character count, a wrong encoding check and so on. Using FILE_SHARE_READ exclusively prevents this. And of course, other "locked" files are opened w/o any share flags. For such files there's nothing I can do to open them (except using low-level API functions, but that's not an option).
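
The sharing behaviour described above can be sketched as a small model. This is a simplified illustration in Python, not the plug-in's code: `can_open` is a hypothetical helper, it only looks at read/write access and the two matching share flags (FILE_SHARE_DELETE and other access rights are ignored), and the real check is done by Windows inside CreateFile. The constant values are the ones from the Windows headers.

```python
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002


def can_open(new_access, new_share, existing_handles):
    """Simplified model of the sharing check for a new open request.

    `existing_handles` is a list of (access, share) pairs for handles that
    are already open on the same file.
    """
    for access, share in existing_handles:
        # Every existing handle must permit what the newcomer asks for.
        if (new_access & GENERIC_READ) and not (share & FILE_SHARE_READ):
            return False
        if (new_access & GENERIC_WRITE) and not (share & FILE_SHARE_WRITE):
            return False
        # The newcomer must permit what every existing handle already does.
        if (access & GENERIC_READ) and not (new_share & FILE_SHARE_READ):
            return False
        if (access & GENERIC_WRITE) and not (new_share & FILE_SHARE_WRITE):
            return False
    return True


# The combination quoted above: read access, only read sharing allowed.
PLUGIN_OPEN = (GENERIC_READ, FILE_SHARE_READ)

other_reader = (GENERIC_READ, FILE_SHARE_READ)
writer = (GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE)
exclusive = (GENERIC_READ, 0)

print(can_open(*PLUGIN_OPEN, [other_reader]))  # True
print(can_open(*PLUGIN_OPEN, [writer]))        # False
print(can_open(*PLUGIN_OPEN, [exclusive]))     # False
print(can_open(GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, [writer]))  # True
```

In this model the read-only open is refused as soon as another handle holds write access or grants no read sharing. Adding FILE_SHARE_WRITE to the share mode lets the open succeed, at the cost of reading a file that may change in the meantime.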

But sure, I could add another field flag/option to open such files anyway (with the risk of inconsistent results) to the next plug-in version, maybe depending on if the file fits completely into the first read buffer.
