xPDFSearch 1.11 - Content plugin to search text in PDF files

Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by *Lefteous »

As announced, xPDFSearch is now a GitHub project. The idea is to improve source code management and collaboration. If you want to contribute, commit to your own remote feature branch and open a pull request.
https://github.com/lefteous-tc/xPDFSearch
hhk
New Member
Posts: 1
Joined: 2018-02-21, 08:22 UTC

Post by *hhk »

Dear Lefteous,

Today I installed your plugin for a very tricky task:
There are tens of thousands of scanned PDFs; many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?

ys

HHK
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Post by *Usher »

2hhk
You should test another plugin: http://totalcmd.net/plugring/pdfOCR.html
Andrzej P. Wozniak
Polish subforum moderator
nsp
Power Member
Posts: 1803
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Post by *nsp »

hhk wrote: 2020-01-21, 16:50 UTC
Today I installed your plugin for a very tricky task: there are tens of thousands of scanned PDFs; many of them contain text, some of them don't. This depends on the various scanners that were used over the years. I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?

What you can do with xPDFSearch is find the files that have almost no text (fewer than 10 characters in the following sample) and then build a list to send to your OCR software.
In the search dialog, search for PDF files and, on the plugins tab, add:

xpdfsearch text !regexp  .{10,}
If you know which producer or application created the image-only PDFs, you can also search for it using the dedicated properties (PDF Producer / Application).
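
As a rough sketch of what the `!regexp .{10,}` condition selects: a file is an OCR candidate when its text contains no run of ten or more characters. The snippet assumes the text of each PDF has already been extracted by some other tool; `needs_ocr` and the sample strings are hypothetical and not part of xPDFSearch, and the plugin's own handling of line breaks may differ.

```python
import re

# Same pattern as in the search above: any run of at least 10 characters.
# Note: "." does not match a newline in Python, so the run must sit on one line.
HAS_TEXT = re.compile(r".{10,}")


def needs_ocr(extracted_text: str) -> bool:
    """Return True when the text has no run of 10+ characters (the "!regexp" case)."""
    return HAS_TEXT.search(extracted_text) is None


# Hypothetical extraction results, keyed by file name.
samples = {
    "scan_with_text.pdf": "Invoice no. 4711, dated 2019-05-02",
    "scan_image_only.pdf": "",
    "scan_noise.pdf": "a1\n\n_",
}

# Files whose text is empty or only a few stray characters go to OCR.
ocr_candidates = sorted(name for name, text in samples.items() if needs_ocr(text))
print(ocr_candidates)
```

Raising or lowering the threshold in `{10,}` trades false positives (scans with a short stamp or page number) against missed files.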

Once you have the files to be processed by OCR, you can feed them to the listbox. From the listbox, you can also save the list to a dedicated folder of virtual-panel or to a file. After that, you can process the files one by one using a button or user command that calls your OCR engine, or all at once using TCBL.

If your OCR process takes time and/or needs manual validation, processing the files one by one is the best choice for you. virtual-panel can help you keep track of the files that have not been processed yet.

The PdfOCR wcx is fine for extracting text only from images, but it will not help to rebuild a quality PDF with layout, text indentation, formatting, etc. I personally use it to extract specific information from PDFs that do not support cut/paste.
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Post by *Usher »

nsp wrote: 2020-01-22, 07:09 UTC
The PdfOCR wcx is fine for extracting text only from images, but it will not help to rebuild a quality PDF with layout, text indentation, formatting

That's a complete misunderstanding. I mean the WDX, the content plugin. Please read the linked webpage:

pdfOCR 0.9 wdx wrote:
• Purpose:
pdfOCR is a wdx plugin that discovers how many pages of a PDF file in the current directory need character recognition (OCR), i.e. how many pages in the PDF file have no searchable text in their layout.
(...)
• Possible usage:
- discover PDF documents which need to be OCR-ed for the first time
- discover PDF documents which are password protected and consequently not available for OCR processing
- discover PDF documents that were not properly OCR processed because of low resolution or similar causes
- discover PDF documents not properly formatted.

See also the linked image: http://wincmd.ru/files/9924358/prezentacija_mala.jpg
burstx
Junior Member
Posts: 2
Joined: 2020-02-06, 09:35 UTC

Post by *burstx »

I've just installed the latest TC 9.50 (x64) and tried to install the xPDFSearch plugin downloaded from the official TC plugins page (the actual link to the plugin file).

Big issue
This is a sample PDF containing the text "PXContext", which is not found when the xPDFSearch plugin is in use.

Small issue (perhaps this is the reason for the "big issue" described above)
If I open the plugin's .zip file inside TC (i.e. in the file panel), TC offers to install the plugin and the plugin is installed.
If I register the plugin via TC's "Configuration => Options..." menu, in the "Plugins => Content Plugins (.WDX)" section, an error is shown, although the plugin is claimed to be x32+x64-compatible:
Image: https://i.imgur.com/OcqUExd.png

Could you please check whether there is a problem with the plugin or whether I configured/used it incorrectly?
burstx
Junior Member
Posts: 2
Joined: 2020-02-06, 09:35 UTC

Post by *burstx »

burstx wrote: 2020-02-06, 09:51 UTC
Big issue
This is a sample PDF containing the text "PXContext", which is not found with the xPDFSearch plugin in use.
(...)

Sorry, the "big issue" is my fault. I didn't RTFM. But the small issue remains.
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Which file did you pick in the "Content Plugins (.WDX)" section?
Author of Total Commander
https://www.ghisler.com
buckauction
Junior Member
Posts: 2
Joined: 2020-04-08, 14:40 UTC

Post by *buckauction »

nsp wrote: 2020-01-22, 07:09 UTC
(...)

I can find no information on how to install this plugin. I have unzipped it to a secondary folder, but there is still no info. There is a batch file and an exe file, with no info either. Please explain the install process and how to work the program. Thanks, it looks great.
buckauction
Junior Member
Posts: 2
Joined: 2020-04-08, 14:40 UTC

Post by *buckauction »

Also, what is "WDX"?
petermad
Power Member
Posts: 14739
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Post by *petermad »

buckauction wrote: 2020-04-08, 14:48 UTC
Also, what is "WDX"?

It means content plugins for TC.

TC supports four types of plugins:
Packer plugins (WCX)
File System plugins (WFX)
Lister plugins (WLX)
Content plugins (WDX)

Help wrote:
Configuration - Plugins

Change settings for all supported plugin types.

Download new plugins from ghisler.com: Connects to the page where you can download plugins which were tested by us.

Packer plugins: Allows you to configure packer plugins. Usage: Files - Pack.

File system plugins: Allows you to configure file system plugins. They allow to access file systems or similar devices or systems, e.g. a PocketPC, a Linux partition, or a remote server. File system plugins are used via the Network Neighborhood.

Lister plugins: Allows you to configure Lister plugins. Usage: F3 on a supported file.

Content plugins: Allows you to configure content plugins. Usage: Show - custom columns, multi-rename tool, search function.

FS-Plugins: Allows you the installation of file system plugins. You can find them on www.ghisler.com in the addons section.
License #524 (1994)
Danish Total Commander Translator
amalia
New Member
Posts: 1
Joined: 2020-05-01, 16:58 UTC

Post by *amalia »

I cannot use xPDFSearch to find Greek words in PDF documents. Is there a solution?
pmicchow
New Member
Posts: 1
Joined: 2021-03-08, 07:16 UTC

Post by *pmicchow »

Dear Sir/Madam

I am doing research on a book which I have downloaded in both Word and PDF formats. My research requires me to extract passages from the book that contain two groups (strings) of words.
The following are 2 examples:

Example 1
First group (string): not only
Second group (string): but also
Relevant text 1:
He is not only intelligent but also funny.
Relevant text 2:
Mr X is not only an actor but also a philanthropist.

Example 2
First group: scarcely
Second group: when
Relevant text 1:
I had scarcely walked in the door when I got an urgent call and had to run right back out again.
Relevant text 2:
Scarcely had the teacher seen the student when he started studying.

My question is: how can I extract the relevant text containing the desired strings of words, which normally consist of two groups as demonstrated in the two examples above? Preferably, I would like instructions on how to do so for both a Word document and a PDF document.
Thank you in advance.

Regards
Preston Chow
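
One possible approach to the question above, as a minimal sketch only: it assumes the book's text has already been exported from the Word or PDF file into a plain string, and it does not use xPDFSearch. `find_sentences` and its naive sentence splitter are hypothetical helpers. The idea is to split the text into sentences and keep those in which the first group of words is followed later by the second group.

```python
import re


def find_sentences(text: str, first: str, second: str) -> list[str]:
    """Return the sentences in which `first` occurs and `second` occurs later."""
    # Naive sentence split: break after ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Whole-word, case-insensitive match of first ... second, in that order.
    pattern = re.compile(
        rf"\b{re.escape(first)}\b.*?\b{re.escape(second)}\b",
        re.IGNORECASE | re.DOTALL,
    )
    return [s for s in sentences if pattern.search(s)]


text = (
    "He is not only intelligent but also funny. "
    "The weather was fine. "
    "Scarcely had the teacher seen the student when he started studying."
)

print(find_sentences(text, "not only", "but also"))
print(find_sentences(text, "scarcely", "when"))
```

The sentence splitter is deliberately simple and will mis-split on abbreviations such as "Mr. X", so real text may need a more careful splitter.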
white
Power Member
Posts: 4593
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

Moderator message from: white » 2023-01-10, 23:38 UTC

Lefteous wrote: 2023-01-10, 22:45 UTC
I would propose to split this thread (by moderators) at the point where you forked the plugin.

Done.

The thread about zeeko's fork is here: xPDFSearch 1.38 - Content plugin to search text in PDF files
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

2white
Thank you!