xPDFSearch 1.11 - Content plugin to search text in PDF files

Lefteous
Power Member
Posts: 9460
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by Lefteous

As announced, xPDFSearch is now a GitHub project. The idea is to improve source code management and collaboration. If you want to contribute, commit to your own remote feature branch and open a pull request.
https://github.com/lefteous-tc/xPDFSearch

hhk
New Member
Posts: 1
Joined: 2018-02-21, 08:22 UTC

Post by hhk

Dear Lefteous,

Today I installed your plugin for a very tricky task:
there are tens of thousands of scanned PDFs; many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?

ys

HHK

Usher
Power Member
Posts: 792
Joined: 2011-03-11, 10:11 UTC

Post by Usher

2hhk
You should test another plugin: http://totalcmd.net/plugring/pdfOCR.html
Regards from Poland
Andrzej P. Wozniak

nsp
Power Member
Posts: 1228
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Post by nsp

hhk wrote:
2020-01-21, 16:50 UTC
Dear Lefteous,

Today I installed your plugin for a very tricky task:
there are tens of thousands of scanned PDFs; many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?

ys

HHK

What you can do with xPDFSearch is find the files that have almost no text (fewer than 10 characters in the following sample) and then build a list to send to your OCR software.
In the search dialog, search for PDF files and add the following on the plugins tab:

xpdfsearch text !regexp  .{10,}

If you know which producer or application created the image-only PDFs, you can also search for it using the dedicated properties (PDF Producer / Application).
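
The negated regular expression above can be illustrated outside Total Commander. The following is only a minimal sketch of the idea, assuming Python's `re` semantics (which may differ from Total Commander's regex engine, for example in how `.` treats line breaks); the function names `needs_ocr` and `ocr_candidates` and the sample file names are hypothetical, and the extracted text is supplied by hand rather than read from real PDFs.

```python
import re
from typing import Dict, List

# Same pattern as in the plugin condition: a run of at least 10 characters.
# Assumption: Python semantics, where "." does not match a newline,
# so the 10 characters must sit on a single line.
TEXT_RUN = re.compile(r".{10,}")


def needs_ocr(extracted_text: str) -> bool:
    """Return True when the text contains no run of 10 or more characters.

    This mirrors the negated condition "!regexp .{10,}".
    """
    return TEXT_RUN.search(extracted_text) is None


def ocr_candidates(texts_by_file: Dict[str, str]) -> List[str]:
    """Return the sorted names of files whose extracted text is (almost) empty."""
    return sorted(name for name, text in texts_by_file.items() if needs_ocr(text))


if __name__ == "__main__":
    # Hypothetical sample data standing in for text extracted from PDFs.
    sample = {
        "scan_a.pdf": "",
        "scan_b.pdf": "Invoice no. 2019-04",
        "scan_c.pdf": "x",
    }
    print(ocr_candidates(sample))
```

The threshold of 10 is taken from the sample condition; a scanner that stamps a short header on every page might call for a higher value.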

Once you have the files that need OCR, you can feed them to the listbox. From the listbox, you can also save the list to a dedicated folder of virtual-panel or to a file. Once that is done, you can process the files one by one using a button/user command that calls your OCR engine, or all at once using TCBL.

If your OCR process takes time and/or needs manual validation, processing the files one by one is the best choice for you. virtual-panel can help you keep track of unprocessed files.
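
The one-by-one workflow with tracking of unprocessed files could also be scripted from a saved list file. This is a rough sketch under several assumptions: the list file is UTF-8 text with one path per line, `my-ocr-tool` is a placeholder and not a real program, and the names `pending_files`, `build_ocr_command` and `process_one_by_one` are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Set


def pending_files(all_files: Iterable[str], done: Set[str]) -> List[str]:
    """Return the files that are not yet marked as done, in list order."""
    return [f for f in all_files if f not in done]


def build_ocr_command(pdf_path: str) -> List[str]:
    # Hypothetical command line; replace with the call your OCR engine expects.
    return ["my-ocr-tool", pdf_path]


def process_one_by_one(list_file: Path, done_file: Path,
                       run: Callable = subprocess.run) -> None:
    """Run the OCR command for each pending file and record it afterwards."""
    lines = list_file.read_text(encoding="utf-8").splitlines()
    files = [line.strip() for line in lines if line.strip()]
    done: Set[str] = set()
    if done_file.exists():
        done = set(done_file.read_text(encoding="utf-8").splitlines())
    for pdf in pending_files(files, done):
        run(build_ocr_command(pdf), check=True)
        # Record the file right away so an interrupted run can resume later.
        with done_file.open("a", encoding="utf-8") as fh:
            fh.write(pdf + "\n")
```

Keeping the "done" list in a separate file means the saved search result never has to be edited; rerunning the script simply skips what has already been processed.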

The PdfOCR wcx is fine for extracting text from images, but it will not help to rebuild a quality PDF that preserves the layout, text indentation, formatting, and so on. I personally use it to extract specific information from PDFs that do not support copy/paste.

Usher
Power Member
Posts: 792
Joined: 2011-03-11, 10:11 UTC

Post by Usher

nsp wrote:
2020-01-22, 07:09 UTC
The PdfOCR wcx is fine for extracting text from images, but it will not help to rebuild a quality PDF that preserves the layout, text indentation, formatting

That's a complete misunderstanding. I meant the WDX, the content plugin. Please read the linked webpage:
pdfOCR 0.9 wdx wrote:
• Purpose:
pdfOCR is wdx plugin that discovers how many pages of PDF file in current directory needs character recognition (OCR), i.e. how many pages in PDF file have no searchable text in their layout.
(...)
• Possible usage:
- discover pdf documents which need to be OCR-ed for the first time
- discover PDF documents which are password protected and consequently not available for OCR processing
- discover PDF documents that was not properly OCR processed because of low resolution or similar causes
- discover PDF documents not properly formatted.
See also the linked image: http://wincmd.ru/files/9924358/prezentacija_mala.jpg
Regards from Poland
Andrzej P. Wozniak

burstx
New Member
Posts: 1
Joined: 2020-02-06, 09:35 UTC

Post by burstx

I've just installed the latest TC 9.50 (x64) and tried to install the xPDFSearch plugin downloaded from the official TC plugins page (the actual link to the plugin file).

Big issue
This is a sample PDF containing the text "PXContext", which is not found when the xPDFSearch plugin is in use.

Small issue (perhaps this is the reason for the "big issue" described above)
If I open the plugin's .zip file inside TC (i.e. in the file panel), TC offers to install the plugin and the plugin is installed.
If I register the plugin via TC's "Configuration => Options..." menu, "Plugins => Content Plugins (.WDX)" section, an error is shown (image: https://i.imgur.com/OcqUExd.png), although the plugin is claimed to be x32+x64-compatible.

Could you please check whether there is a problem with the plugin or whether I configured/used it incorrectly?
