xPDFSearch 1.11 - Content plugin to search text in PDF files

Lefteous
Power Member
Posts: 9473
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by Lefteous

As announced, xPDFSearch is now a GitHub project. The idea is to improve source code management and collaboration. If you want to contribute, commit to your own remote feature branch and open a pull request.
https://github.com/lefteous-tc/xPDFSearch
hhk
New Member
Posts: 1
Joined: 2018-02-21, 08:22 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by hhk

Dear Lefteous,

Today I installed your plugin for a very tricky task:
There are tens of thousands of scanned PDFs; many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?

ys

HHK
Usher
Power Member
Posts: 886
Joined: 2011-03-11, 10:11 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by Usher

2hhk
You should test another plugin: http://totalcmd.net/plugring/pdfOCR.html
Regards from Poland
Andrzej P. Wozniak
nsp
Power Member
Posts: 1300
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by nsp

hhk wrote:
2020-01-21, 16:50 UTC
Dear Lefteous,

Today I installed your plugin for a very tricky task:
There are tens of thousands of scanned PDFs; many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?

ys

HHK
With xPDFSearch you can find the files that have almost no text (fewer than 10 characters in the sample below) and then build a list to send to your OCR software.
In the search dialog, search for PDF files and add the following on the Plugins tab:

xpdfsearch text !regexp  .{10,}
If you know which producer or application created the image-only PDFs, you can also search for it using the dedicated properties (PDF Producer / Application).
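
The search condition above can be read as "the extracted text does not contain a run of at least 10 characters". A minimal Python sketch of that logic, assuming the text of each PDF has already been extracted by some other means. The `needs_ocr` helper and the sample data are hypothetical, and Total Commander's regex engine may treat line breaks differently from Python's `re`:

```python
import re

def needs_ocr(text: str, min_chars: int = 10) -> bool:
    """Return True when the text has no run of at least min_chars characters."""
    # Same idea as the "!regexp .{10,}" condition: a file is a candidate
    # when the pattern does NOT match its extracted text.
    pattern = re.compile(r".{%d,}" % min_chars)
    return pattern.search(text) is None

# Hypothetical extracted text per file, for illustration only.
samples = {
    "scan_with_text.pdf": "Invoice 2019-04 for services rendered",
    "image_only.pdf": "",
    "almost_empty.pdf": "p. 1",
}
ocr_candidates = sorted(name for name, text in samples.items() if needs_ocr(text))
print(ocr_candidates)
```

Raising `min_chars` makes the filter stricter, which may help when scanners leave a few stray characters in otherwise image-only files.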

Once you have the files that need OCR, you can feed them to the listbox. From the listbox, you can also save the list to a dedicated virtual-panel folder or to a file. After that, you can process the files one by one using a button or user command that calls your OCR engine, or all at once using TCBL.

If your OCR process takes time and/or needs manual validation, processing the files one by one is the best choice for you. virtual-panel can help you keep track of unprocessed files.
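
The one-by-one workflow could also be scripted outside Total Commander. A minimal sketch, assuming the listbox contents were saved to a plain text file with one path per line. The file name `ocr_list.txt` and the command `my-ocr-tool` are hypothetical placeholders, not a real OCR engine, and the commands are only built, never executed:

```python
from pathlib import Path

def read_file_list(list_path: Path) -> list[str]:
    """Read one file path per line, skipping blank lines."""
    lines = list_path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

def build_ocr_commands(files: list[str], ocr_cmd: str = "my-ocr-tool") -> list[list[str]]:
    """Build one command line per file; ocr_cmd is a hypothetical placeholder."""
    return [[ocr_cmd, path] for path in files]

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        list_file = Path(tmp) / "ocr_list.txt"  # hypothetical saved list
        list_file.write_text("a.pdf\n\nb.pdf\n", encoding="utf-8")
        for cmd in build_ocr_commands(read_file_list(list_file)):
            # Swap print for subprocess.run(cmd, check=True) once a real
            # OCR command replaces the placeholder.
            print(cmd)
```

Keeping the list in a file makes it easy to resume after an interruption: remove the lines that were already processed and run again.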

The PdfOCR wcx is fine for extracting text from images, but it will not help to rebuild a quality PDF with its layout, text indentation, formatting and so on. I personally use it to extract specific information from PDFs that do not support copy/paste.
Usher
Power Member
Posts: 886
Joined: 2011-03-11, 10:11 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by Usher

nsp wrote:
2020-01-22, 07:09 UTC
The PdfOCR wcx is fine to extract text only from image but will not help to rebuild a quality pdf with schema text indentation, format
That is a complete misunderstanding. I meant the WDX, i.e. the content plugin. Please read the linked webpage:
pdfOCR 0.9 wdx wrote:
• Purpose:
pdfOCR is a wdx plugin that discovers how many pages of a PDF file in the current directory need character recognition (OCR), i.e. how many pages in the PDF file have no searchable text in their layout.
(...)
• Possible usage:
- discover PDF documents which need to be OCR-ed for the first time
- discover PDF documents which are password protected and consequently not available for OCR processing
- discover PDF documents that were not properly OCR-processed because of low resolution or similar causes
- discover PDF documents that are not properly formatted.
See also the linked image: http://wincmd.ru/files/9924358/prezentacija_mala.jpg
burstx
Junior Member
Posts: 2
Joined: 2020-02-06, 09:35 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by burstx

I've just installed the latest TC 9.50 (x64) and tried to install xPDFSearch plugin downloaded from the official TC plugins page (the actual link to the plugin file).

Big issue
This is a sample PDF containing the text "PXContext", which is not found with the xPDFSearch plugin in use.

Small issue (perhaps this is the reason for the "big issue" described above)
If I open the plugin's .zip file inside TC (i.e. in the file panel), TC offers to install the plugin, and the plugin is installed.
If I register the plugin via TC's "Configuration => Options..." menu, in the "Plugins => Content Plugins (.WDX)" section, this error is shown:
Image: https://i.imgur.com/OcqUExd.png, although the plugin is claimed to be x32+x64-compatible.

Could you please check whether there is a problem with the plugin or whether I configured or used it incorrectly?
burstx
Junior Member
Posts: 2
Joined: 2020-02-06, 09:35 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by burstx

Sorry, the big issue is my fault: I didn't RTFM. But the small issue remains.
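
When a plugin ships both 32-bit and 64-bit binaries, one way to see which architecture a given file targets is to read the Machine field of its PE header. A minimal sketch using only the standard library; it does not explain the error in the screenshot, and `fake_pe` builds a synthetic header purely for demonstration (it is not a real plugin file):

```python
import struct

MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}

def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # Offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, "unknown (0x%04X)" % machine)

def fake_pe(machine: int) -> bytes:
    """Build a minimal synthetic header for demonstration (not a real DLL)."""
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    return bytes(header) + b"PE\0\0" + struct.pack("<H", machine)

print(pe_machine(fake_pe(0x8664)))
```

For a real file, the bytes could be read with `pathlib.Path(...).read_bytes()` and passed to `pe_machine`; the actual plugin file names are not given in this thread.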
ghisler(Author)
Site Admin
Posts: 39935
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by ghisler(Author)

Which file did you pick in the "Content Plugins (.WDX)" section?
Author of Total Commander
http://www.ghisler.com
buckauction
Junior Member
Posts: 2
Joined: 2020-04-08, 14:40 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by buckauction

I can find no information on how to install this plugin. I have unzipped it to a secondary folder, but there is still no info. There is a batch file and an exe file, with no info either. Please explain the install process and how to work the program. Thanks, it looks great.
buckauction
Junior Member
Posts: 2
Joined: 2020-04-08, 14:40 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by buckauction

Also, what is "WDX"?
petermad
Power Member
Posts: 9810
Joined: 2003-02-05, 20:24 UTC
Location: Valsted, Denmark

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by petermad

buckauction wrote:
2020-04-08, 14:48 UTC
Also, what is "WDX"?
WDX is the extension used by content plugins for TC.

TC supports four types of plugins:
Packer plugins (WCX)
File System plugins (WFX)
Lister plugins (WLX)
Content plugins (WDX)
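
The four types above can be captured in a small lookup table. A minimal Python sketch; the `plugin_type` helper is hypothetical, and it assumes the convention that 64-bit plugin builds carry a trailing "64" in the extension (for example `.wdx64`):

```python
from pathlib import Path

PLUGIN_TYPES = {
    "wcx": "Packer plugin",
    "wfx": "File system plugin",
    "wlx": "Lister plugin",
    "wdx": "Content plugin",
}

def plugin_type(filename: str) -> str:
    """Classify a plugin file by extension; a trailing '64' marks a 64-bit build."""
    ext = Path(filename).suffix.lower().lstrip(".")
    is_64 = ext.endswith("64")
    base = ext[:-2] if is_64 else ext
    kind = PLUGIN_TYPES.get(base)
    if kind is None:
        return "unknown"
    return f"{kind} ({'64-bit' if is_64 else '32-bit'})"

print(plugin_type("xPDFSearch.wdx64"))
```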

Help wrote:
Configuration - Plugins

Change settings for all supported plugin types.

Download new plugins from ghisler.com
Connects to the page where you can download plugins which were tested by us.

Packer plugins: Allows you to configure packer plugins. Usage: Files - Pack.

File system plugins: Allows you to configure file system plugins. They allow to access file systems or similar devices or systems, e.g. a PocketPC, a Linux partition, or a remote server. File system plugins are used via the Network Neighborhood.

Lister plugins: Allows you to configure Lister plugins. Usage: F3 on a supported file.

Content plugins: Allows you to configure content plugins. Usage: Show - custom columns, multi-rename tool, search function.

FS-Plugins: Allows you the installation of file system plugins. You can find them on www.ghisler.com in the addons section.
License #524 (1994)
Danish Total Commander Translator
TC 9.51 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (2004) 64bit, 'Everything' 1.4.1.988 (x64)
TC 3.10b8 on Android 6.0
amalia
New Member
Posts: 1
Joined: 2020-05-01, 16:58 UTC

Re: xPDFSearch 1.11 - Content plugin to search text in PDF files

Post by amalia

I cannot use xPDFSearch to find Greek words within PDF documents. Is there a solution?