Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-14, 07:49 UTC
by miskox
If possible, it would be really nice to have support for compare by content for at least .doc*, .xls* and .pdf (if it is already implemented, I can't find it).

Thanks.
Saso

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-14, 16:20 UTC
by MVV
No, TC can't compare such files. It might be nice to be able to compare text extracted by WDX plugins, but currently there is no such feature.
Such formats are complex: they may contain e.g. images and may not contain any text at all, and getting the contents out of them is quite a complicated task. It is also an open question which representation of these files TC should compare (e.g. one user may want to replace images with some text while another may want to remove images completely, and some PDFs have no text layer at all).
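To illustrate the "text-only representation" idea, here is a minimal Python sketch (not something TC or any plugin does). It assumes a .docx file is a ZIP archive whose main text lives in word/document.xml, and it deliberately ignores images, formatting, headers, footnotes and macros. The function names docx_paragraphs and diff_docx are made up for this example. Binary .doc, .xls* and .pdf files are not handled at all.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs (w:t); images and formatting are simply skipped.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


def diff_docx(left, right):
    """Return unified-diff lines between the extracted text of two .docx files."""
    return list(difflib.unified_diff(
        docx_paragraphs(left), docx_paragraphs(right),
        fromfile="left", tofile="right", lineterm=""))
```

Calling diff_docx("old.docx", "new.docx") (hypothetical file names) returns an empty list when the extracted paragraphs are identical, even if formatting or images differ, which is exactly the ambiguity discussed above.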

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-14, 16:37 UTC
by Dalai
How do you compare files that contain the same text but different formatting? What about hidden columns in spreadsheets? What about colors? And what about PDFs that don't contain any text at all but images instead? And I haven't even started on embedded scripts and macros yet.

In short: What you're asking is impossible in my opinion because of the complexity of these document types. At the very least it's a task for software which specializes in comparing such document types.

Regards
Dalai

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-14, 20:30 UTC
by hi5
WinMerge (and others) can compare such files using plugins[1]. You can install WinMerge as a "portable" application if that is a requirement, so it is only a matter of preparing a button, user command or start menu entry to be able to compare such documents. It works pretty well, for text obviously (the plugins extract the text from the files for comparison).

[1] https://manual.winmerge.org/en/Plugins.html

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-18, 09:44 UTC
by miskox
Yes, hi5. This is what I do now: I use CTRL+A, CTRL+C and CTRL+V to copy the text into two separate .txt files and then do a compare by content.

I will take a look.

Anyway, the suggestion is still valid.

Thanks.
Saso

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-18, 15:43 UTC
by hi5
No need for CTRL+C/CTRL+V etc., as you can select the files and pass them on directly: edit a button, press Help and look for the Parameters section.
Parameters:
%P causes the source path to be inserted into the command line, including a backslash (\) at the end.
%N places the filename under the cursor into the command line.
%T inserts the current target path. Especially useful for packers.
%M places the current filename in the target directory into the command line.
etc
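
To make the placeholders above more concrete, here is a small Python sketch that mimics how such a parameter string gets filled in. TC does this substitution itself; the function expand_params and the example parameter string "%P%N" "%T%M" are only assumptions for illustration, not taken from the TC help.

```python
import re


def expand_params(template, source_dir, source_name, target_dir, target_name):
    """Fill in %P, %N, %T and %M the way the help text above describes them."""
    values = {
        "%P": source_dir,   # source path, including the trailing backslash
        "%N": source_name,  # filename under the cursor
        "%T": target_dir,   # current target path
        "%M": target_name,  # current filename in the target directory
    }
    # A function is used as the replacement so backslashes in paths stay literal.
    return re.sub(r"%[PNTM]", lambda m: values[m.group(0)], template)


if __name__ == "__main__":
    # Hypothetical paths; quoting each side keeps paths with spaces intact.
    print(expand_params('"%P%N" "%T%M"',
                        "C:\\docs\\", "report.docx",
                        "D:\\backup\\", "report.docx"))
```

With a parameter string like this, the external compare tool receives the file under the cursor in the source panel as its first argument and the current file in the target panel as its second.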

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-18, 16:06 UTC
by Horst.Epp
hi5 wrote: 2021-06-14, 20:30 UTC Winmerge (and others) can compare such files using plugins[1], you can install Winmerge as "portable" application if that would be a requirement, so it is only a matter of preparing a button, user command or start menu entry to be able to compare such documents, works pretty good - for text obviously (plugins extract the text from the files for comparison)

[1] https://manual.winmerge.org/en/Plugins.html
I tested the current WinMerge version and its default plugins with some docx files.
That fails completely and outputs garbage as diffs.
Completely useless, tested with several .doc and .docx files.

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-18, 16:20 UTC
by hi5
Of course it will depend on the files, but for my purposes it does work.
(edit: just for reference I'm using the xdocdiffPlugin - http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html)

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-18, 17:25 UTC
by Horst.Epp
hi5 wrote: 2021-06-18, 16:20 UTC Of course it will depend on the files, but for my purposes it does work.
(edit: just for reference I'm using the xdocdiffPlugin - http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html)
That's unfortunately a 32-bit DLL
which doesn't work with x64 WinMerge.
I found an x64 version, will try it.
[Edit]
Tests with some docx files are working :D
Thanks for the tip.

Re: Compare by content (.doc*, .pdf, .xls*)

Posted: 2021-06-19, 11:27 UTC
by Horst.Epp
Unfortunately the x64 version of the WinMerge plugin crashes WinMerge.
The funny thing is that it works, but in the background a WER (Windows Error Reporting) entry is created