Suggestion: Compare sets of dirs, highlight different sizes
Moderators: Hacker, petermad, Stefan2, white
Grounds:
I have two huge sets of data on different drives: hundreds of directories containing both files and subdirectories with other files.
Problem:
I would like to know whether the contents count of EACH dir in one set differs from the count of the directory of the same name in the other set, or not.
In other words, I want to know if the content count of dirs "X, Y, Z..." in set A differs from the content count of the corresponding dirs "X, Y, Z..." in set B, indicating in this case discrepancies in the two sets, pending an analysis of their causes.
The command "cm_CompareDirsWithSubdirs" only highlights missing dirs or dirs with different names in each of the two sets, irrespective of the amount of bytes they contain, giving no indication of incongruity in content. Two equal sets of corresponding dirs are therefore seen as identical even if one or more of their dirs contains a different number of files, or different versions of them, compared with the other set.
The sad story:
There might be a way of doing this in an easy way, but I've been too dumb to find it. Instead, what I have been doing since version 3 of TC is to run "cm_CountDirContent" on both sets and VISUALLY compare the conformity between the counts of each pair, scrolling down the lists. A very long and tedious task that gets harder and harder with increasing age and fading eyesight.
A poor alternative would be to run "cm_FileSync" without performing a synchronization (as this is not desirable in my case), but only taking note of the "offending" dirs' names. But this is also tedious. The sets are HUGE and there's no function for exporting the "compare" analysis to a text file. Which would involve a lot of extra retracing work anyway.
Wishful thinking:
Would it be at all possible to have a function that runs "cm_CountDirContent" on the two panels/sets and then makes an "intelligent" comparison, in much the same way as the command "cm_CompareDirs" does with files of the same name but of different size, by highlighting? I am only interested in sizes, not dates or other attributes.
Thanks a lot for bearing with me all the way through this long post.
Last edited by Artist on 2014-01-21, 08:42 UTC, edited 3 times in total.
est modus in rebus
(License #: 3886)
meisl wrote: Would having a (WDX, content) plugin field* stating the nr of contained files and/or dirs (both, direct children only and/or all, recursively) for each dir be helpful? (* such a field can be used as a custom column or as criterion in search or in compare/sync dirs)
Thanks for your answer, meisl. Unfortunately I don't understand any of it. I'm simply not "inside" the tech part of TC. Just a long, long time user who likes buttons with commands that do something. From your words though, it seems you suggest stating the number of dirs/files contained in the two sets as a sort of parameter. Can't do. The data sets vary in that regard with the passing of time, with the number of dirs/files in them growing while still maintaining an equal structure.
So the issue still seems to be compairing dirs sizes as "cm_CompareDirs" does with file sizes. Could it be done with a new command?
I guess only Christian can answer that. But maybe I'm wrong.
est modus in rebus
(License #: 3886)
Maybe you'll find a fitting WDX here, but one of the few programs I'll blatantly advertise for comparing anything is Beyond Compare.
TC is a Jack of all trades but there are still some excellent specialists out there.
*Edit: Link fixed
Last edited by ZoSTeR on 2014-01-21, 21:13 UTC, edited 3 times in total.
Artist wrote: Unfortunately I don't understand any of it
Ooops

Plugins extend TC's functionality, similar to what you can do with a button that invokes a custom command - but there's a difference.
Plugins can pass back information to TC so that it can take actions based on that information.
For example, TC can use it to perform: sorting, finding (and based on that, marking), and - comparison (!).
This is opposed to passing a parameter to a tool/command; it's the opposite direction, you know.
A plugin would calculate - right then and there - the actual number of files contained and/or the total space occupied, or whatever.
And then it would hand these values over to TC for further processing (see above).
Now, before we proceed, could you please specify what exactly you mean by "content count" or "dir size"?
- Is it the number of files/folders contained (direct children only, or rather all of them, i.e. in sub-sub-...-folders, too)?
- Or is it the space occupied by them?
- If the previous got a "yes": since it's two drives - NOT considering cluster size I guess? (a "net" value rather than a "gross")
- Note that even if all points above would indicate "identical", this does NOT mean that the contents are really the same (!). You said you were not interested in dates or other attributes - but even then: there could be different sets of files which just happen to be the same in number and total size. So...?
- any combination of the above?
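To make the candidate criteria in the list above concrete, here is a minimal Python sketch that computes each of them for a single directory. It is only an illustration run outside TC, not plugin code and not anything TC provides; the names `DirMetrics` and `dir_metrics` are made up for this example. Sizes are "net" values (sum of file sizes), so cluster size is ignored.

```python
import os
from dataclasses import dataclass


@dataclass
class DirMetrics:
    direct_children: int  # files + folders directly inside the directory
    files_recursive: int  # all files, including those in sub-sub-folders
    net_bytes: int        # sum of file sizes, cluster size NOT considered


def dir_metrics(path: str) -> DirMetrics:
    """Compute the three candidate criteria for one directory."""
    direct_children = len(os.listdir(path))
    files_recursive = 0
    net_bytes = 0
    # os.walk visits the directory itself and every subdirectory below it.
    for current, _subdirs, files in os.walk(path):
        for name in files:
            files_recursive += 1
            net_bytes += os.path.getsize(os.path.join(current, name))
    return DirMetrics(direct_children, files_recursive, net_bytes)
```

As the note in the list says, two directories can agree on all three values and still hold different content; these numbers only flag candidates for a closer look.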
Artist wrote: The sets are HUGE and there's no function for exporting the "compare" analysis to a text file. Which would involve a lot of extra retracing work anyway.
There is: mark files (using CTRL-A for all or SHIFT-click and/or CTRL-click for individual selection) and press CTRL-C, then paste into your favourite editor.
EDIT: @ZoSTeR: link Beyond Compare seems broken.
Last edited by meisl on 2014-01-21, 21:58 UTC, edited 6 times in total.
Just noticed:
Artist wrote: So the issue still seems to be compairing dirs sizes
This misspelling kind of nicely summarizes the task:
- build a list of pairs, i.e. associate members of one set with some (or, possibly, [n]one) of another (here: same name, from left and right panel)
- then, for each pair from that list, perform a certain comparison (here: remaining to be specified exactly)
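The two steps above can be sketched in Python as follows. This is a standalone illustration under stated assumptions, not a TC command or plugin: pairs are formed by exact (case-sensitive) name match among the top-level directories of two roots, and the comparison is the net size in bytes of everything below each directory. The names `total_size`, `subdir_names` and `compare_sets` are hypothetical.

```python
import os


def total_size(path: str) -> int:
    """Net size in bytes of all files below path (cluster size ignored)."""
    total = 0
    for current, _subdirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(current, name))
    return total


def subdir_names(root: str) -> set:
    """Names of the directories directly inside root."""
    return {entry.name for entry in os.scandir(root) if entry.is_dir()}


def compare_sets(root_a: str, root_b: str) -> list:
    """Return (name, size_a, size_b) for every pair that does not match.

    A size of None means the directory is missing on that side.
    """
    names_a = subdir_names(root_a)
    names_b = subdir_names(root_b)
    mismatches = []
    # Step 1: build the pairs by name; step 2: compare each pair by size.
    for name in sorted(names_a | names_b):
        size_a = total_size(os.path.join(root_a, name)) if name in names_a else None
        size_b = total_size(os.path.join(root_b, name)) if name in names_b else None
        if size_a != size_b:
            mismatches.append((name, size_a, size_b))
    return mismatches
```

Calling `compare_sets` with the two set roots returns only the "offending" dirs, which could then be printed or written to a text file for later analysis.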
Thanks meisl and ZoSTeR!
Yes, I'm interested only in the space occupied by the directories, the size in bytes. And yes, I'm aware of the possibility that there could be different sets of files which just happen to be the same in number and size. But in my case the risk is really minimal and I'm willing to run it.
As I wrote previously, the whole issue for me boils down to simply comparing dirs' sizes (cluster size not relevant in my context) in the same way as "cm_CompareDirs" does with file sizes, highlighting in the two sets/panels those that don't match in size. This in order to get a quick overview of upcoming discrepancies.
Such a function in theory (at least to a code-ignorant 'happy go lucky' guy like me) doesn't seem to be such a great deal, when the dirs' sizes have been computed by TC.
It seems I'm wrong though, and this thing appears to be more complicated than I thought, involving plug-in architectures and data feedback. I'm afraid this will be out of my reach, unless I'm fed both with code and detailed instructions, which in my view is too much to ask. That's why I hoped that Christian would come in, clap me on the back and say: "Sure thing! We just forgot to implement this trifling bit of code, but it will be included in the next update! Anything for one of the first 5.000 registered users!"
ZoSTeR, I'm looking at Beyond Compare, thanks. The first impression though, is like getting heavy artillery to shoot mosquitos. But I'll check it carefully. And I agree with you that "TC is a Jack of all trades". But it seems to me that in lacking this in my view "basic" compare function, this otherwise marvellously efficient "swiss army knife" is short of a very important tool.
est modus in rebus
(License #: 3886)
Ok thanks, that pretty much answers all the questions I posed.
Now, the *big* problem seems to me this:
TC does not treat folders as things to be compared as such in sync-dirs, and neither in compare-dirs. It's after files only there, and only for them it possibly invokes plugin functions for "comparing by content" (which is where the plugin could provide the info determining your criterion).
Artist wrote: Such a function in theory (at least to a code-ignorant 'happy go lucky' guy like me) doesn't seem to be such a great deal, when the dirs' sizes have been computed by TC. It seems I'm wrong though, and this thing appears to be more complicated than I thought, involving plug-in architectures and data feedback. I'm afraid this will be out of my reach, unless I'm fed both with code and detailed instructions, which in my view is too much to ask.
Sure it ain't too difficult to compute your criterion. It doesn't even matter much if TC does it or a plugin.
The problem is about integration. That is, how this information can be used in further processing.
But don't be afraid, you do NOT need to read other ppl's code or understand the plugin API, neither need you be able to write a plugin yourself.
You need, however, be able to describe your demands precisely - which you fairly did already.
My ramblings about which information goes where, and when, were meant rather as to focus the discussion, and give you hints on which direction might be a dead-end (=buttons).
Another thing you need to be prepared for - if you want your problem solved - is to try out things and report.
Here are two suggestions from my side:
- get the current 8.5 beta running (it contains new functionality which might be necessary to solve your problem)
- download and install an arbitrary WDX (content) plugin and have it show its value(s) in a custom columns view, both in your current version of TC and the 8.5 beta
Artist wrote: That's why I hoped that Christian would come in, clap me on the back and say: "Sure thing! We just forgot to implement this trifling bit of code, but it will be included in the next update! Anything for one of the first 5.000 registered users!"
Sure. If by pure miracle my problems were solved - I'd definitely choose that.
But well, you never know...
---
Will give "Beyond Compare" a closer look (haven't yet so far).