[FEATURE REQUEST:] select duplicates in search result list!
Moderators: Hacker, petermad, Stefan2, white
chrizoo wrote: I would have been incredibly thankful for ANY solution
OK, here we go.

My previous Shift-F2-suggestion is of course not usable here, sorry.
Instead I wrote a batch file for this (at least NT4 needed) - not perfect, but maybe better than nothing:
download from here (tested only on my own system (W2k SP4), use it at your own risk).
It will leave only the first of every block of file names (which might be different from what TC displays as a block) -
this is a serious limitation, because if you have files with e.g. different content/size but the same name, this batch only sees the same name and deletes all "copies" but the first.
Definition for TC's Start Menu or a button etc. (maybe the path to the command is needed, too, depending on where you store it):
Code: Select all
Command : TC_AutoDeleteDupes.cmd
Parameters: %L
- search duplicates
- use "Feed to listbox"
- mark all files
- use the created menu entry / button for TC_AutoDeleteDupes.cmd
To really delete the dupes you have to remove the "rem " in line 44; it should then read:
Code: Select all
if /i '%tcfn_last%'=='%tcfn_act%' echo del %tcfn_delpath%&del %tcfn_delpath%
If you don't like pausing at the end, you can eliminate it by adding "rem " before the "pause" command in line 50, or by deleting the whole line:
Code: Select all
pause
Who the hell is General Failure, and why is he reading my disk?
-- TC starter menu: Fast yet descriptive command access!
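To make the batch's rule concrete, here is a minimal Python sketch (an editor's illustration, not StatusQuo's actual batch code; the sample paths are invented) of "keep only the first of every block of identical file names":

```python
import ntpath  # parses the Windows-style example paths on any OS

def plan_deletions(listed_paths):
    """Apply the batch's rule: in fed-to-listbox order, keep only the
    first file of each run of identical (case-insensitive) names."""
    to_delete = []
    last_name = None
    for path in listed_paths:
        name = ntpath.basename(path).lower()  # 'if /i' compares without case
        if name == last_name:
            to_delete.append(path)  # same name as the previous entry -> "dupe"
        last_name = name
    return to_delete

# The danger case from the thread: same name, possibly different contents.
listing = [
    r"c:\prg1\settings.ini",
    r"c:\prg1\bak\settings.ini",  # a true duplicate
    r"c:\prg2\settings.ini",      # a different program's INI, same name!
]
print(plan_deletions(listing))  # both non-first entries would be deleted
```

This is exactly the limitation discussed below: the rule never looks at file contents, only at runs of equal names in the list.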
- ghisler(Author)
- Site Admin
- Posts: 50824
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
I didn't reply because I find it quite risky to auto-select some of the dupes. For example, there may be files in a main directory and a backup directory. In this case, the dupes in the backup dir should be deleted, not in the main dir. Total Commander has a special internal command for this:
cm_SelectCurrentPath
It selects all files which are in the same directory as the file under the cursor. Of course this covers only one possible situation...
Author of Total Commander
https://www.ghisler.com
Mr. Ghisler,
thanks a lot for commenting on that issue here.
Is there nothing else you want to say concerning all the points in my lengthy post on the first page, or the various other comments from other users - for example the excellent suggestion of Claude (one of your moderators, right?)?
We especially dealt with the aspect of risk.
Besides, many steps can be taken to make sure the user makes deliberate choices (warnings, option settings, etc.).
cm_SelectCurrentPath does not bring us any further with that issue, if I'm not mistaken: even without that command, the user can sort the results by name, which actually results in a "sort by path, then by name" action, so it's quite easy to select any specific directory as a whole.
Anyway, thanks for your great program. From the many threads (I only cited a couple of them in my post), and from the mere fact that a lot of people try to work around that issue with scripts etc., you can see that this is a problem many users encounter and cannot solve without your help. I sincerely hope that you will (re)consider the points raised in this and all the other similar threads, which offer very concrete solutions with screenshots, etc.
PS: Maybe we should not call it "auto-select"; it does sound dangerous, and I think I used the term myself, but in fact nothing is "auto-selected". After displaying the results, nothing should be selected in the first place. All TC should do is assist the user in deciding for himself which files to select, in the best possible way, by offering as many criteria for file selection as possible (date, length of name/path, etc.) - instead of letting the user click through thousands of files one by one ... I guess that is not really a desirable status quo.
Last edited by chrizoo on 2008-03-29, 16:38 UTC, edited 2 times in total.
Status Quo, thanks! I haven't commented yet, I will ... promised 
EDIT:
StatusQuo, I'm not quite sure how to say this in such a manner that it will adequately pay tribute to the support you have given me (and the other users here) with your batch file. I want to underline that I am really thankful and touched by the fact that you (like so many of you guys here) take your time and - for purely altruistic reasons - code some lines for a person you don't even know, millions of miles away ...
StatusQuo wrote: ...tested only on my own system (W2k SP4), use it at your own risk).
It will leave only the first of every block of file names (which might be different from what TC displays as a block) -
this is a serious limitation, because if you have files with e.g. different content/size but the same name, this batch only sees the same name and deletes all "copies" but the first.
So I'm a bit embarrassed that I am neither able to return the favour nor accord this gift the treatment it truly deserves, but for one thing, most of the time I compare files with the same content but different names, and additionally, I am ashamed to say that I am afraid to use a method that is not based upon what "TC displays as a block" ...
When you said your code works with blocks of file names "which might be different from what TC displays as a block" (quote), did you mean that as a general statement, or is your assertion limited to those cases where you look for duplicates based on content/size, and DOES NOT APPLY to those cases where duplicates are identified by filename only, so that your code is SAFE to use for the latter?
Last edited by chrizoo on 2008-03-25, 19:01 UTC, edited 3 times in total.
-
- Junior Member
- Posts: 8
- Joined: 2008-03-15, 16:03 UTC
- Location: Deux-Montagnes QC.
This is the workaround that I use
First add a prefix of &&&&& to the directories that you want to protect (to give them visibility).
Do your search (always use an extension mask); set Duplicates to same size, same contents.
Be careful not to select the previous directory marker.
Select feed to listbox.
In the listbox, click on the name column.
TC sorts on the directory path.
Now you can delete all entries that don't have &&&&& as a directory.
Rename your protected directories back to what they were.
Rick, thanks a lot for your help! Great idea, hadn't thought about that yet!
What do you mean by "be careful not to select the previous directory marker"?
This will definitely help me in some cases! So it's definitely a step forward.
--------------
Downside: One has to really pay a hell of a lot of attention here, because, let's remember, the target - as defined in my OP - was to delete all dupes but keep (exactly) one file out of each group of duplicate files. Now, if every group has at least one occurrence in "&&&&&directory_to_protect", that works. But if a group of identical files ONLY exists in OTHER dirs, without one file of the group being in "&&&&&directory_to_protect", then you would actually select all of them and thus delete the whole group without keeping one (unique) file. So you would LOSE some of your files!!
Mr. Ghisler,
in addition to my words to you here, ... I hope you see that by not dealing with an issue out of security concerns, we actually stumble into a situation which is intrinsically more dangerous !
I honestly don't understand where you see any security gains by keeping the status quo. Consider this:
* Users who don't need that function won't use it anyway, so if TC provides a dupe-select function (especially if security measures are used inside TC, see below), the risk is not getting any bigger for these users!
* But those who really need to mass-select dupes will do it anyway. They will search in this forum and might end up using a flawed vbs script or use the batch file posted above and unwittingly delete files, because in the heat of the moment they missed the fact it only works correctly for identically named files. Or they use the advice above, rename their dirs to &&&&&dir_to_keep and wonder why some files are gone ....
If TC provides a solution, for example the one Clo (Moderator) has suggested, such accidents would definitely be LESS LIKELY.
CONCLUSION: THE RISK OF KEEPING THE STATUS QUO IS GREATER THAN THE RISK OF ADDRESSING THE ISSUE!
Why couldn't Clo's solution be implemented in TC, but with a checkbox in the configuration options that includes a warning and which must be ticked beforehand in order to be able to use that function??
Don't forget, you adopted this approach for other potentially risky functions as well ... remember your copy method in the options? It says: "copy method (for experts only!)", and it goes on saying: "standard method (reliable, not very fast)" ....
Why offer a reliable choice plus a second choice in the first place if you really wish to treat security as the one and only point to consider, something like the holy grail?
Ironically, by rejecting all the other points, you achieve the opposite of what you want, and you end up with a more dangerous situation than if you accepted one of the many suggestions that were made by various posters here.
In a bizarre way, it's like tourists falling down a cliff which is not secured by a fence, because the local government is quarrelling over whether the fence should be made of wood or of stone...
Last edited by chrizoo on 2008-03-29, 16:19 UTC, edited 1 time in total.
chrizoo wrote: I'm not quite sure how to say this
Alright, the batch is not useful for you yet, I got it.

chrizoo wrote: When you said your code works with blocks of files "which might be different from what TC displays as a block" (quote)
Not blocks of files, but blocks of file names.
The above batch relies on TC's search results and identifies duplicate files following each other in this list by their name only. TC_AutoDeleteDupes.cmd currently can't see TC's block separators and also doesn't know the file contents.

chrizoo wrote: ...those cases where duplicates are identified by filename only and thus your code is SAFE to use for the latter?
NO, definitely NOT in every case.
E.g. 3 programs, each with different INI files plus an identical backup:
Code: Select all
c:\prg1\settings.ini
c:\prg1\bak\settings.ini
---
c:\prg2\settings.ini
c:\prg2\bak\settings.ini
---
c:\prg3\settings.ini
c:\prg3\bak\settings.ini
In the above case it sees 6 identical file names in TC's duplicate list - so all of those but the first (c:\prg1\settings.ini) would be deleted.
This may not occur in collections with unique file names, but the above scenario is not so unlikely - it just depends on your file list.
I could probably extend the batch to also/instead compare each file's content - but that would mean reading the complete file contents a second time (at least it would only be those files TC already found).
Probably it's best to drop the current method (comparing by names only) and instead implement re-reading/comparing the contents of each file in TC's list - although this could get problematic when other duplicate-search options are set in TC.

I think I'll take another look at this to see what more is possible...
Of course a TC-internal solution would work much more efficiently, having all needed information already there...
Additionally: automated selecting is always safer than automated deleting...
Who the hell is General Failure, and why is he reading my disk?
-- TC starter menu: Fast yet descriptive command access!
I updated TC_AutoDeleteDupes.cmd to Version 0.2.
Now it checks both file names and file contents of each list entry.
Using it with this basic function should be safe:
if the used search parameters in TC were less strict, this batch simply deletes fewer dupes than TC found.
The basic function should correspond to these parameters in TC's dupe search:
Code: Select all
[X] same name [ ] same size [X] same contents
To also delete dupes with different file names you can specify the new optional parameter /IgnoreNames (without the brackets):
Code: Select all
TC_AutoDeleteDupes.cmd "d:\path\filelist.txt" [/IgnoreNames]
Using /IgnoreNames its function should correspond to these parameters in TC's dupe search:
Code: Select all
[ ] same name [ ] same size [X] same contents
By default the downloadable version still deletes nothing, but only displays what it would delete.
To activate real deleting after you have successfully checked that it works for you: remove the "rem " from line 83 (above the ":Help" section).
2chrizoo
Does that meet your requirements?
Who the hell is General Failure, and why is he reading my disk?
-- TC starter menu: Fast yet descriptive command access!
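To picture what Version 0.2 does, here is a hedged Python sketch of the described behaviour (an editor's illustration, not the batch itself: `hash_file` is an invented stand-in for the batch's content comparison, and the real batch walks TC's grouped list rather than using a set). The grouping key is name plus contents by default, and contents alone with /IgnoreNames:

```python
import hashlib
import os

def hash_file(path):
    """Invented stand-in for the batch's content comparison."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def plan_deletions(listed_paths, ignore_names=False):
    """Default: a file is a dupe only if name AND contents match a kept
    file ([X] same name, [X] same contents). With ignore_names
    (/IgnoreNames), contents alone decide ([ ] same name, [X] same
    contents)."""
    kept = set()
    to_delete = []
    for path in listed_paths:
        digest = hash_file(path)
        key = digest if ignore_names else (os.path.basename(path).lower(), digest)
        if key in kept:
            to_delete.append(path)  # later copy of an already-kept file
        else:
            kept.add(key)           # first occurrence survives
    return to_delete
```

Note how the default key makes the v0.1 danger case safe: two same-named files with different contents get different keys, so neither is deleted.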
I have the same problem as hundreds of users of Total Commander, I guess. It's because it's such a great piece of software that we're expecting even more of it.
For the proposal that was made, I'm a bit surprised by the idea. What is the meaning of a "longest path name"? Don't you think these are just strange ideas??

Let's take this list of duplicates as an example:
Code: Select all
c:\prg1\settings.ini
c:\prg1\bak\settings.ini
c:\prg1\bak\settings.ini.bak
c:\prg1\bak\settings.ini.bak2
---
c:\prg2\settings.ini
c:\prg2\bak\settings.ini
---
c:\prg3\settings.ini
c:\prg3\bak\settings.ini
c:\prg3\bak\settings.bak
Then, TC can create a list of the directories containing the duplicates:
Code: Select all
c:\prg1\
c:\prg1\bak\
c:\prg2\
c:\prg2\bak\
c:\prg3\
c:\prg3\bak\
Total Commander should just ask the user to "please order the directories from most important to least important". In our example, the user uses a "dialog box" to order the directories (he wants to put the 3 backup directories at the bottom of the list, of course):
Code: Select all
c:\prg1\ (most important)
c:\prg2\
c:\prg3\
c:\prg1\bak\
c:\prg2\bak\
c:\prg3\bak\ (least important)
Now, Total Commander knows how to make a choice. (update:) For each group of duplicates, TC will keep the file that is in the directory with the highest preference. See next post for an example. (end of update)
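pierre75's keep-the-most-important-directory rule could look roughly like this in code (a Python sketch using his example paths; `PRIORITY` stands for the user-ordered list from the dialog box, everything else is invented for illustration):

```python
import ntpath  # parses the Windows-style example paths on any OS

# User-ordered preference, most important directory first (pierre75's example).
PRIORITY = [
    r"c:\prg1", r"c:\prg2", r"c:\prg3",
    r"c:\prg1\bak", r"c:\prg2\bak", r"c:\prg3\bak",
]

def rank(path):
    """Position of the file's directory in the preference list."""
    d = ntpath.dirname(path).lower()
    return PRIORITY.index(d) if d in PRIORITY else len(PRIORITY)

def select_for_deletion(groups):
    """For each group of duplicates, keep the best-ranked file and
    select the rest - so exactly one file per group survives."""
    selected = []
    for group in groups:
        keeper = min(group, key=rank)  # highest preference wins
        rest = list(group)
        rest.remove(keeper)            # keep exactly one copy
        selected.extend(rest)
    return selected

groups = [
    [r"c:\prg1\settings.ini", r"c:\prg1\bak\settings.ini"],
    [r"c:\prg2\settings.ini", r"c:\prg2\bak\settings.ini"],
]
print(select_for_deletion(groups))  # only the bak copies get selected
```

Because the keeper is chosen per group, a group that exists only in a bak directory still keeps one copy - which addresses the data-loss objection raised earlier in the thread.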
OK, this works only for files in different directories, but not for files in the same directory. For this matter, you could indeed prefer the longest filename, as the shortest filename could be meaningless. This should be a "second rule". Another criterion could be to remove all filenames matching a certain regular expression (for example all filenames containing spaces). You could also invent other rules, based on the date-time or even on the TC plugins' data (EXIF, image width, ...).
By combining all these rules, Total Commander can now decide which is the "good" file and which is the "bad" file.
Example 1:
Code: Select all
rule 1 : directories order [modify here ...]
rule 2 : file date
rule 3 : filename length
Example 2: (you could inverse the rules order)
Code: Select all
rule 1 : image EXIF property
rule 2 : directories order [modify here ...]
rule 3 : filename date
But all these rules would have only one purpose: to make a selection of all files except one file per group in the result list.
After this, you are free to push on "DELETE" or not. It's up to you.
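One natural way to read these chained rules is as an ordered tuple of sort keys, so a later rule only matters when every earlier rule ties. A sketch (the rule set and the sample data are invented for illustration):

```python
import ntpath  # parses the Windows-style example paths on any OS

def dir_rank(path, priority):
    d = ntpath.dirname(path).lower()
    return priority.index(d) if d in priority else len(priority)

def choose_keeper(group, priority):
    """group: (path, mtime) pairs. Rule 1: directories order; rule 2:
    newer file date wins; rule 3: longer filename wins. Tuple comparison
    means a later rule only breaks ties left by the earlier ones."""
    def key(entry):
        path, mtime = entry
        return (
            dir_rank(path, priority),      # rule 1: directories order
            -mtime,                        # rule 2: file date (newest first)
            -len(ntpath.basename(path)),   # rule 3: filename length
        )
    return min(group, key=key)[0]

priority = [r"c:\prg1", r"c:\prg1\bak"]
group = [  # invented (path, modification time) pairs
    (r"c:\prg1\bak\settings.ini.bak2", 300),
    (r"c:\prg1\settings.ini", 100),
    (r"c:\prg1\bak\settings.ini", 200),
]
print(choose_keeper(group, priority))  # rule 1 alone decides this group
```

Reordering the rules, as in Example 2, is just reordering the tuple.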
The only thing that Total Commander has to make sure is that there is ALWAYS one file left unselected in each group.
So, when you have the list with selected files, you can have a look, modify the selection before deleting, .... or you can cancel and refine your criteria (path order, filename length, ...)
One more idea:
There are some directories that need to be complete in order to work. For example, if you're writing a program, there are some includes (for example in Java, some .jar files) that need to be there, even if duplicated. It's also the same with some .dll files in Windows, some pictures, ... So the idea is to consider some directories as "system directories". For this, there could be a checkbox in the dialog box (as above):
Code: Select all
[x] c:\program files\
[ ] d:\media_files\
[x] d:\exe\
It's just an idea; at least that's how I would like it to work.
And I must add that I've been waiting for this feature for years now. It's only now that I've decided to search for a solution ... and again, I did not find it.

Hoping to see it soon in Total Commander !

Last edited by pierre75 on 2008-03-29, 19:39 UTC, edited 2 times in total.


pierre75 wrote: I have the same problem as hundreds of users of Total Commander, I guess. [...]
And I must add that I've been waiting for this feature for years now. It's only now that I've decided to search for a solution ... and again, I did not find it.
Hoping to see it soon in Total Commander!
Yes, this is an issue that's in dire need of a solution.
chrizoo wrote:when you search for "duplicates", the board is virtually packed with discussions and feature requests. I cannot quite understand why so many people raise this shortcoming and provide answers/solutions which don't get implemented:
http://www.ghisler.ch/board/viewtopic.php?t=4637
http://pagesperso-orange.fr/charries/relais/keepcopy.png
http://www.ghisler.ch/board/viewtopic.php?t=16385
http://www.ghisler.ch/board/viewtopic.php?t=15803
http://www.ghisler.ch/board/viewtopic.php?t=5453
http://www.ghisler.ch/board/viewtopic.php?t=10408
http://www.ghisler.ch/board/viewtopic.php?t=8724
... just to mention a few of them ....
pierre75 wrote: For the proposal that was made, I'm a bit surprised by the idea. What is the meaning of a "longest path name"? Don't you think these are just strange ideas??
Do you really think that if we considered them strange ideas, we'd put them on the table and advocate them?? Just because it is not helpful for you does not automatically mean it is not helpful for anybody else ... Criticism is good, but arguments to bolster the criticism are even better! And just in case you missed it: look at your very last quote. What is selected there? No wait ... can it be ...? It's the longest paths ...
pierre75 wrote: "order the directories from most important to less important".
How do you define "important"? Do you mean it should always work through a dialog box?
pierre75 wrote: In our example, the user uses a "dialog box" to order the directories.
c:\prg1\
c:\prg2\
c:\prg3\
c:\prg1\bak\
c:\prg2\bak\
c:\prg3\bak\
Now, Total Commander knows how to make a choice.
Now how exactly does TC know how to make a choice? Just select the upper half? And what if it's an odd number?
And - unless I'm wrong - you are missing a very important point here!!
Imagine most of your files are duplicates, i.e. pairs with one file sitting in the "prgX" dir and one in the "prgX\bak" dir. Now imagine that some files only exist in "prg2\bak" and nowhere else. This dir would get SELECTED with your method, and you would lose all of these files (since there is no duplicate in another dir, as I just mentioned).
pierre75 wrote: you could indeed prefer the longest filename, as the shortest filename could be meaningless.
If A is relevant, then ¬A (¬ means "contrary of" in maths) is always relevant as well! Remember you can always invert your selection.
pierre75 wrote: This should be a "second rule".
OK, multiple rules ... very good idea! The normal way of defining how one rule takes precedence over the others is: apply rule 1; if rule 1 is not applicable, apply rule 2; if rule 2 is not applicable, apply rule 3. I guess that's what you meant, right?
pierre75 wrote: Another criteria could be to remove all filenames matching a certain regular expression
That already exists. Search, then feed to listbox, press "FN" and "+", then click on "define" and define a search with the regexp checkbox ticked.
pierre75 wrote: You could also invent other rules, based on the date-time or even on the TC plugins' data (EXIF, image width, ...)
This is also already possible. Same steps as mentioned above, but instead of the regexp, you go to the plugins tab.
But I guess what is still new in your suggestion is that this should be part of the "selection rules" and thus provide for a more sophisticated selection. A very good idea in my opinion !
pierre75 wrote: The only thing that Total Commander has to make sure is that there is ALWAYS one file left unselected in each group.
Now I think this is the best idea anyone has ever had about this whole issue!!! Fantastic.
I would imagine it like this: you select whatever you want to select, and at the moment you click the delete button (while still in the view with the grouped results, separated by dotted lines), TC checks whether at least one file is left unselected in each group; if not, TC warns you with a pop-up dialog requiring you to confirm the deletion with yes or no.
This - in addition to the dupe selection being subject to activation in the preferences, for advanced users - will make the whole thing very, very safe.
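The warning check described above is cheap to state in code. A sketch, assuming the duplicate groups and the user's selection are available as plain lists and sets (paths invented):

```python
def groups_fully_selected(groups, selection):
    """Return the duplicate groups in which EVERY file is selected,
    i.e. where deleting the selection would leave no copy at all."""
    selection = set(selection)
    return [g for g in groups if all(p in selection for p in g)]

groups = [
    [r"c:\a\x.dat", r"c:\b\x.dat"],
    [r"c:\only\y.dat", r"c:\other\y.dat"],
]
selection = {r"c:\b\x.dat", r"c:\only\y.dat", r"c:\other\y.dat"}

# The second group is fully selected -> a careful tool should warn (or
# refuse) before deleting, because no copy of y.dat would survive.
print(groups_fully_selected(groups, selection))
```

If the returned list is non-empty, the confirmation dialog fires; an empty list means every group keeps at least one copy and the delete is "safe" in the sense discussed here.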
pierre75 wrote: So, when you have the list with selected files, you can have a look, modify the selection before deleting, ... or you can cancel and refine your criteria (path order, filename length, ...)
That's also a great idea! Once the selection is made, the user can browse through the list of selected and unselected files, and if he deems corrections necessary he can go back to the "selection rules". Great stuff.
All these are good solutions and therefore ...
chrizoo wrote:The simple answer is that TC - or the computer for that matter - cannot know which duplicates you want to keep and which not. It's entirely of the user's discretion. The only thing TC can do is to offer the user as many choices as possible to automate this task (regarding the date of files, path length, file name length, etc.). [...] The bottom line is that until such time as PCs can read the user's mind, why not stick to the most logic thing to do and adopt the second best solution (for which proposals have been made countless times) ?
... therefore I urge everyone here to come up with suggestions as to how this issue can best be tackled. It depends on us ... the more powerful and functional our solutions are, the more likely it is that Mr. Ghisler will lend an ear!
X-Byte wrote: Very good suggestions from Claude/chrizoo/Sheepdog.
Maybe this thread has a chance to draw Christian Ghisler's attention towards this nice feature suggestion this time...
... doesn't seem so for the time being.
And judging from our past experience, I wouldn't say that the chances are high that Mr. Ghisler will adopt any solution ...
Already back in July 2004 icfu said:
icfu wrote: Looks nice, but I would prefer a simple solution first, to prevent that we will have to wait till 2010:
Ghisler: Good idea, I will take it on my wish list (aka the bin?)...
Sorry...
Icfu
.... 2010 ... we are approaching ...

chrizoo wrote: What is selected there? No wait ... can it be ... ? It's the longest paths ...
Sorry, I didn't mean to hurt you

But I'm not talking about the path length, I'm talking about the filename length (without directories).
Imagine that you have:
c:\work\file.dat
c:\temp\file.dat
Which one will you choose? The length is the same.
What do you think when you are selecting/unselecting files manually? You probably think "this is my work directory, this is a useless directory, ...". These thoughts should become the rules that TC helps you automate.
Note: one more rule, for "lazy" people, could be to choose by the alphabetical order of the filenames; there is ALWAYS such an order, so this rule can always decide. (But "path length" is not always different, so that rule cannot always decide.)
pierre75 wrote: "order the directories from most important to less important".
chrizoo wrote: How do you define "important"? Do you mean it should always work through a dialog box?
Exactly.
TC could remember the decisions in a file, but that's another point.
chrizoo wrote: Now how exactly does TC know how to make a choice? Just select the upper half? And if it's an odd number?
I've modified the example, as it was maybe confusing (see the quote above). It has nothing to do with an upper half.
In our example, the user uses a "dialog box" to order the directories.
c:\prg2\
c:\prg1\
c:\bak\
Now, Total Commander knows how to make a choice.
Imagine that TC finds these duplicate files:
c:\prg1\file.dat
c:\prg2\file.dat
c:\bak\file.dat
TC must keep 1 and only 1 file (that was the idea at the end of my previous post). So which one does TC select for deletion first? The one in "bak" (c:\bak\file.dat), since c:\bak\ is at the bottom of the preferred directories list.
Then, we have this shorter list:
c:\prg1\file.dat
c:\prg2\file.dat
TC has to choose again. This time TC will select "c:\prg1\file.dat", since "c:\prg1" is below "c:\prg2" in the preference dialog box.
Again, we have this shorter list:
c:\prg2\file.dat
TC has to stop, since this is the last remaining file (TC must keep 1 and only 1 file).
Done. TC processes the next group.
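The priority-order rule described above can be sketched in a few lines of Python (a minimal sketch with invented names; forward slashes are used so the example runs on any platform): the file whose directory ranks highest in the user's list stays unselected, and all the others are selected for deletion.

```python
# Sketch of the directory-priority rule: 'priority' lists directories from
# most to least important, as ordered by the user in the dialog box.
import os

priority = ["c:/prg2/", "c:/prg1/", "c:/bak/"]

def rank(path):
    """Position of the file's directory in the preference list
    (unknown directories rank last)."""
    d = os.path.dirname(path) + "/"
    return priority.index(d) if d in priority else len(priority)

def select_for_deletion(group):
    """Keep the file in the best-ranked directory; select the rest."""
    keep = min(group, key=rank)
    return [f for f in group if f != keep]

group = ["c:/prg1/file.dat", "c:/prg2/file.dat", "c:/bak/file.dat"]
print(select_for_deletion(group))  # ['c:/prg1/file.dat', 'c:/bak/file.dat']
```

This collapses pierre75's iterative "remove the lowest-ranked file until one remains" procedure into a single pass, but the result is the same: only the copy in c:\prg2\ survives.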
chrizoo wrote: And, unless I'm wrong, you are missing a very important point here!! Imagine most of your files are duplicates, i.e. pairs of two files with one sitting in the "prgX" dir and one in the "prgX\bak" dir. Now imagine that some files only exist in "prg2\bak" and nowhere else.
The response is above.
Notes (just to make sure that I was clear):
Of course, you have to consider directories as the "complete path". And it's not recursive (it does not process all the underlying directories). If TC proposes a directory in the list, it's because it found a duplicate file in that directory.
So:
"c:\prgX"
"c:\prgX\bak"
are 2 distinct directories (forget about the fact that one is a child of the other; it would be the same with "c:\prgX" and "c:\prgXbak").
chrizoo wrote: If A is relevant then ¬A (¬ means "contrary of" in maths) is always relevant as well! Remember you can always invert your selection.
Thanks

I will keep it like that; it's easier for me to explain, and I can write shorter sentences by not having to invert everything. That was the reason

chrizoo wrote: Ok, multiple rules .... very good idea! The normal way of defining how one rule takes precedence over the others is: apply rule 1; if rule 1 is not applicable, apply rule 2; if rule 2 is not applicable, apply rule 3. I guess that's what you meant, right?
Yes. TC would apply the rules, from first to last, until there is only 1 file left.
In the ideal case, that is. If all your rules fail to decide (for example, the files have the same "path length" and the same date), then TC cannot choose which one to keep. So in fact there will be 1 "or more" files left whenever the rules you've set cannot decide. TC could pop up a warning box in that case. Or you could search again for duplicates, but a warning pop-up would be easier.
One more suggestion/idea:
There are some directories that need to be complete in order to work. For example, if you're writing a program, there are some includes (in Java, some .jar files) that need to be there even if they are duplicated. The same goes for some .dll files in Windows, some pictures, ... So the idea is to mark some directories as "system directories". For this, there could be a checkbox in the dialog box (as above):
[x] c:\program files\
[_] d:\media_files\
[x] d:\exe\
The files in these "system directories" should never be deleted, even if there are many duplicates (but every duplicate that is in a non-system directory can be selected for deletion, of course).
If we have:
c:\program files\file.dat
d:\exe\file.dat
d:\media_files\file.dat
We want only to delete this file (because it's not in a directory checked as "system"):
d:\media_files\file.dat
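The "system directories" idea reduces to a simple filter; here is a hedged sketch (directory names taken from the example above, everything else invented, with forward slashes so it runs anywhere): files whose directory is in the checked set are never offered for deletion.

```python
# Sketch of the "system directories" filter: the set holds the [x] entries
# from the hypothetical checkbox dialog above.
import os

system_dirs = {"c:/program files/", "d:/exe/"}

def deletable(group):
    """Files from a duplicate group that may be selected for deletion,
    i.e. those NOT sitting in a checked 'system' directory."""
    return [f for f in group if os.path.dirname(f) + "/" not in system_dirs]

group = [
    "c:/program files/file.dat",
    "d:/exe/file.dat",
    "d:/media_files/file.dat",
]
print(deletable(group))  # ['d:/media_files/file.dat']
```

In a real implementation this filter would run before the priority/rule logic, so the "keep at least one file per group" check still applies to whatever remains deletable.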
I hope these explanations were clear and concise enough

Anyway, these are just some ideas ... Christian Ghisler is certainly really busy with a lot of projects, considering the number of features that are already in Total Commander!!
Last edited by pierre75 on 2008-03-29, 19:49 UTC, edited 12 times in total.