[FEATURE REQUEST:] select duplicates in search result list!

English support forum

Moderators: Hacker, petermad, Stefan2, white

chrizoo
Senior Member
Posts: 351
Joined: 2008-03-12, 02:42 UTC

Post by *chrizoo »

Pierre75, first of all, you may shorten quotes ([...]), but please don't modify them! It's not only against quoting ethics, it also confuses readers. For example, the answer of mine you included no longer makes any sense in the modified version of your quote. You cannot cite what has not been said, it's that easy ;) Please edit that.
And secondly, don't worry, I don't get hurt easily, I just wanted to make sure criticism is based on arguments.
pierre75 wrote: But I'm not talking about the path length, I'm talking about the filename length (without directories).
No, you did talk about path length:
pierre75 wrote:What the meaning of a "longest path name"? Don't you think that it's just strange ideas??


Anyway, I understand your point now regarding the dir preference. Thanks for clarifying.
Maybe you want to draw some hypothetical screenshots, like Clo did (1st page), because suggestions often sound fine but then are difficult to pack into a GUI with a minimum of user-friendliness.
chrizoo
Senior Member
Posts: 351
Joined: 2008-03-12, 02:42 UTC

Post by *chrizoo »

****** THERE IS A POLL FOR THIS ISSUE --> HERE. YOU CAN NOW VOTE ON WHETHER THIS PROBLEM SHOULD BE SOLVED. ******
ghisler(Author)
Site Admin
Posts: 50824
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Post by *ghisler(Author) »

The reason why I haven't implemented any of these solutions so far is that everyone has different needs for this function, so a different function would have to be written for every user. It's very risky to pick out just one of these dozens of ideas and mark files with that method (e.g. by path length) in the hope that it will be OK for everyone. For example, I keep some backups in c:\backup, so the path of the backups is usually shorter than the path of the originals...
Author of Total Commander
https://www.ghisler.com
Lefteous
Power Member
Posts: 9537
Joined: 2003-02-09, 01:18 UTC
Location: Germany
Contact:

Post by *Lefteous »

The primary problem here is not selecting but the way the information is presented. The flat list grouped by directory doesn't help with removing duplicates or related tasks.
pierre75
Junior Member
Posts: 13
Joined: 2008-03-29, 12:44 UTC

Post by *pierre75 »

Mr. Ghisler,

As usual, thank you for your rapid feedback.

But I didn't really understand your message. Are you expecting everyone to ask for the same thing? How could that happen? Are the users supposed to discuss on the forum and agree on a unique solution before you implement it? I don't really know, I'm quite new here.

On the contrary, I think that most people have the same need. While you produce a very good list of duplicated files, it takes so much time to select the files manually. That's the big problem, don't you think?

Most users are manually doing a task that takes hours and could easily be automated, in my humble opinion. While they select the files to delete by hand, I believe they make some decisions and then apply those decisions to each group of files. At least, that's how I do it. (Or, to be honest, how I try to do it, because when there are hundreds of files, it's quite impossible.)

These decisions seem quite simple. When there are 4 or 5 directories containing the same file, we can choose one of them and try to remember our decision for the following groups of duplicates. You could solve this by asking the user which is his preferred directory, couldn't you? When the duplicates are in the same directory, it's a bit harder: it's often the longest name, but in fact it's the most meaningful name that we choose. OK, you're right, in this case it's quite impossible to implement a 100% reliable method to choose the best filename, since the shortest could be the most meaningful.

Let's imagine that you have 2 identical pictures, one named after the place, "haiti.jpg", and the other after the date, "12july2003.jpg": both pieces of information are useful, and no program could ever guess that it should delete one file and rename the other to "haiti_12july2003.jpg".

This is just an example to show that it's impossible to write a perfect algorithm. So if we wait for a perfect algorithm, I guess we are condemned to manually select hundreds of duplicates for years to come :wink:

The fact is that a lot of users just want a simple thing: to quickly delete duplicates to free up space on their hard drive, without losing time or data. (It could be the case for pictures that you've copied to a portable device, and then again onto your hard drive because you're not sure you still have them, ...) For this, you've already done half of the work (quickly finding duplicates); the second half should be to quickly select files in the list that you show, I think.

Personally, I think that TC should not have a fully automatic functionality to delete duplicated files. TC should only help to make the selection in the duplicated files list faster. Even though I think TC is a very reliable tool, I would never let TC delete my files without being able to look first at what it will delete.

That's why, once we have obtained the duplicated files list (Alt+F7 > search > feed to listbox), a supplementary tool to (un)select files in that list would be incredibly helpful. Of course, the most powerful option(*) of this tool would be to ensure that there is always at least 1 file left selected (or unselected) in each group.

(*) It's an option, maybe shown as a checkbox, because some users might want to (un)select all files in a group.

Other (un)selection options could use the date, regular expressions or TC plugin information (EXIF, image width, ...).

So it would not be about deleting; it would be about (un)selecting files in groups (and ensuring 1 file stays (un)selected in each group).

Once done, we are able to fine-tune the selection manually if we want to, and then it's our responsibility to press the DEL key or not.

That should solve the problem of a lot of users, in my humble opinion. It doesn't involve security problems, and it's possible to implement.
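The core idea here, selecting files per group while guaranteeing that at least one copy always survives, can be sketched like this (a hypothetical Python illustration, not TC code; `select_in_groups` and the sample paths are invented):

```python
# Hypothetical sketch: apply a selection predicate per duplicate group,
# but never select an entire group, so at least one copy always survives.

def select_in_groups(groups, predicate, keep_one=True):
    selection = []
    for group in groups:
        marked = [f for f in group if predicate(f)]
        if keep_one and len(marked) == len(group):
            continue  # rule would wipe the whole group: skip it here
        selection.extend(marked)
    return selection

groups = [
    ["c:/bak/haiti.jpg", "c:/temp/haiti.jpg"],   # one copy elsewhere
    ["c:/bak/a.jpg", "c:/bak/b.jpg"],            # both inside c:/bak
]
# Select everything under c:/bak for deletion:
sel = select_in_groups(groups, lambda f: f.startswith("c:/bak"))
# The second group is left untouched, because selecting both files
# would leave no survivor.
```

The `keep_one` flag corresponds to the optional checkbox mentioned above: turned off, the tool would simply (un)select everything that matches.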

Thank you :)
Last edited by pierre75 on 2008-04-02, 19:00 UTC, edited 1 time in total.
jjk
Member
Posts: 181
Joined: 2003-07-03, 10:41 UTC

Post by *jjk »

2 pierre75
I agree with your last post. Support.
Rein de Jong
Senior Member
Posts: 394
Joined: 2005-01-30, 20:26 UTC
Location: NL
Contact:

Post by *Rein de Jong »

2pierre75
Support. Anything is better than nothing!
Greetings/
______/Rein

--- Moderator NL-forum ---
see also: https://www.reindejong.nl/totcmd
Hacker
Moderator
Posts: 13144
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker »

Just an idea for which icfu will certainly hate me, but well, who knows, someone might find it useful:
How about an internal command that would work after a search for duplicates and, for each duplicate group, present a dialog like this:

Code: Select all

|-----------------------------------------------------|
| Select which file to keep                           |
| (o) haiti.jpg           | c:\My\Pictures\           |
| ( ) haiti.jpg           | c:\Pictures\Backup\       |
| ( ) 12july2003.jpg      | c:\Pictures\Haiti\        |
|                                                     |
| [x] Prefer this directory also for other duplicates |
|                                                     |
|                                   [ OK ] [ Cancel ] |
|-----------------------------------------------------|
Would it be helpful?

Roman
Suppose you press Ctrl+F, choose the FTP connection (with saved password), but instead of clicking Connect, you drop dead.
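The "prefer this directory also for other duplicates" checkbox in the dialog above could behave roughly like this (a hypothetical sketch of the idea, not TC code; `resolve_groups`, `ask_user` and all paths are invented):

```python
import os

def resolve_groups(groups, ask_user):
    preferred = set()                 # directories the user has preferred
    kept = []
    for group in groups:
        hits = [f for f in group if os.path.dirname(f) in preferred]
        if len(hits) == 1:
            kept.append(hits[0])      # resolved without showing the dialog
        else:
            choice, remember = ask_user(group)      # show the dialog
            if remember:              # the [x] checkbox was ticked
                preferred.add(os.path.dirname(choice))
            kept.append(choice)
    return kept

dialogs_shown = 0
def ask_user(group):
    """Simulated dialog: keep the first file, tick the checkbox."""
    global dialogs_shown
    dialogs_shown += 1
    return group[0], True

groups = [["c:/My/Pictures/haiti.jpg", "c:/Pictures/Backup/haiti.jpg"],
          ["c:/My/Pictures/x.jpg", "c:/temp/x.jpg"]]
kept = resolve_groups(groups, ask_user)
# Only the first group needs the dialog; the second one is resolved
# from the remembered directory preference.
```

Groups where zero or several files sit in preferred directories would still pop the dialog, which is exactly the ambiguity pierre75 points out further down.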
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel
Contact:

Post by *Samuel »

What about making the result sortable (within the duplicates) by date, size, etc.?

Also by directory, and if you do so, just show a dialog like:

Code: Select all

Directory priority list
Drag and drop the directories in the order you want. Top directories are preferred. (Shows all directories where the duplicates are located.)
| c:\My\Pictures\
| c:\Pictures\Backup\
| c:\Pictures\Haiti\
Afterwards just select all files but the first.
Isn't this a good all-round solution, or did I forget something?
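The priority-list idea can be sketched as follows (a hypothetical Python illustration, not TC code; the `priority` list and paths are invented):

```python
import os

# Hypothetical priority list, as the user would order it in the dialog:
priority = ["c:/My/Pictures", "c:/Pictures/Backup", "c:/Pictures/Haiti"]
rank = {d: i for i, d in enumerate(priority)}

def select_all_but_first(group):
    """Sort a duplicate group by directory rank, keep the top file,
    and return the rest (the files to select, e.g. for deletion)."""
    ordered = sorted(group,
                     key=lambda f: rank.get(os.path.dirname(f), len(rank)))
    return ordered[1:]

group = ["c:/Pictures/Haiti/haiti.jpg",
         "c:/My/Pictures/haiti.jpg",
         "c:/Pictures/Backup/haiti.jpg"]
to_select = select_all_but_first(group)
# Only c:/My/Pictures/haiti.jpg stays unselected.
```

Directories missing from the list fall to the lowest rank, so the sketch never leaves a group without a survivor.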
pierre75
Junior Member
Posts: 13
Joined: 2008-03-29, 12:44 UTC

Post by *pierre75 »

Hacker :arrow: Good idea, but maybe there is a small problem. Continuing your example: if for the next group of duplicates we have

Code: Select all

|-----------------------------------------------------|
| Select which file to keep                           |
| (o) jupiter.jpg         | c:\Pictures\Backup\       |
| ( ) 21may2083.jpg       | c:\Pictures\Haiti\        |
|                                                     |
| [x] Prefer this directory also for other duplicates |
|                                                     |
|                                   [ OK ] [ Cancel ] |
|-----------------------------------------------------|
After this, you have "c:\My\Pictures\" and "c:\Pictures\Backup\" as preferred directories. But they are at the same level, so TC cannot decide which one you prefer. TC has to ask again:

Code: Select all

|-----------------------------------------------------|
| Select which file to keep                           |
| ( ) 18march3861.jpg     | c:\My\Pictures\           |
| ( ) 4th_dimension.jpg   | c:\Pictures\Backup\       |
|                                                     |
| [ ] Prefer this directory also for other duplicates |
|                                                     |
|                                   [ OK ] [ Cancel ] |
|-----------------------------------------------------|
If your duplicated files are spread over many directories, this will produce a lot of popups, I think. However, your solution has the advantage of being really "user friendly"!


Samuel :arrow: That's what I proposed in the 2 previous posts, but maybe it was not so clear, sorry :oops:

Indeed, I think exactly the same as you do. It's the most convenient solution, in my humble opinion.

The only case where it could be inconvenient would be if you have many duplicated files (let's say 100) in many (almost 100) different directories. But that's unlikely to happen often.


:idea: If you didn't read my 2 previous posts, I'd like you to read the suggestion about "system directories", since it might also be really useful. What do you think?
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel
Contact:

Post by *Samuel »

@pierre75
I read your previous post and a little of the post before. Now I have reread both.

I got it down to fewer lines...

The "system directories" solution is not good. No one should ever search a whole hard disk for duplicates; you could delete system files by accident... If you do search the hard disk itself, just search in selected media directories...

Same effect and a faster search...
pierre75
Junior Member
Posts: 13
Joined: 2008-03-29, 12:44 UTC

Post by *pierre75 »

Thanks! Yes, it's a better solution, indeed. It's simpler to implement, and it's faster to use. No need for the "system directories".
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel
Contact:

Post by *Samuel »

It is implemented...
pierre75
Junior Member
Posts: 13
Joined: 2008-03-29, 12:44 UTC

Post by *pierre75 »

I was talking about the selection tool, of course. It's simpler to implement without that "system directories" feature.
What about making the result sortable (within the duplicates) by date, size, etc.
Quite good.

There is no major problem, in my opinion, but I thought of 2 small problems. Please tell me what you think about them.

(1/2) A possible problem: you have hundreds of groups with various file sizes and dates. If you sort by date descending, what if 2 files in a group have the same (newest) date? It's not safe for TC to choose 1 of these 2 files.

In that case, TC should warn the user that some groups have this problem.

Then, some solutions:
- leave the 2 files unselected
- abandon this sort and sort on another attribute instead (size, ...)
- sort on 2 or more attributes (1st=date, 2nd=size, ...)

Code: Select all

|- Total Commander ------------------------------------| 
|                                                      |
| It was not possible to unselect a unique file in     |
| each group, using the "date".                        |
|                                                      |
| [x] leave more than 1 file unselected in some groups |
| [ ] sort on another criterion (size, ...)            |
| [ ] sort on an additional criterion (size, ...)      |
|                                                      |
|                        [OK]                          |
|------------------------------------------------------| 
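The date-tie problem and the "sort on an additional attribute" fallback could look like this (a hypothetical sketch, not TC code; the tuple layout and sample data are invented):

```python
# Hypothetical sketch: keep the newest file per group, fall back to
# size when dates tie; if date AND size both tie, flag the group as
# ambiguous instead of letting the program guess.

def pick_keeper(group):
    """group: list of (name, date, size) tuples (invented layout)."""
    best = max(group, key=lambda f: (f[1], f[2]))
    ties = [f for f in group if (f[1], f[2]) == (best[1], best[2])]
    return best, len(ties) > 1        # (keeper, ambiguous?)

keeper, ambiguous = pick_keeper([
    ("a.jpg", "2008-04-03", 100),
    ("b.jpg", "2008-04-03", 100),     # same date and size as a.jpg
    ("c.jpg", "2007-08-02", 100),
])
# ambiguous is True here, so TC would warn instead of choosing.
```

The ambiguous flag corresponds to the warning dialog mocked up above: TC reports the group rather than silently picking one of the tied files.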
(2/2) A second problem: suppose all your duplicates have the same attributes (same directory, same size, same date, same content, ...) except for the name ("154_ko_abcdef.dat", "204_qwerty_ko_.dat", "427_ok_azerty.dat"). Let's say the files with "ko" in the name are bad: how do you exclude the files with "_ko_" in the filename? It's not possible to do that by sorting. In this case, it's better to have a selection/unselection tool (one that (un)selects files in the listbox).
Also by directory, and if you do so, just show a dialog like:

Code: Select all

Directory priority list
Drag and drop the directories in the order you want. Top directories are preferred. (Shows all directories where the duplicates are located.)
| c:\My\Pictures\
| c:\Pictures\Backup\
| c:\Pictures\Haiti\
Afterwards just select all files but the first.
TC should do this automatically. (I guess that's what you meant, but just to make things clear.)
Isn't this a good all-round solution, or did I forget something?
It's better than nothing, don't you think? But conceptually it's not enough, as it cannot cover some cases (things that are not sortable).
Last edited by pierre75 on 2008-04-04, 18:40 UTC, edited 1 time in total.
pierre75
Junior Member
Posts: 13
Joined: 2008-03-29, 12:44 UTC

screen preview

Post by *pierre75 »

Here are some "screens" to better show how this could work.

Let's say you've made a search and you have the result in the listbox (i.e. the left or right panel):
c:\work\photo(5).jpg
c:\recyler\photo.jpg
c:\temp\photo.jpg
c:\work\photo.jpg
c:\work\image.jpg
c:\work\photo(7).jpg
c:\bak\photo.jpg
----------------------------------------------------------------
c:\temp\2.jpg
c:\temp\haiti.jpg
c:\bak\haiti.jpg
c:\bak\haiti(2).jpg
----------------------------------------------------------------
c:\work\jupiter.jpeg
c:\work\abc\jupiter.jpeg
c:\bak\jupiter.jpg
c:\temp\abc\jupiter.jpg
c:\temp\jupiter.jpg
You can now use the selection tool:

Code: Select all

|- Filters --------------------------------------------|
|                                           | [New...] |
|  My Pictures  [apply this filter]         | [Edit..] |
|  Xyz Data  [apply this filter]            | [Delete] |
|                                           |          |
|                                           |          |
|------------------------------------------------------|
If you select "My Pictures" and press [Edit..], you get this dialog box:

Code: Select all

|- Filter Rules --------------------------------------------------------------|
|                                                                             |
| Filter Name: [ My Pictures                 ]                                |
|                                                                             |
| In each duplicated files group, select all files except files where:        |
| |-------------------------------------------------------------------------| |
| |                                                                         | |
| | [ date               [v]]  [ is newest       [v]]  [               [v]] | |
| | [ size               [v]]  [ is smallest     [v]]  [               [v]] | |
| | [ imgsize.width      [v]]  [ is maximum      [v]]  [               [v]] | |
| | [ imgsize.height     [v]]  [ is smaller than [v]]  [ 600           [v]] | |
| | [ filename           [v]]  [ match regexp    [v]]  [ ^[^(]+jpg$    [v]] | |
| | [ directory          [v]]  [ is preferred    [v]]  [ ask each time [v]] | |
| | [ <choose attribute> [v]]  [ <choose test>   [v]]  [               [v]] | |
| |                                                                         | |
| |                                                                         | |
| |-------------------------------------------------------------------------| |
|    Currently selected rule :   [ move up ]   [ move down ]   [ remove ]     |
|                                                                             |
| [x] Always keep at least one file unselected in each group                  |
| [x] Warn me if there are groups with more than one unselected file          |
|                                                                             |
|                                                       [  OK  ]   [Cancel]   |
|-----------------------------------------------------------------------------|
The rules apply in order (from top to bottom). Each rule selects the files that don't fit. For example, the 1st rule selects all files that have a date older than the newest file in the group; this also means that if you have 1 file dated "2007/08/02 16:12:05" and 2 files dated "2008/04/03 18:11:25", then you will have 2 files left unselected. When a rule would select all remaining files in a group, TC doesn't apply it, so there is always at least 1 file left unselected in each group. For example, if both files of the previous example are pictures with height=1024, then the 4th rule would want to select the 2 remaining files, leaving no file unselected in the group; TC therefore stops at the 3rd rule and never applies the following rules, so that there will always be 1 or more files left unselected in that group.

Note: for the "directory" filter, it would also be nice to have the possibility of a set of preferred directories like "c:\work|c:\bak|c:\temp|c:\recyl..."

Note 2: to become a really powerful tool, there could be many more rules:
- directory -> longest|shortest name
- file -> longest|shortest name
- file content -> regexp search
- compressed file -> contained files, or even a contained file's content
- attribute -> hidden|archive|system|read-only|encrypted
- access time -> newest|oldest


Note 3: there could be a checkbox saying "skip rules that would select all files in a group, and continue applying the following rules until the end of the rules list".

Not all of these notes are important, or even mandatory. They would be great and powerful features, but in a first version of this "selection tool" it's OK to have only basic selection rules.
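The rule behaviour described above, top-to-bottom application that stops before wiping a whole group, plus the Note 3 checkbox, can be sketched like this (a hypothetical Python illustration, not TC code; all names are invented):

```python
# Hypothetical sketch of the rule engine: rules run top to bottom; a
# rule that would select every remaining file in a group is not
# applied, so at least 1 file always stays unselected.
# skip_and_continue corresponds to the checkbox suggested in Note 3.

def apply_rules(group, rules, skip_and_continue=False):
    selected = set()
    for rule in rules:
        would = {f for f in group if f not in selected and rule(f)}
        if len(selected) + len(would) == len(group):
            if skip_and_continue:
                continue   # skip this rule, try the next one (Note 3)
            break          # default: stop before wiping the whole group
        selected |= would
    return selected

group = ["new.jpg", "old1.jpg", "old2.jpg"]
rules = [lambda f: "old" in f,   # select the older copies
         lambda f: True]         # would select everything left -> stops
sel = apply_rules(group, rules)
# "new.jpg" is guaranteed to stay unselected.
```

With `skip_and_continue=True` the second rule would simply be skipped and any further rules tried, which is exactly the alternative behaviour Note 3 proposes.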


Back to the previous dialog box:

Code: Select all

|- Filters --------------------------------------------|
|                                           | [New...] |
|  My Pictures  [apply this filter]         | [Edit..] |
|  Xyz Data  [apply this filter]            | [Delete] |
|                                           |          |
|                                           |          |
|------------------------------------------------------|
If you press [apply this filter] next to "My Pictures", the corresponding filter applies... TC has to ask you for your preferred directories (each time, since they depend on the directories found in the listbox content):

Code: Select all

|- Directories containing duplicates -----|
|                         |               |
| Preference order:       |               |
|                         |               |
| c:\bak                  | [ move up ]   |
| c:\recyler              | [ move down ] |
| c:\temp                 |               |
| c:\temp\abc\            |               |
| c:\work                 |               |
| c:\work\abc\            |               |
|                         |        [ OK ] |
|-----------------------------------------|
You could decide to order them like this:

Code: Select all

|- Directories containing duplicates -----|
|                         |               |
| Preference order:       |               |
|                         |               |
| c:\work                 | [ move up ]   |
| c:\work\abc\            | [ move down ] |
| c:\bak                  |               |
| c:\temp                 |               |
| c:\temp\abc\            |               |
| c:\recyler              |               |
|                         |        [ OK ] |
|-----------------------------------------|
Now TC is ready to make the selection, and you have the selection result in the listbox:
c:\work\photo(5).jpg
c:\recyler\photo.jpg
c:\temp\photo.jpg
c:\work\photo.jpg
c:\work\image.jpg
c:\work\photo(7).jpg
c:\bak\photo.jpg
----------------------------------------------------------------
c:\temp\2.jpg
c:\temp\haiti.jpg
c:\bak\haiti.jpg
c:\bak\haiti(2).jpg
----------------------------------------------------------------
c:\work\jupiter.jpeg
c:\work\abc\jupiter.jpeg
c:\bak\jupiter.jpg
c:\temp\abc\jupiter.jpg
c:\temp\jupiter.jpg
In our example, there is a warning message, because we asked for one, and because none of the rules could choose between "c:\work\photo.jpg" and "c:\work\image.jpg".

It's done!

You're now free to refine the selection manually, and then take an action (delete, multi-rename, ...) on the selected files!

I think this is the most powerful solution, but it may be quite time-consuming for Mr Ghisler to implement. However, if this solution is accepted, the tool could offer only the simplest rules in a first version, which should be far less time-consuming. Any better suggestions?