Duplicate finder question.

nsp
Power Member
Posts: 1802
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Duplicate finder question.

Post by nsp

@ghisler(Author)
I have a small question about the duplicate finder mechanism.

If I tick both size and content (content alone should also group by size first and then process each group):

Do you group by size first and then compare the contents of all files in each group, using an incremental mechanism in parallel?
ghisler(Author)
Site Admin
Posts: 48012
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Duplicate finder question.

Post by ghisler(Author)

When you tick content, TC does group by size, even when size is not checked, because files of different sizes cannot have the same content.
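As a rough illustration of that first step (a minimal Python sketch, not Total Commander's actual code; the function name `group_by_size` is hypothetical), bucketing files by size and discarding sizes that occur only once could look like this:

```python
import os
from collections import defaultdict


def group_by_size(paths):
    """Bucket files by size (hypothetical helper, for illustration only)."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[os.path.getsize(p)].append(p)
    # A file whose size is unique cannot have a duplicate, so drop it early
    # before any content is read.
    return {size: ps for size, ps in buckets.items() if len(ps) > 1}
```

Only the surviving buckets need a content comparison at all.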

If there are only 2 files, TC compares them directly. If there are more than 2 files, TC generates MD5 hashes of all files with equal size and then compares the hashes. This is necessary because with 4 or more files, they may match pairwise or form multiple separate groups.
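The two cases described above can be sketched in Python as follows. This is a hedged sketch that assumes it is handed one group of equal-size files; `md5_of` and `duplicate_groups` are hypothetical names, and TC's real implementation may differ (for example in buffer size or in how the direct comparison is done).

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a whole file, read in 1 MiB chunks (hypothetical helper)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(same_size_paths):
    """Split a list of equal-size files into groups with identical content."""
    if len(same_size_paths) < 2:
        return []
    if len(same_size_paths) == 2:
        a, b = same_size_paths
        # Only two files: a direct byte-by-byte comparison is enough.
        return [[a, b]] if filecmp.cmp(a, b, shallow=False) else []
    # More than two: hash every file and group by digest, because the files
    # may match pairwise or form several independent groups.
    by_hash = defaultdict(list)
    for p in same_size_paths:
        by_hash[md5_of(p)].append(p)
    return [ps for ps in by_hash.values() if len(ps) > 1]
```

For example, with four files where a equals b and c equals d, the hashes yield two separate groups, which comparing every file only against the first one would not reveal.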
Author of Total Commander
https://www.ghisler.com
nsp
Power Member

Re: Duplicate finder question.

Post by nsp

Many thanks for clarifying this. I ran a comparison over the network, and it was very slow to determine that many (12) big files had the same size but different contents, even though they already differed within the first 100 bytes. This is why I was asking about a specific heuristic: opening all files of the same group and computing a hash incrementally, block by block, so that files can be split into new subgroups as soon as they differ, without reading the complete files. (The complete files still have to be read when they are identical.)
The current approach is fine on fast drives or with small files, but over a network or with huge files it is faster to use fclones or jdupes (directly on the server ;)).
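The heuristic proposed above could be sketched roughly like this. It is an illustration under stated assumptions, not how TC, fclones or jdupes actually work: `split_incrementally` and the 64 KiB default block size are hypothetical choices. Each round reads one block from every file still in a group and splits the group by block contents; a file whose block matches no other file is dropped at once, so files that differ early are never read to the end. Identical files are still read completely.

```python
from collections import defaultdict


def split_incrementally(same_size_paths, block_size=64 * 1024):
    """Hypothetical sketch: split a size group block by block."""
    handles = {p: open(p, "rb") for p in same_size_paths}
    try:
        pending = [list(same_size_paths)] if len(same_size_paths) > 1 else []
        duplicates = []
        while pending:
            next_round = []
            for group in pending:
                # Read the next block of every file in this group and
                # sub-group the files by the raw block contents.
                by_block = defaultdict(list)
                for p in group:
                    by_block[handles[p].read(block_size)].append(p)
                for block, members in by_block.items():
                    if len(members) < 2:
                        # No other file shares this block: stop reading it.
                        handles[members[0]].close()
                    elif block == b"":
                        # All members hit EOF together: identical content.
                        duplicates.append(members)
                    else:
                        next_round.append(members)
            pending = next_round
        return duplicates
    finally:
        for f in handles.values():
            f.close()  # closing an already-closed file is harmless
```

Within each round the files are read one after another rather than truly in parallel, and keeping one handle open per file means a very large size group could run into the operating system's open-file limit. Comparing raw blocks instead of per-block hashes rules out hash collisions, at the cost of holding one block per file of the current group in memory.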