Replace duplicates with hard-links

gskoczylas
Junior Member
Posts: 9
Joined: 2008-06-19, 21:51 UTC

Replace duplicates with hard-links

Post by *gskoczylas »

I have duplicate files on my hard disk (files with the same contents). Total Commander has a nice function to locate duplicate files, and I can then select the files to remove from disk. But I do not want to remove them. I want to replace all selected duplicate files with hard links. :idea:

As far as I know, this is not possible with the current version of Total Commander.
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

Yes, it is not possible with just TC. You have to use some kind of third-party tool or script.

BTW it is not always safe to replace duplicates with hardlinks: if you have hardlinks and you change one copy, all other hard copies will be changed too.
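
The warning above can be illustrated with a minimal Python sketch. It assumes a file system that supports hard links (NTFS, or most Unix file systems); the file names are made up for the demonstration. Note that a program which saves by writing a new file and renaming it over the old one would break the link instead of changing both names.

```python
import os
import tempfile


def demo_shared_content() -> str:
    """Create a file, hard-link it, write through the link, read the original."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")   # hypothetical names
        link = os.path.join(tmp, "link.txt")

        with open(original, "w", encoding="utf-8") as f:
            f.write("version 1")

        os.link(original, link)        # both names now refer to the same data

        with open(link, "w", encoding="utf-8") as f:
            f.write("version 2")       # modify the file in place via the second name

        with open(original, encoding="utf-8") as f:
            return f.read()            # the first name sees the change


if __name__ == "__main__":
    print(demo_shared_content())
```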
DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

Post by *DrShark »

This has already been suggested, but it probably won't be added to TC: the feature can be really dangerous if applied to system files.
Suggestion topic, with a link to the findduppe tool, which can convert the files listed in the TC panel after a duplicate search into links.
Example of a button for the Total Commander button bar (requires lst2str).
Notes on the findduppe and lst2str solution applied to a TC file list:
* It isn't very smart, because it gives no way to choose which group of links an unlinked file will be joined to (a file can have two or more groups of hardlinks, each made from its own parent/initial file).
* It also isn't stable or reliable enough, because lst2str fails when too many files are selected in the TC panel, especially if the files or their paths have long names. In that case the following error appears:

---------------------------
---== ATTENTION (lst2str) ERROR ==---
---------------------------
Too many files selected (CL limit reached)! Continue? (result would be truncated)
---------------------------
OK   Cancel
---------------------------
After or instead of that error, a crash may simply occur.
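
One way around a command-line length limit is to pass a list file instead of the file names themselves. The following Python sketch is not findduppe or lst2str; it is a hypothetical stand-in that reads one path per line from a list file and replaces duplicates with hard links to the first file of each group. It assumes the list file is UTF-8 (for example one written by Total Commander's %UL button parameter, which is an assumption here), that all files are on the same volume, and that matching size plus SHA-256 is an acceptable test for "same content".

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_list_file(list_path, encoding="utf-8-sig"):
    """Read one path per line; 'utf-8-sig' tolerates an optional BOM."""
    with open(list_path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def link_duplicates(paths):
    """Replace duplicates with hard links to the first file of each group.

    Returns the number of files replaced.
    """
    groups = defaultdict(list)
    for p in paths:
        if os.path.isfile(p):                      # skips directories
            groups[(os.path.getsize(p), file_digest(p))].append(p)

    replaced = 0
    for same in groups.values():
        keeper = same[0]
        for dup in same[1:]:
            if os.path.samefile(keeper, dup):
                continue                           # already linked together
            tmp = dup + ".hardlink-tmp"            # hypothetical temp name
            os.link(keeper, tmp)                   # raises OSError across volumes
            os.replace(tmp, dup)                   # swap the link into place
            replaced += 1
    return replaced
```

Creating the link under a temporary name and then renaming it over the duplicate means the duplicate is never missing, even if linking fails halfway.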
HAL 9000
Senior Member
Posts: 384
Joined: 2007-09-10, 13:05 UTC

Re: Replace duplicates with hard-links

Post by *HAL 9000 »

Well, try
Duplicate File Hard Linker (DFHL)
https://github.com/Hopfengetraenk/DFHL

Just drag 'DFHL.cmd' from
https://github.com/Hopfengetraenk/DFHL/files/2168067/DFHL_2.6.zip
onto the TC button bar and set
? %P
as parameters.

This will then look for duplicates and hardlink them to one file.

_______________________________________

To just copy a folder with hardlinks, I use
Link Shell Extension
http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html#download

Once you've installed it:
right-click on the source folder -> set Source
right-click on the destination folder -> paste a/smart copy
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: Replace duplicates with hard-links

Post by *Usher »

HAL 9000 wrote:Duplicate File Hard Linker (DFHL)
https://github.com/Hopfengetraenk/DFHL
I can't see any link to download the exe there.
By default this exe won't run under Windows XP. The developers claim that:
DFHL readme.md wrote:The tool runs in Windows NT 4.0 / 2000 / XP and 2003 Server and requires a NTFS file system to run on.
So it should be compiled to run on older Windows versions as well.

In general, I can't recommend this tool. It seems not to support Unicode (at least not fully), so you may have problems with any accented character in a path or file name.

Short tests were run in a directory with 33687 ( >32 K ) files totalling 93102588422 ( >86 GiB ) bytes. Some files or paths have Polish or Russian names; the system is Windows XP SP3 with Polish language settings. DFHL was used for listing files only, with no /l (link) option specified.

1. Run DFHL 2.0 (old version) under Windows XP set to Polish.
- Doesn't display any Polish or Russian character, just ends lines in such places.
- Seems to support only 32 K files, for larger directories ends with a crash.

2. Run DFHL 2.6 (use editbin to change required Windows version)
- Displays every Polish or Russian character as a question mark.

So, even if DFHL creates the links properly, you will have to check all links with other tools, as the file listing from DFHL won't be helpful.

Some more remarks:
- DFHL may skip files smaller than 1 KiB, but you can't change that limit.
- You can't select which file will be kept and which one will be replaced with a link. I prefer to keep the copy with the older timestamp, but another user may prefer to keep the files in a certain "master" directory. DFHL only allows limiting the linking to files with the same timestamp, so it's definitely not an option for removing duplicates of downloads.
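
The two selection policies mentioned in the last remark could look roughly like this in Python. This is not a DFHL feature; `pick_keeper` and its `master_dir` parameter are hypothetical names for a sketch that prefers files inside a "master" directory when one is given and otherwise keeps the copy with the oldest modification time.

```python
import os


def _is_inside(path, directory):
    """True if path lies inside directory (after normalisation)."""
    path = os.path.normcase(os.path.abspath(path))
    directory = os.path.normcase(os.path.abspath(directory))
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:          # e.g. paths on different drives on Windows
        return False


def pick_keeper(paths, master_dir=None):
    """Choose which duplicate to keep; the others would become links to it.

    Files under master_dir (if given and any match) win; among the
    remaining candidates the oldest modification time wins.
    """
    candidates = list(paths)
    if master_dir is not None:
        in_master = [p for p in candidates if _is_inside(p, master_dir)]
        if in_master:
            candidates = in_master
    return min(candidates, key=os.path.getmtime)
```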
Andrzej P. Wozniak
Polish subforum moderator
HAL 9000
Senior Member
Posts: 384
Joined: 2007-09-10, 13:05 UTC

Re: Replace duplicates with hard-links

Post by *HAL 9000 »

Wow, first of all thanks for replying, and for taking a deeper look and testing DFHL.
About the history of DFHL:
I should say that I just asked 'Hans Schmidts', who did some enhancements to the source code. The sources were converted from Visual Studio 6 to Visual Studio 2015 and put on GitHub.
Before receiving the sources for v2.5, I did a reverse-engineering project just for fun, using IDA 7 to decompile version 2.5 and the sources of v2.0 to restore the names and class structures.
Nice to see what is possible. :P
However, once I got the sources I discarded that project. Still, through it I gained a rough overview of the sources, with all their classes and functions.

One oddity of this source is that it includes its own copy of the Windows 'CreateHardLink' API (in 'Hardlink.cpp') that can be used instead of just calling the system API. (It is meant for old systems like Windows NT, whose kernel could already create hard links but which had no real user API for it.)
Usher wrote:
HAL 9000 wrote:Duplicate File Hard Linker (DFHL)
https://github.com/Hopfengetraenk/DFHL
I can't see any link to download exe there.
There is that pink/blue line (the language bar); above it, in the middle, there is a link "3 releases", right between branches and contributors.
There you can download the exe.

But yes, sorry, it's an old problem: this exe won't run on Windows XP and below,
mainly because of the OperatingSystemVersion that the linker sets in the PE header.

->Optional Header
Magic: 0x010B (HDR32_MAGIC)
MajorLinkerVersion: 0x0E
MinorLinkerVersion: 0x00 -> 14.00
...
MajorOperatingSystemVersion: 0x0006
MinorOperatingSystemVersion: 0x0000 -> 6.00
MajorImageVersion: 0x0000
MinorImageVersion: 0x0000 -> 0.00
MajorSubsystemVersion: 0x0006
MinorSubsystemVersion: 0x0000 -> 6.00
Win32VersionValue: 0x00000000
SizeOfImage: 0x0005F000
SizeOfHeaders: 0x00000400
...
I may take care of that in future builds. As a quick hack, open the exe in a hex editor, look for 'PE' near the beginning, and then watch for the two 06 00 00 00 sequences as shown below:
00000060  74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  t be run in DOS
00000070  6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00  mode....$.......
00000080  50 45 00 00 4C 01 05 00 CC 79 3E 5B 00 00 00 00  PE..L...Ìy>[....
00000090  00 00 00 00 E0 00 02 01 0B 01 0E 00 00 3C 04 00  ....à........<..
000000A0  00 90 01 00 00 00 00 00 07 7D 00 00 00 10 00 00  .........}......
000000B0  00 50 04 00 00 00 40 00 00 10 00 00 00 02 00 00  .P....@.........
000000C0  06 00 00 00 00 00 00 00 06 00 00 00 00 00 00 00  ................
Change these two 06 bytes to 04 and save the file. Now the Windows XP loader should at least proceed. Well, that's what editbin does.
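
The same manual patch can be sketched in Python. This is only an illustration of the hex edit described above, not what editbin does internally: it locates the PE header through the `e_lfanew` field at offset 0x3C and overwrites the OperatingSystemVersion and SubsystemVersion fields (offsets 40 and 48 into the optional header, as laid out in the PE format). The function name is made up, and the sketch leaves the header checksum untouched.

```python
import struct


def set_min_windows_version(image: bytes, major: int = 4, minor: int = 0) -> bytes:
    """Return a copy of a PE image with the OS and subsystem versions patched."""
    data = bytearray(image)
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")

    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)      # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    optional = pe_offset + 4 + 20    # skip signature and 20-byte COFF header

    # Major/MinorOperatingSystemVersion, two little-endian 16-bit words.
    struct.pack_into("<HH", data, optional + 40, major, minor)
    # Major/MinorSubsystemVersion.
    struct.pack_into("<HH", data, optional + 48, major, minor)
    return bytes(data)
```

For the dump above, the PE signature sits at 0x80, so the optional header starts at 0x98 and the two patched fields land at 0xC0 and 0xC8, which are exactly the two 06 00 00 00 sequences. Patching the header only gets the file past the loader's version check; it does not guarantee that every API the program imports exists on the older system.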
Usher wrote: 1. Run DFHL 2.0 (old version) under Windows XP set to Polish.
- Doesn't display any Polish or Russian character, just ends lines in such places.
- Seems to support only 32 K files, for larger directories ends with a crash.
Okay, proper character encoding and decoding is always a source of hidden bugs, as is proper exception handling. As far as I saw, DFHL strictly uses Unicode strings and APIs. However, what is probably missing is decoding the strings that are read and encoding the strings that are written out.


DFHL has some exception handling, but only for C++ exceptions; a null pointer or other access violation is not trapped by that and just leads to a crash. I'm working on adding some of those nasty MS __try/__except blocks at critical points so the program carries on.
... and of course I'll try to find the bug and fix it.
:arrow: Please attach a zip file with some sample files to reproduce and test this bug
... or, even better, file an issue on the DFHL source code page on GitHub.
Usher wrote: - You can't select which file will be kept and which one will be replaced with a link. I prefer to keep the copy with the older timestamp, but another user may prefer to keep the files in a certain "master" directory.
Before hardlinking, files are checked to be identical, so it should not matter much which of the two is picked. Okay, regarding file fragmentation, and assuming that old files are less fragmented than new ones, it may matter which file DFHL picks.
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: Replace duplicates with hard-links

Post by *Usher »

HAL 9000 wrote:One oddity of this source is that it includes its own copy of the Windows 'CreateHardLink' API (in 'Hardlink.cpp') that can be used instead of just calling the system API. (It is meant for old systems like Windows NT, whose kernel could already create hard links but which had no real user API for it.)
Yes, that is clearly stated in the changelog:
DFHL changelog wrote:Changes from Version 1.0 to Version 1.1
* Added Support for Windows NT 4.0, missing Hardlink API was created
HAL 9000 wrote:There is that pink/blue line (the language bar); above it, in the middle, there is a link "3 releases", right between branches and contributors. There you can download the exe.
In most cases the link is repeated in readme.md, so I never learnt what's hidden under "releases". My bad.
HAL 9000 wrote:Well that's what editbin does.
Editbin has some more options taken from the linker, e.g. it may clear or recalculate checksums when saving changes.
HAL 9000 wrote:As far as I saw, DFHL strictly uses Unicode strings and APIs. However, what is probably missing is decoding the strings that are read and encoding the strings that are written out.
It seems to be a limitation of console output. If you want to keep Unicode file and directory names, you should always log them to a (UTF-16) text file. That means only the "bytes saved" statistics would be sent to stdout, to let the user know that the tool is still working.
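
The logging approach suggested here can be sketched in a few lines of Python. This is not DFHL code; `log_names` and the file names are hypothetical. The names go to a UTF-16 text file with a byte order mark, so Polish or Russian characters survive regardless of the console code page, and only a short summary is printed.

```python
def log_names(names, log_path):
    """Write one name per line to a UTF-16 text file; return the count.

    The "utf-16" codec writes a byte order mark first, and newline="\r\n"
    produces Windows line endings.
    """
    count = 0
    with open(log_path, "w", encoding="utf-16", newline="\r\n") as log:
        for name in names:
            log.write(name + "\n")
            count += 1
    return count


if __name__ == "__main__":
    sample = ["zażółć gęślą jaźń.txt", "Привет.txt"]   # made-up file names
    written = log_names(sample, "dfhl-names.log")
    print(f"{written} names logged")                   # ASCII-only console output
```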
HAL 9000 wrote:... or even better file an issue on the DFHL-Source codepage on GitHub.
You mean: Sign up and file an issue, right?
HAL 9000 wrote:Okay, regarding file fragmentation, and assuming that old files are less fragmented than new ones, it may matter which file DFHL picks.
That's not what I meant. Some installers don't preserve original timestamps, and some developers change timestamps with every release even for third-party libs they use, no matter whether the files were really changed or recompiled. You can only guess what happened, as the file with the newer timestamp may have the same size and an older or missing version number.

And a question about other possible options: how should NTFS compressed or sparse files be dealt with?