
Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-26, 15:22 UTC
by kapela86
When you use "Synchronize directories" and have checked "by content" I noticed that files are read really slow (if you compare it to for example generating md5 of source files and verifying it with destination files). After I did some testing I saw that files are read in 32KB chunks, these chunks are compared and if they are different then files are marked as different. And it's great because it speeds up comparison in many situations. But it slows it down when you compare files on the same physical media (HDD/CD/DVD/BR). My proposition is to use bigger chunks in this situation, either automatically detecting that source and destination are on same physical media, or adding a checkbox to UI, or ini setting.

Re: Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-27, 09:34 UTC
by ghisler(Author)
When I implemented this, I tried various block sizes from 32k to 1MB. Interestingly, the 32k method was the fastest, probably due to the Windows read cache.
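
A rough way to reproduce such a measurement would be a small timing harness like the sketch below, run against a pair of large files. The file names are placeholders, and note that the read cache mentioned above can skew repeated runs, so each size should ideally be tested against freshly written files:

```python
import time

def time_compare(path_a, path_b, chunk_size):
    """Time one chunk-by-chunk comparison pass at the given chunk size."""
    start = time.perf_counter()
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b or not a:   # stop at first mismatch or at EOF
                break
    return time.perf_counter() - start

# Placeholder file names: use two large copies of the same data.
for size in (32 * 1024, 128 * 1024, 512 * 1024, 1024 * 1024):
    print(f"{size // 1024:5d} KB chunks: "
          f"{time_compare('source.bin', 'copy.bin', size):.2f} s")
```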

Re: Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-28, 05:29 UTC
by MVV
I agree that there should be an option, because reading files in small chunks from an HDD should be slower than reading them in large chunks.

Re: Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-28, 10:38 UTC
by kapela86
ghisler(Author) wrote: 2019-06-27, 09:34 UTC When I implemented this, I tried various block sizes from 32k to 1MB. Interestingly, the 32k method was the fastest, probably due to the Windows read cache.
Well, I don't know how you got those results in your testing. How long ago did you do it? Let me say just this: if you are reading only one file, then reading it in chunks larger than 32KB probably won't make any difference on a modern HDD; for example, here's an ATTO benchmark of my Seagate Barracuda Pro 10TB: https://i.imgur.com/mDJLHsR.png
But if you are reading two different files, the actuator has to move back and forth between them, and the time spent moving the actuator translates directly into slower reading (the same principle applies to fragmented files and is why defragmentation is needed). If you read the files in larger chunks, the actuator doesn't have to move from one place to the other as often. If you need a test case, I suggest an extreme example: burn some large files to a DVD and synchronize against them. You will clearly see the difference with a larger chunk size.
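
One way to act on this, as proposed in the first post, would be to pick a larger chunk size only when both files appear to live on the same device. The sketch below is a hypothetical heuristic, not an existing feature: on Windows, Python's os.stat() fills st_dev with the volume serial number, so this detects a shared volume rather than a shared physical disk, and the size values are arbitrary examples:

```python
import os

def pick_chunk_size(path_a, path_b,
                    small=32 * 1024, large=4 * 1024 * 1024):
    """Use a large chunk size when both files appear to be on the same
    device, so the read head alternates between them less often."""
    same_device = os.stat(path_a).st_dev == os.stat(path_b).st_dev
    return large if same_device else small
```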

Re: Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-28, 21:09 UTC
by Hacker
I remember clearly that when Christian implemented a bigger buffer, it was me who tested it by comparing some burned CDs and DVDs against the original data on the HDD, and the comparison was about 10 times slower than with the 32 KB buffers. When I reported this, Christian was very surprised, but after reverting the change the read speeds went back to normal.
I wanted to check the beta forum to see if I could still find the thread and refresh my memory, but the beta board is offline now. I didn't find any mention in history.txt, either.
So perhaps we could make the value configurable?

Roman

Re: Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-28, 22:19 UTC
by Usher
Hacker wrote: 2019-06-28, 21:09 UTC So perhaps we could make the value configurable?
Sure, but… I suspect that the best value for an HDD may be related somehow to the size of the drive's cache, so there should be separate settings for every drive.
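
If per-drive values were wanted, even a simple lookup table keyed by drive letter with a global fallback would do. The mapping below is purely hypothetical and the values are placeholders, not measured optima:

```python
import os

# Hypothetical per-drive chunk sizes in bytes; unlisted drives
# fall back to the 32 KB default.
CHUNK_SIZE_BY_DRIVE = {
    "C": 32 * 1024,       # e.g. an SSD, where small chunks are fine
    "D": 1024 * 1024,     # e.g. an HDD, where larger chunks may help
}

def chunk_size_for(path, default=32 * 1024):
    """Return the configured chunk size for the drive holding 'path'."""
    drive = os.path.splitdrive(path)[0].rstrip(":").upper()
    return CHUNK_SIZE_BY_DRIVE.get(drive, default)
```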

Re: Use bigger buffer/chunks when comparing files in "Synchronize directories"

Posted: 2019-06-29, 12:22 UTC
by kapela86
Hacker wrote: 2019-06-28, 21:09 UTC it was me who tested it by comparing some burned CDs and DVDs against the original data on the HDD, and the comparison was about 10 times slower than with the 32 KB buffers
Interesting; I can't think of why that would happen, and it has "sparked" my curiosity.
Hacker wrote: 2019-06-28, 21:09 UTC So perhaps we could make the value configurable?
I think reading in bigger chunks is only needed when both files are read from the same physical device. Still, I would love to help test this; for now you could just implement it as a user-configurable value in a beta version and let me know, so I can test it in different situations.