
Find duplicate plugin fields: slow and no status

Posted: 2013-11-11, 20:43 UTC
by white
Tested TC 8.50b8 32bit.

Function Search/Advanced/Find duplicate files

Search a large number of files.

When searching for same name:
* TC seems to scan all folders
* TC shows "Comparison: 99" in the status bar
* Shortly after, the results are displayed

When searching for plugin field [=tc.fullname]:
* TC seems to scan all folders
* TC shows the last name found while scanning all folders in the status bar
* TC does not respond
* After a long time, the results are displayed

It seems like searching for plugin fields is implemented much less efficiently than searching for same name (or same size). Can it be improved?

I also suggest displaying "Comparing" in the status bar before TC stops responding.

Can TC be made to respond when comparing, and show progress?

Posted: 2013-11-13, 16:33 UTC
by ghisler(Author)
What plugin fields did you try? If you only search for plugin fields, TC has to get the plugin data value for all files first, and then sort by this value. This can take a long while.
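
As a rough illustration of the two steps described in this post (get the field value for every file, then sort by that value), here is a minimal Python sketch. It is not TC's code: `find_duplicates` and the `get_field` callback are hypothetical names standing in for whatever a plugin field returns.

```python
from itertools import groupby


def find_duplicates(files, get_field):
    """Group files whose field value is identical.

    `get_field` is a hypothetical callback standing in for a plugin
    field lookup; it is an assumption, not TC's actual interface.
    """
    # Step 1: fetch the field value for every file up front.
    keyed = [(get_field(f), f) for f in files]
    # Step 2: sort by value so that equal values end up adjacent.
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: any run of equal values longer than one is a duplicate group.
    groups = []
    for value, members in groupby(keyed, key=lambda pair: pair[0]):
        names = [f for _, f in members]
        if len(names) > 1:
            groups.append((value, names))
    return groups


def name_only(path):
    """Example field: the file name without its folder, lowercased."""
    return path.rsplit("/", 1)[-1].lower()


example_files = ["a/readme.txt", "b/Readme.txt", "c/notes.txt"]
print(find_duplicates(example_files, name_only))
```

With this layout the cost is dominated by step 1 (one lookup per file) and by how efficient the sort in step 2 is.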

Posted: 2013-11-13, 20:12 UTC
by white
ghisler(Author) wrote: What plugin fields did you try? If you only search for plugin fields, TC has to get the plugin data value for all files first, and then sort by this value. This can take a long while.
I only searched for plugin field "tc.fullname".

Does it take so much longer to get tc.fullname than to get the file name directly?

Posted: 2013-11-14, 15:43 UTC
by ghisler(Author)
There is no need to access the harddisk, but there is still a lot of overhead for allocating the memory blocks to store the extra fields. It's also slower to compare two fields of any type with each other than comparing two hardcoded fields of known type. Therefore I don't think that I can improve much in this situation, but I will of course still have a look at it.

Posted: 2013-11-21, 14:57 UTC
by white
HISTORY.TXT wrote: 17.11.13 Fixed: Search for duplicate files via plugin fields: Much faster by using quick sort to sort by plugin fields (32/64)
Tested OK using TC 8.50b10 32bit.

The status bar now shows "Comparison:" and it's much, much faster.

Posted: 2013-11-21, 15:26 UTC
by ghisler(Author)
Thanks for trying it! I was using the simple bubble sort algorithm because I expected that plugin fields would be used only in combination with other options, so only a few files per group would need to be sorted. But bubble sort quickly becomes very slow when there are a lot of files to compare. I'm therefore now using the much faster quick sort algorithm.
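
To make the difference concrete, here is a small Python sketch that counts comparisons for both algorithms on the same random input. It is only an illustration under stated assumptions: the function names are made up for this example, and the quick sort shown is a simple list-based three-way variant, not TC's implementation.

```python
import random


def bubble_sort(items):
    """Bubble sort on a copy; returns (sorted list, comparison count)."""
    a = list(items)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break  # no swaps in this pass: already sorted
    return a, comparisons


def quick_sort(items):
    """Three-way quick sort on a copy; returns (sorted list, comparison count).

    Each element is counted as one three-way comparison against the pivot.
    """
    comparisons = 0

    def sort(a):
        nonlocal comparisons
        if len(a) <= 1:
            return a
        pivot = a[len(a) // 2]
        less, equal, greater = [], [], []
        for x in a:
            comparisons += 1
            if x < pivot:
                less.append(x)
            elif x > pivot:
                greater.append(x)
            else:
                equal.append(x)
        return sort(less) + equal + sort(greater)

    return sort(list(items)), comparisons


rng = random.Random(0)
values = [rng.randrange(1_000_000) for _ in range(1000)]
_, bubble_count = bubble_sort(values)
_, quick_count = quick_sort(values)
print("bubble sort comparisons:", bubble_count)
print("quick sort comparisons: ", quick_count)
```

Bubble sort needs on the order of n² comparisons in the average case, while quick sort needs on the order of n log n, so the gap widens rapidly as the number of files grows. For a handful of files per group the difference is negligible, which matches the original expectation described above.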

Posted: 2013-12-19, 18:42 UTC
by meisl
Bubble Sort :shock:

Posted: 2013-12-20, 17:21 UTC
by ghisler(Author)
Bubble sort is OK for a small number of items, just not for thousands...

Posted: 2013-12-20, 19:37 UTC
by meisl
Sure, even NP-hard problems are all manageable - with a small enough input size...

But no offense, really. It's just that I can't think of any reason why one would use Bubble Sort, except maybe ease of implementation. But then again, I wouldn't expect that you implemented sorting "by hand" in TC, rather than using a library...?