[WCX] ZPAQ

milo1012
Power Member
Posts: 1109
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-04-12, 17:32 UTC

New Version 1.4a!
  • fixed: the state of the 'Warn about memory requirements' option wasn't saved to config/ini file
    (option was reset to old state after a TC restart)
  • fixed a possible memory leak due to missed progress dialog updates (was probably never triggered)
  • the undocumented block size option will now correctly load and save values from/to ini file
  • Russian translation update (by 'Skif_off')
  • Danish translation update (by 'petermad')
  • Chinese Simplified translation update (by 'wwj402')
Check the first post for the new file.


Just a small update.
Like I said, if there are no critical bugs in this version, this will be the last major release for a longer period of time.
TC plugins: PCREsearch and RegXtract

reg2s
Junior Member
Posts: 9
Joined: 2016-05-23, 19:26 UTC

Post by *reg2s » 2016-05-23, 19:42 UTC

Thanks for the plugin. Zpaq is very useful for keeping small updates of a directory tree. Great for working with source code.

Is it possible to implement an in-archive view, as produced by the "-until N" option of the zpaq utility? Or how can I unpack all files not in their latest, but in some intermediate version?

milo1012
Power Member
Posts: 1109
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-05-23, 20:48 UTC

reg2s wrote: Is it possible to implement an in-archive view, as produced by the "-until N" option of the zpaq utility?
In theory yes, but how would you configure such an option? Through the options dialog?
The only approach I could think of would be a user configurable list of mappings of archives (including full path) to a specific version number, which I would save to a (standalone) config file.
Now every time I open any zpaq archive I would check such list/map for an entry and - if found - apply that version truncation before listing the archive for TC.
This would be feasible, but would take some time to implement and has practical problems, like losing the mapping if you move or rename the archive.
Anyway, would this suffice for your needs, or do you have a different/better idea for how to do that?
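
A minimal sketch of such a mapping, assuming a standalone JSON file as the config store. The file format and the helper names (`normalize`, `load_map`, `set_until`, `get_until`) are made up for illustration and are not taken from the plug-in.

```python
import json
import os


def normalize(archive_path):
    """Key used for lookups: absolute path, case-folded on Windows."""
    return os.path.normcase(os.path.abspath(archive_path))


def load_map(config_file):
    """Return the {archive path: version} map, or an empty one if none is saved."""
    try:
        with open(config_file, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def set_until(config_file, archive_path, version):
    """Remember that `archive_path` should be listed as of `version`."""
    mapping = load_map(config_file)
    mapping[normalize(archive_path)] = version
    with open(config_file, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, indent=2)


def get_until(config_file, archive_path):
    """Return the saved version, or None to list the latest state."""
    return load_map(config_file).get(normalize(archive_path))
```

Because the key is the full path, a moved or renamed archive simply yields `None` again, which is the practical problem mentioned above.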

reg2s wrote: Or how can I unpack all files not in their latest, but in some intermediate version?
Well, you can show all archive content through the "Show all archive versions" option, and then use TC's branch view (Ctrl+B).
In there you can easily select all files of the same name. You can even emulate the -until option by deselecting files from the end of the group you just selected: the default sort order in branch view shows the latest version at the very end, because the archive version names are in ascending order (or just look at the relative file path, which branch view shows below the file list instead of the pure file name).
Now simply unpack the selected files (just use F5) from the archive.
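
The manual selection described above can be sketched as follows, assuming the listing is available as (version, relative path) pairs. `select_until` is a hypothetical helper, and it ignores files that were deleted in a later update, which the real -until option would also account for.

```python
def select_until(entries, until):
    """Pick, per file, the newest entry whose version is <= `until`.

    `entries` is an iterable of (version, relative_path) pairs, for example
    (8, "src/main.c") for a file listed below the version dir 00000008.
    """
    newest = {}
    for version, path in entries:
        if version <= until and version > newest.get(path, 0):
            newest[path] = version
    return sorted((v, p) for p, v in newest.items())
```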

Hacker
Moderator
Posts: 11395
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker » 2016-05-23, 21:33 UTC

milo1012,
milo1012 wrote: Anyway, would this suffice for your needs, or do you have a different/better idea for how to do that?
Not that I'd be using it, but perhaps the config could be stored / edited using the descript.ion of the archive?

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect you drop dead.

reg2s
Junior Member
Posts: 9
Joined: 2016-05-23, 19:26 UTC

Post by *reg2s » 2016-05-23, 22:07 UTC

If the archive contains hundreds of files, it is impractical to select them one by one in branch view. Also, if a directory tree was packed rather than individual files, it is very hard to reconstruct it, because during unpacking the files are put into the corresponding "archive version name" subdirectories.

Yes, I thought of a dialog option. When an archive is opened, a list of all revisions should be built. For example, "zpaq l foo.zpaq -all" will give the maximum number of updates, N. Then the directory tree of the in-archive view can be built as follows:

dir1: zpaq l foo.zpaq -until 1
dir2: zpaq l foo.zpaq -until 2
...
dirN: zpaq l foo.zpaq -until N

Obviously, unpacking should then be done with "zpaq x foo.zpaq -until <selection>".
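
The proposed layout can be sketched as a table of command lines, using only the zpaq commands and options quoted above (l, x, -until). The helper names and the "dirN" labels are illustrative only; nothing is executed here.

```python
def list_command(archive, until):
    """Command line that lists the archive as of update `until`."""
    return ["zpaq", "l", archive, "-until", str(until)]


def extract_command(archive, until):
    """Command line that extracts the archive as of update `until`."""
    return ["zpaq", "x", archive, "-until", str(until)]


def virtual_dirs(archive, update_count):
    """Map "dir1".."dirN" to the listing command that would fill each dir."""
    return {f"dir{n}": list_command(archive, n) for n in range(1, update_count + 1)}
```

Each list could be handed to something like `subprocess.run`, but that would mean one listing pass per update.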

I've tried to implement this with the MultiArc addon, but its functionality is too limited for this.

milo1012
Power Member
Posts: 1109
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-05-24, 01:59 UTC

Hacker wrote: but perhaps the config could be stored / edited using the descript.ion of the archive?
Sure, using individual external files might be a solution, though I was never a friend of that concept.
I'm not exactly sure what I would be supposed to do with existing descript.ion file entries - append my data somehow? It could break the format.


reg2s wrote:dir1: zpaq l foo.zpaq -until 1
dir2: zpaq l foo.zpaq -until 2
...
dirN: zpaq l foo.zpaq -until N
Unfortunately this is exactly what I can't do in the plug-in, as I can't have multiple zpaq "sessions" under the same archive name/listing.
It was already hard to link all temporary data needed for the wcx interface into one session, but this is just not possible with my current plug-in mechanism: it would break most of what I've already implemented to filter files, dirs and ADS, it would use an extreme amount of memory for large archives, and it would make the internal sorting mechanisms quite slow.
reg2s wrote: If the archive contains hundreds of files, it is impractical to select them one by one in branch view. Also, if a directory tree was packed rather than individual files, it is very hard to reconstruct it, because during unpacking the files are put into the corresponding "archive version name" subdirectories.
I understand what you mean, but that's why I thought of a mapping of some form. Showing a combination of all possible archive "states" is just not practical for large archives IMO, but somehow selecting a specific version to which you want to list might be possible.

I can think of two solutions for now to apply the "-until" option:
1st
I could check whether you try to unpack a complete "archive version" name dir, i.e. you want to copy dir "00000008" or similar out of the archive.
When this is the case, I would internally apply the "-until" option.
But this is flawed due to wcx restrictions: TC/wcx needs a valid file listing prior to extraction, so I can't just extract paths/files that TC didn't request in the first place. For example, TC creates all dirs by itself prior to extracting, so you'd possibly end up with empty dirs in the target that don't actually belong there.
I need to test whether this is practically possible without breaking too much.

2nd
Create "dummy" entries in the archive listing and let the user request a specific listing, like

00000001
  \ files
00000002
  \ files
 ...
0000000N
  \ files
untilN
  \00000001
  \00000001\extractme
  \00000002
  \00000002\extractme
  ...
  \0000000N
  \0000000N\extractme
(of course you would still be able to swap the "0000000N" dirs with the "Show archive version names as detailed timestamp" names)

Now every time you try to extract the "extractme" file, I can check for which version you extracted the dummy file and internally apply the "-until" option.
I wouldn't extract the file for real of course, as it would be empty anyway.
Now the trick is that TC re-reads the archive after extraction, so you would immediately have a new archive listing after that, with the "-until" option applied.
The only difficulty is to keep the until number mapped to the archive path in the plug-in's memory pool, in case you close and reopen the archive some time later.
Of course I will have to test this approach for additional practical problems.
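
A rough sketch of the dummy-entry idea, assuming the "untilN" dir and "extractme" file names from the listing above. The functions are hypothetical and only show how a requested path could be mapped back to a version number.

```python
import re

DUMMY_NAME = "extractme"
# Matches paths such as untilN\00000008\extractme and captures the version digits.
_REQUEST = re.compile(r"^untilN\\(\d{8})\\" + DUMMY_NAME + r"$")


def dummy_entries(update_count):
    """Paths of the placeholder files, one per archive version."""
    return [f"untilN\\{n:08d}\\{DUMMY_NAME}" for n in range(1, update_count + 1)]


def requested_until(extracted_path):
    """Return the version encoded in a placeholder path, or None for real files."""
    match = _REQUEST.match(extracted_path)
    return int(match.group(1)) if match else None
```

The returned number would then have to be remembered per archive path, so that the next listing can apply it.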

What do you think?

reg2s
Junior Member
Posts: 9
Joined: 2016-05-23, 19:26 UTC

Post by *reg2s » 2016-05-24, 06:16 UTC

Ok, things are more complicated than I thought. It is very sad that the specified listing can't be built. But I understand that this is TC's limitation.

It would be more intuitive to just select the "00000008" dir and, if extraction is requested, re-read the archive with "-until" applied and extract the files. The solution with "extractme" looks more like a hack.

As for me, it is better not to use any memory cache and just re-read the archive every time it is opened. A slow but reliable approach.

milo1012
Power Member
Posts: 1109
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-05-24, 16:09 UTC

reg2s wrote: But I understand that this is TC's limitation.
As I said: it's not only a TC/wcx limitation, but also a limitation of the plug-in, unless I rewrite a major part of it.
In short: I want and need to know the number of files inside an archive before starting to report them to TC. I can't just list all version combinations for TC "on-the-fly",
because I keep iterators to data blocks in memory when TC requests file entries. You're free to take a look at the source and see for yourself how I implemented it.
And even if I kept one zpaq instance in memory for every archive state listing: zpaq creates maps to the archive's blocks, fragments and file entries. This already takes a considerable amount of memory compared to different/older archivers, and it would be worse when done multiple times, possibly running out of memory for > 1000 updates.

But no matter what I do in the plug-in, even TC needs to hold all file entries in memory to make the (virtual) navigation in the archive possible.
If I created a combination of all possible archive states, I might end up with
N*X file entries to report to TC, where N is the number of updates and X is the number of files. This doesn't even take added files into account; it assumes a fixed number of files that are only modified in each update.
This might be OK for < 100 archive updates, but for an archive with > 10000 files and/or > 1000 updates this will be too much, or in other words: it makes archive listing unnecessarily complex. TC already has palpable slowdowns when listing an archive with > 50000 files inside.
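
To put numbers on the N*X estimate, a small calculation using the figures mentioned above. The constant and function names are illustrative, and 50000 is simply the listing size at which slowdowns were reported.

```python
SLOW_LISTING = 50_000  # entries at which TC reportedly slows down noticeably


def combined_entries(updates, files):
    """Entries to report if every archive state is listed in full (N * X)."""
    return updates * files


for updates, files in [(100, 10_000), (1_000, 10_000)]:
    total = combined_entries(updates, files)
    factor = total // SLOW_LISTING
    print(f"{updates} updates x {files} files = {total} entries "
          f"({factor}x the slow-listing size)")
```

Even the smaller case gives 1,000,000 entries, 20 times the size at which listings already feel slow.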

reg2s wrote: It would be more intuitive to just select the "00000008" dir and, if extraction is requested, re-read the archive with "-until" applied and extract the files.
I will see if I can implement this for one of the next versions (being optional of course).
But like I said before: I won't have time to work on the plug-in for a few months or so, as this is quite time consuming.
reg2s wrote: The solution with "extractme" looks more like a hack.
I'd say this is in the eye of the beholder. You'd want to request a specific listing and so you need to tell it somehow.
reg2s wrote: As for me, it is better not to use any memory cache and just re-read the archive every time it is opened. A slow but reliable approach.
I don't see a problem with this, as I would basically only hold the last listed archive path (incl. file attributes) in memory. The wcx interface will basically not allow a wild mixture of open/close operations.
So if you open a different archive in the meantime, you need to re-request the listing for the first archive.

Hacker
Moderator
Posts: 11395
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker » 2016-05-24, 18:41 UTC

milo1012,
milo1012 wrote: I'm not exactly sure what I would be supposed to do with existing descript.ion file entries - append my data somehow? It could break the format.
No, your plugin would just read the descript.ion files, the user would be responsible for editing them.
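
A read-only sketch of that idea, assuming the usual descript.ion layout of one "name description" pair per line, with names containing spaces put in double quotes. The `until=N` tag inside the description is purely hypothetical; no such convention exists in the plug-in.

```python
import re

_HINT = re.compile(r"\buntil=(\d+)\b")  # hypothetical tag typed by the user


def split_entry(line):
    """Split one descript.ion line into (file name, description)."""
    line = line.rstrip("\r\n")
    if line.startswith('"'):
        # Quoted name: everything up to the closing quote is the file name.
        name, _, rest = line[1:].partition('"')
        return name, rest.lstrip()
    name, _, rest = line.partition(" ")
    return name, rest.lstrip()


def until_from_description(lines, archive_name):
    """Return the version requested for `archive_name`, or None without a hint."""
    for line in lines:
        name, description = split_entry(line)
        if name.lower() == archive_name.lower():
            match = _HINT.search(description)
            return int(match.group(1)) if match else None
    return None
```

Since the file is only read, existing descriptions stay untouched and the format cannot be broken by the plug-in.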

Roman

reg2s
Junior Member
Posts: 9
Joined: 2016-05-23, 19:26 UTC

Post by *reg2s » 2016-05-25, 05:07 UTC

milo1012 wrote: But like I said before: I won't have time to work on the plug-in for a few months or so, as this is quite time consuming.
I just wanted to know if the problem could be solved in some way. Thank you for your work and explanations.

Hacker
Moderator
Posts: 11395
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker » 2016-08-20, 13:39 UTC

Hi milo1012,
Let's say I have these files:
Backup_00001.zpaq
...
Backup_00055.zpaq

A file called C:\ImportantDir\ImportantDocument.docx was added to backup #20 and an updated version to #35.

Now, to see the various versions of a file, I currently have to know in which versions it was added, so that I can browse the appropriate version dirs (00020 and 00035). Alternatively I can use the branch view, but then I also see ImportantDocument.docx in C:\OtherDir1\, C:\OtherDir2\ and C:\OtherDir3\ and have to check which file is from which path to choose the right one.

Would it be possible to browse an archive and see all versions of all files at the same time, like this?

c:\Backup.zpaq\ImportantDir\ImportantDocument <2016-08-10 19:33:02>.docx
c:\Backup.zpaq\ImportantDir\ImportantDocument <2016-07-16 15:21:31>.docx
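
The requested view could be derived roughly like this, assuming the plug-in knows a (timestamp, path) pair for every stored file version. `display_name` and `merged_view` are hypothetical names.

```python
import ntpath


def display_name(path, timestamp):
    """Insert `<timestamp>` between the file name and its extension."""
    stem, ext = ntpath.splitext(path)
    return f"{stem} <{timestamp}>{ext}"


def merged_view(entries):
    """Turn (timestamp, path) pairs into one flat, sorted list of display names."""
    return sorted(display_name(path, ts) for ts, path in entries)
```

With a year-first timestamp, plain string sorting keeps the versions of one file in chronological order.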



Second, unrelated question:
I have again these files:
Backup_00001.zpaq
...
Backup_00055.zpaq

Now say these backup files started taking up too much space and I would like to do one of these two things:
- keep all files, but for each file that has multiple versions, only keep the last 5 versions
- keep all files, but for each file that has multiple versions, only keep versions from the last 6 months

Is this possible using either the plugin or the zpaq command line?

Thank you
Roman

milo1012
Power Member
Posts: 1109
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-08-20, 17:42 UTC

2Hacker
First question:
Theoretically yes.
I need to give some thought to how I'd have to alter the internal maps for this, how I could resolve possible name collisions, and how to treat deleted and added directories with such an approach. So I might implement such an (optional) view in one of the next versions, provided that there are no code obstacles. But personally I prefer not to rename files, as I wouldn't be able to compare external dirs with the archive content in a decent way: I'd need to manually exclude such renamed files each time.

Second question:
Not really, or at least only with a lot of limitations.
The archive format allows a rollback, but not truncating "front" data, as the deduplication feature might rely on such old data blocks.
In fact, Matt answered a similar question recently:
http://encode.ru/threads/456-zpaq-updates?p=47910&viewfull=1#post47910
> Is there a way to remove old versions of files inside a multipart (or not) ZPAQ archive ?

Yes, but the result is a single-part archive.

zpaq extract backup???.zpaq -repack new.zpaq

This works like the purge command. It deletes blocks if none of the fragments in them are referenced by the current version of any files. It is faster than extracting and creating a new archive, but does not compress as well because it keeps the whole block even if only one fragment is needed.

Unlike the old purge command, you can also use filters (-not, -only, -until) to select files that will be copied, or -noattributes to delete attributes.
I haven't tested this myself, but you might be able to use it along with the -until option. As I see it, though, this still doesn't work on a "per file" basis, but only on a "per update" basis, i.e. you can only select a number of archive updates to remove. So if your individual files don't have a new version in each archive update, you need to figure out yourself where to truncate. You cannot just keep e.g. the latest five archive updates and expect that this also removes all but the last five versions of the files of your choice.
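
The "per file" versus "per update" mismatch can be illustrated with a toy model, assuming we know for each file the update numbers in which a new version of it was stored. `cutoff_per_file` is a hypothetical helper, it does not model zpaq's block storage in any way, and it expects every file to have at least one version.

```python
def cutoff_per_file(file_versions, keep):
    """For each file, the oldest update needed to retain its last `keep` versions.

    `file_versions` maps a path to the non-empty list of update numbers
    in which a version of that file was added.
    """
    return {
        path: sorted(updates)[-keep:][0]
        for path, updates in file_versions.items()
    }
```

For a document changed only in updates 20 and 35 and a log changed in every update from 50 to 55, keeping five versions each gives cutoffs of 20 and 51, so no single update number serves both files.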

Hacker
Moderator
Posts: 11395
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker » 2016-08-20, 21:03 UTC

milo1012,
Thank you very much for considering, it would be a big help.
milo1012 wrote: possible name collisions
Well, there won't be any names containing "<" or ">", so if you can use these within the TC plugin interface there should be no collisions.
milo1012 wrote: how to treat deleted and added directories
As there is no sense in having directory versions (for what? different versions of their timestamps?), I don't think directories need any versions. Also, I'd assume restoring Backup_00002.zpaq\Directory\FileWithManyVersions.docx would also restore the Directory with its attributes as stored in Backup_00002.zpaq anyway.
The only thing that might be worth considering is what happens if there is DIRectory in one backup version and dirECTory in another. Two solutions, I guess: only show, say, the latest version of the name, or show all versions with <timestamp>, each including all files from both directories (so the directories would contain identical files; only the directory names (and attributes) themselves would differ).

Thank you very much also for the answer to my second question.

Thanks
Roman

milo1012
Power Member
Posts: 1109
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2016-08-21, 12:01 UTC

Hacker wrote: Well, there won't be any names containing "<" nor ">" so if you can use these within the TC plugin interface there should be no collisions.
I actually intended to use the same custom date string as for "Show archive version names as detailed timestamp", where I filter any characters forbidden on Windows in order to be able to extract the files w/o doing another file name cleanup (WYSIWYG).
The more important question would be: how to treat files that are in the 240-255 characters name length range, where the timestamp can't be appended to the name w/o truncating. I would need to truncate the filename itself, but this would mess up the sort order when viewing a dir...
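
The truncation problem can be sketched like this, assuming a limit of 255 characters per name and a timestamp string that is already free of forbidden characters. `versioned_name` and the dotted time format in the usage are illustrative only.

```python
import ntpath

MAX_NAME = 255  # assumed per-name limit, matching the range discussed above


def versioned_name(name, stamp, max_len=MAX_NAME):
    """Append " stamp" before the extension, shortening the stem if needed."""
    stem, ext = ntpath.splitext(name)
    suffix = f" {stamp}{ext}"
    room = max_len - len(suffix)
    if room < 1:
        raise ValueError("timestamp and extension alone exceed the limit")
    # Only the stem is cut, so timestamp and extension always survive.
    return stem[:room] + suffix
```

Two long names that differ only near the end of the stem become identical up to the timestamp once cut, which is the sort order problem mentioned above.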
Anyway, by name collisions I meant collisions in my data structures (in a map) - but this can be solved by using a different structure.
Hacker wrote:As there is no sense in having directory versions (for what? different versions of their timestamps?) I don't think directories need any versions. Also, I'd assume restoring Backup_00002.zpaq\Directory\FileWithManyVersions.docx would also restore the Directory with its attributes as stored in Backup_00002.zpaq anyways.
Sure, but I need to make some decision. Using the latest dir timestamp or the first one?
Hacker wrote: Only thing that might be worth considering is what if there is DIRectory in one backup version and dirECTory in another backup version. Two solutions I guess - only showing say the latest name version, or show all versions with <timestamp> and including all files in both directories (so the directories would contain identical files, only the directory names (and attributes) themselves would be different).
I think TC would see it as a single name (though I have to test this), but since we're in the Windows world - where things are still case-insensitive (especially when you would extract both dirs to a single location) - I need to treat such names as identical. And I think I'd take the name from the same version I take the timestamp from (see above).
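
Treating such names as identical could look like the following, assuming the newest version wins (that choice is still open at this point in the discussion). `merge_dir_names` is a hypothetical helper.

```python
def merge_dir_names(dir_versions):
    """Collapse names that differ only in case, keeping the newest spelling.

    `dir_versions` is an iterable of (version, name) pairs.
    Returns {lowercased name: (version, name as spelled in that version)}.
    """
    merged = {}
    for version, name in dir_versions:
        key = name.lower()
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, name)
    return merged
```

The stored version number is the one the timestamp would be taken from as well, so name and timestamp stay consistent.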

Hacker
Moderator
Posts: 11395
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker » 2016-08-21, 12:17 UTC

Hi milo1012,
milo1012 wrote: I actually intended to use the same custom date string as for "Show archive version names as detailed timestamp", where I filter any characters forbidden on Windows in order to be able to extract the files w/o doing another file name cleanup (WYSIWYG).
Well, my suggestion would be to use Filename <timestamp>.docx for display (of course, for the <timestamp> part the existing custom date string can be used, just surround it by < > so that there are no possible collisions) and give the user an option (in the options dialog) to extract to:
a) Filename [timestamp].docx
b) Filename.docx

This would include filename conversions, yes, but not really difficult ones, I assume.
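
The two extraction options could be sketched as one conversion, assuming display names of the form "Filename <timestamp>.ext" as suggested above. `extract_name` and its flag are hypothetical.

```python
import re

# " <timestamp>" directly before the final extension (or the end of the name).
_VERSION_TAG = re.compile(r" <([^<>]*)>(?=(\.[^.\\]*)?$)")


def extract_name(display, keep_timestamp):
    """Convert a display name into the name written to disk.

    keep_timestamp=True  gives option a) Filename [timestamp].docx
    keep_timestamp=False gives option b) Filename.docx
    """
    def repl(match):
        return f" [{match.group(1)}]" if keep_timestamp else ""

    return _VERSION_TAG.sub(repl, display, count=1)
```

Names without a version tag pass through unchanged, so the conversion is safe to apply to every extracted file.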
milo1012 wrote: Using the latest dir timestamp or the first one?
I think latest one is more interesting / useful.
milo1012 wrote: I think TC would see it as a single name (though I have to test this), but since we're in the Windows world - where things are still case-insensitive (especially when you would extract both dirs to a single location) - I need to treat such names as identical.
Well, as I wrote, two options:
a) latest name ("dirECTory")
b) name with <timestamp>

Personally I'd say a) is the more sane approach.
milo1012 wrote: I think I'd take the very name from the version I would use the timestamp from (see above).
I agree, I would just like to add, if I extract Filename.docx from version 52, please also use dirname from version 52 and dir timestamp from version 52.

Thank you
Roman
