Page 1 of 1

[WCX] Increase speed of enum items from internal archive cache

Posted: 2019-11-08, 10:00 UTC
by remittor
To reproduce the problem, just take the ZIP archive (zip64), which contains a huge number of files (for example, 542 000 files and 61 000 directories).

Navigation into archives is very slow via TotalCmd 9.22a!!!

In the original 7zip utility (7zFM.exe), there is no such problem at all. Navigation is carried out without any delay.

The supposed reasons for the slow work:
1) Using a linear list (rather a tree list);
2) Using character-by-character string comparisons (rather a hash comparison).

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2019-11-12, 10:14 UTC
by remittor
ReDisign internal archive cache

Code: Select all

#pragma pack(push, 1)

struct TArcFileInfo {
  int Flags;
  unsigned int PackSize;
  unsigned int PackSizeHigh;
  unsigned int UnpSize;
  unsigned int UnpSizeHigh;
  int HostOS;
  int FileCRC;
  int FileTime;
  int UnpVer;
  int Method;
  int FileAttr;
  char * CmtBuf;
  int CmtBufSize;
  int CmtSize;
  int CmtState;
};

const int AdditionalDataSize = 64;

struct TArcElemInfo {
  UINT64   PackSize;           /* Packed file size */
  UINT64   UnpSize;            /* Unpacked file size */
  int      FileAttr;           
  FILETIME CreationTime;       /* birthtime, Win32 format, GMT+0 */
  FILETIME LastWriteTime;      /* mtime, Win32 format, GMT+0 */
  FILETIME LastAccessTime;     /* atime, Win32 format, GMT+0 */
  BYTE     AdditionalData[AdditionalDataSize];
};

struct TArcCacheRec {
  UINT32   RecSize;        /* = sizeof(TArcCacheRec) + FullNameLen * 2 */
  BYTE     Flags;
  TArcCacheRec * Link;     /* addr of parent/child record (NULL for empty dir and normal file) */
  union {
    TArcFileInfo  FileInfo;  /* for compatibility */
    TArcElemInfo  ElemInfo;
  };
  UINT16   NameOffset;     /* offset to last name into FullName */
  UINT32   NameHash;       /* hash for last name from FullName field  */
  UINT16   FullNameLen;    /* length of FullName field (number of character without zero-termination) */
  WCHAR    FullName[1];    /* full file name with zero-termination */
};

#pragma pack(pop)

/* TArcCacheRec.Flags */
const BYTE  FLAG_FILE_INFO  = 0x01;    /* using old file info field, see TArcFileInfo */
const BYTE  FLAG_DOUBLE_DOT = 0x02;    /* for parent link */
const BYTE  FLAG_DIRECTORY  = 0x04;    /* for child link */
const BYTE  FLAG_VIRTUAL    = 0x08;    /* for elements, which TotalCmd self created */
Visual explanation: Image: https://i.imgur.com/GtlEMsv.jpg

Code: Select all

/* get next record */
TArcCacheRec * get_next_record(TArcCacheRec * current)
{
  if (current->RecSize == 0)
    return NULL;

  TArcCacheRec * rec = (TArcCacheRec *)((size_t)current + sizeof(TArcCacheRec) + current->RecSize);
  return (rec->RecSize) ? rec : NULL;
}

/* recursive free all memory blocks */
void clear_cache(TArcCacheRec * root)
{
  if (root) {
    TArcCacheRec * rec = root;
    while (rec = get_next_record(rec)) {
      if (rec->Flags & FLAG_DIRECTORY) {
        clear_cache(rec->Link);
      }
    }
    free(root);
  }
}

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2019-11-12, 11:19 UTC
by remittor
If needed support renaming files:

Code: Select all

struct TArcCacheRec {
  BYTE     Flags;
  TArcCacheRec * Link;     /* addr of parent/child record (NULL for empty dir and normal file) */
  union {
    TArcFileInfo  FileInfo;  /* for compatibility */
    TArcElemInfo  ElemInfo;
  };
  UINT16   PathLen;
  WCHAR  * Path;           /* path to elem with backslash on end (same addr for current memory block) */
  UINT32   NameHash;       /* hash for Name field  */
  UINT16   NameLen;        /* length of Name field (number of character without zero-termination) */
  WCHAR    Name[256];      /* elem name with zero-termination */
};

const BYTE  FLAG_EOL        = 0x80;    /* EOL */

Code: Select all

/* get next record */
TArcCacheRec * get_next_record(TArcCacheRec * current)
{
  if (current->Flags & FLAG_EOL)
    return NULL;
  if (current[1].Flags & FLAG_EOL)
    return NULL;
  return &current[1];
}

Code: Select all

/* recursive free all memory blocks */
void clear_cache(TArcCacheRec * root)
{
  if (root) {
    TArcCacheRec * rec = root;
    while (rec = get_next_record(rec)) {
      if (rec->Flags & FLAG_DIRECTORY) {
        clear_cache(rec->Link);
      }
    }
    if (root->Path) {
      free(root->Path);
    }  
    free(root);
  }
}

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2019-11-12, 18:22 UTC
by milo1012
Well, support++ for speeding up the virtual path navigation.
You don't need 100k+ files in an archive, even with 20k files you already have a palpable delay as soon as you navigate further, i.e. change into a sub-dir of that virtual file tree.

This is no new problem, but it gets quite annoying nowadays, especially since TC x64 isn't much faster than 32-bit TC in this matter.
My impression is, that TC uses a quite inefficient data structure to store the file/path entries for the virtual tree (and/or maybe it's the sorting algorithm behind it).

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2019-11-13, 06:35 UTC
by MVV
As I know, TC reads archive contents once on first entering and caches, so slow navigation within archive is a result of inefficient data structure, as milo1012 said, not an API fault. Perhaps TC really keeps archive file records in a list instead of a tree. It seems that for compatibility with linear structure a compact tree may be used that only keeps names and file record indices in a linear array.

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2019-11-13, 07:44 UTC
by remittor
MVV wrote:
2019-11-13, 06:35 UTC
... not an API fault.
Nobody wrote about the error in the API.
Error in choosing the algorithm for working with data.
MVV wrote:
2019-11-13, 06:35 UTC
It seems that for compatibility with linear structure a compact tree may be used that only keeps names and file record indices in a linear array.
This forum has user requests to add the ability to rename files in archives (internal ZIP support this).
Therefore, it is worth radically revising the algorithm for storing data from archives.

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2020-01-15, 07:42 UTC
by remittor
The above-described file tree structure incorrect!!!

Here you can look at the correct implementation of the file tree structure: https://github.com/remittor/wcpatcher/blob/dev/src/wcp_filetree.h

A visual diagram that describes the construction of a tree: Image: https://i.imgur.com/WGKc4w9_d.jpg?maxwidth=123456789&shape=thumb&fidelity=high

Re: [WCX] Increase speed of enum items from internal archive cache

Posted: 2020-01-15, 09:01 UTC
by MVV
You can always keep most up to date info in first post.