Page 1 of 1

[Bug] Incorrect comparison of two unicode strings

Posted: 2019-12-13, 12:24 UTC
by remittor
TotalCmd 9.22a uses this function absolutely everywhere:

Code: Select all

int __usercall wcsicmp@<eax>(WCHAR * wstr1@<eax>, WCHAR * wstr2@<edx>)
{
  int result;
  CHAR tmp2[260];
  CHAR tmp1[260];
  WCHAR wtmp2[1024];
  WCHAR wtmp1[1024];

  if ( win_dwPlatformId == 2 )         // Windows NT
  {
    wcsncpyEx(wtmp1, 1023, wstr1);
    wcsncpyEx(wtmp2, 1023, wstr2);
    wstr_lower(wtmp1);
    wstr_lower(wtmp2);
    result = wcscmp(wtmp1, wtmp2);
  }
  else
  {
    WideCharToMultiByte(0, 0, wstr1, -1, tmp1, 259, 0, 0);
    WideCharToMultiByte(0, 0, wstr2, -1, tmp2, 259, 0, 0);
    str_lower(tmp1);
    str_lower(tmp2);
    result = strcmp(tmp1, tmp2);
  }
  return result;
}
Critical Bug:
If the length of wstr1 or wstr2 is will be more than 259, then WideCharToMultiByte not insert null terminator!
After function strcmp will compare trash!

Recomendation: All this code needs to be replaced with a call shlwapi.dll@StrCmpIW

Re: [Bug] Incorrect comparison of two unicode strings

Posted: 2019-12-16, 11:30 UTC
by ghisler(Author)
This shouldn't be a problem because the part with WideCharToMultiByte is only used on Windows 9x/ME, which doesn't support such long paths. However, I can make tmp1 and tmp2 larger an add a terminating 0 myself just in case.

Re: [Bug] Incorrect comparison of two unicode strings

Posted: 2019-12-17, 12:49 UTC
by remittor
ghisler(Author) wrote: 2019-12-16, 11:30 UTC This shouldn't be a problem because the part with WideCharToMultiByte is only used on Windows 9x/ME
It seems so to you. But in fact, long strings come into function wcsicmp.
For example, the function wcsicmp is called where the display of the internal directory of the archive file is formed. And this function is called in many places.

Code: Select all

wcsicmp(&elem[40], L"plugininst.inf")   /* elem[40] = headerData.FileName */
wcsicmp(locFileName, ArcCurDir)           /* WCHAR locFileName[1024] , ArcCurDir = current archive directory */
wcsicmp(ArcCurDir, filename)              /* filename = locFileName */
ghisler(Author) wrote: 2019-12-16, 11:30 UTC However, I can make tmp1 and tmp2 larger an add a terminating 0 myself just in case.
This is terrible! Do not do that! Just call the function shlwapi.dll@StrCmpIW (since Win98 with IE 4.0).

The function wcsicmp you implemented is slow. This function also introduces its delay when fetching files to display the insides of archives.

Re: [Bug] Incorrect comparison of two unicode strings

Posted: 2019-12-18, 10:27 UTC
by ghisler(Author)
Sorry, I will not change to a so far unknown function just for making TC on legacy Windows 9x/ME a little bit faster. I will add the terminating zeroes myself, although MSDN describes that they are actually added:
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte
cchWideChar
...
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.
My function does pass -1 as cchWideChar.

Re: [Bug] Incorrect comparison of two unicode strings

Posted: 2019-12-18, 13:48 UTC
by remittor
ghisler(Author) wrote: 2019-12-18, 10:27 UTC... although MSDN describes that they are actually added...
This is true only when output string is less than 259 characters long.
Personally test and understand what it is about.
ghisler(Author) wrote: 2019-12-18, 10:27 UTCSorry, I will not change to a so far unknown function
You’ve said the same thing about these functions: StrCmpLogicalW, SHStrDupW. But now use them too.

Re: [Bug] Incorrect comparison of two unicode strings

Posted: 2020-01-24, 09:39 UTC
by remittor
Regarding the use of function StrCmpIW, I already wrote here: https://www.ghisler.ch/board/viewtopic.php?p=367822#p367822
ghisler(Author) wrote: 2020-01-13, 10:44 UTC I'm reluctant to use StrCmpIW because I don't know what it does internally.
I created a special test program that compares the function StrCmpIW and CharLowerW.
test program sources

Code: Select all

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <shlwapi.h>

const int str_len = 2;
const WCHAR filename[] = L"test_result.txt";
const WCHAR bom = L'\xFEFF';

int main(int argc, WCHAR* argv[])
{
  DWORD dw;
  WCHAR str[str_len + 16];
  WCHAR lwr[str_len + 16];

  DeleteFileW(filename);
  DWORD dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;
  HANDLE hFile = CreateFileW(filename, GENERIC_WRITE, dwShareMode, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
  for (size_t i = 0; i < 8; i++) {
    str[i] = 0x20;
  }
  str[0] = bom;
  str[7] = L'\n';
  str[8] = 0;
  WriteFile(hFile, str, (DWORD)wcslen(str) * sizeof(WCHAR), &dw, NULL);

  memset(str, 0, sizeof(str));
  str[0] = 0x20;
  str[1] = 0x20;
  printf("start \n");

  for (size_t t = 0xD800 - 1; t <= 0xDFFF; t++) {
    str[0] = (WCHAR)t;
    if (t == 0xD800 - 1) {
      str[0] = 0x20;
    }
    for (size_t x = 0; x <= 0xFFFF; x++) {
      if (str[0] == 0x20) {
        if (x == 0) continue;  /* ignore null */
      }
      str[1] = (WCHAR)x;
      memcpy(lwr, str, str_len * sizeof(WCHAR));
      lwr[2] = 0;
      CharLowerW(lwr);
      int ret = StrCmpIW(lwr, str);
      if (ret) {
        WCHAR buf[str_len * 2 + 64];
        wcscpy(buf, str);
        wcscat(buf, L" / ");
        wcscat(buf, lwr);
        wcscat(buf, L"\n");
        WriteFile(hFile, buf, (DWORD)wcslen(buf) * sizeof(WCHAR), &dw, NULL);
      }
    }
  }
  wcscpy(str, L"<<<EOF>>>");
  wprintf(str);
  WriteFile(hFile, str, (DWORD)wcslen(str) * sizeof(WCHAR), &dw, NULL);
  return 0;
}
According to the test results, it turned out that only "Ʀ"/"ʀ" is the problem symbol.
Unicode:

Re: [Bug] Incorrect comparison of two unicode strings

Posted: 2020-01-24, 17:52 UTC
by ghisler(Author)
In my tests on Windows 10, StrCmpIW didn't cause any measurable speedup of the archive read functions (at least of the initial read).