Unicode characters not saved in Multi-rename Search field

Helix751
Senior Member
Posts: 231
Joined: 2004-06-16, 21:16 UTC
Location: Chile

Post by *Helix751 »

Hi. Using TC9.51 x32

I frequently use the Multi-rename tool to normalize lists of files that meet certain criteria, and I have often saved previous searches to carry out the same task in the future.
Today I ran into a problem while renaming files with French characters (file names left with a weird "double" character in place of the language-specific one, e.g. trÃ©sor instead of trésor, or mystÃ¨re instead of mystère).

The issue is: I have saved each of the search+replace settings, with e.g. "Ã©" in the "Search for" field and "é" in the "Replace with" field of Multi-rename. But when I try to reuse them by loading the saved settings (F2), the (apparently) "Unicode" characters have not been saved verbatim, but rather translated to their corresponding character. That is, the "Search for" field now has an "é" instead of the former "Ã©" string.
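For readers unfamiliar with this kind of "double" character, here is a minimal sketch of how it typically arises, assuming the file names were written as UTF-8 and later read with the Windows-1252 (ANSI) code page. The helper names `to_mojibake` and `from_mojibake` are made up for this illustration and are not part of Total Commander.

```python
# Illustration only: reproduce the "double character" effect by decoding
# UTF-8 bytes with the Windows-1252 (ANSI) code page, and reverse it again.

def to_mojibake(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def from_mojibake(text: str) -> str:
    """Undo the misreading: back to bytes via Windows-1252, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


for word in ("trésor", "mystère"):
    garbled = to_mojibake(word)
    print(f"{word} -> {garbled} -> {from_mojibake(garbled)}")
```

A search/replace pair such as "Ã©" to "é" in the Multi-rename tool does by hand what `from_mojibake` does for a single character.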

I can't recall very well, but it seems this behaviour was different in former versions. I recall the "Search for" combo list having recorded the full combo characters before. I am not 100% sure of this though, and this is the first time I have tried to save the settings for each special character.

Thanks in advance
Regards,
Sergio

TCmd license #12059
TC11.03x86/x64 | Win11 Pro
ghisler(Author)
Site Admin
Posts: 48083
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Thanks for your report. So far I cannot reproduce it, so I need more information:
1. Which Windows version do you use?
2. Which language encoding do you use? You can see this in Control panel - Regional/Language options. In some Windows versions, it's called "Language for non-Unicode programs", sometimes it's called "locale" or "Regional scheme".
Author of Total Commander
https://www.ghisler.com

Post by *Helix751 »

Hi Christian.
Windows 10 Home x64 Spanish
TC 9.51 x32 and x64 (can be reproduced on both)

Windows Language encoding: Español (Chile) - Latinamerica

Created some test files with the troubling characters and the behavior is reproduced. Attached are some screen caps.

Code:

Test french chars-Ã¨ Ã©preuve Ã© test Ãª test2.txt
It would be renamed to this:

Code:

Test french chars-è épreuve é test ê test2.txt
List of test files on the main window:
Image: https://imagizer.imageshack.com/img924/7859/pWPmDn.png

Multi-rename tool with drop-down "Search for" combo opened. It shows previous searches already converted to int'l chars, and nothing is matched for change in "New name" column:
Image: https://imagizer.imageshack.com/img921/1466/hi4rin.png

Multi-rename tool. "Search for" with a new special chars search string just pasted (from an external text file). It will not be saved in history, but is actually saved for the current TC session as the active search. Note that chars are matched and substituted in the "New name" column.
Image: https://imagizer.imageshack.com/img921/8903/c18uLC.png

Hope this helps.

EDIT:
You may use my search/replace strings for testing (please note the 2nd sequence has a char that looks like a space, but isn't %20) :

Code:

Search for:  Ã¢|Ã |‚|Ã¨|Ã©|Ãª|Ã®|Ã´|Ã»
Replace string:  â|à|é|è|é|ê|î|ô|û
Regards,
Sergio

Post by *ghisler(Author) »

Ah, that explains it: Total Commander 9.51 checks whether text in the INI file is in UTF-8, and automatically converts it if it is. To avoid this, you need to open wincmd.ini with notepad (F4) and then save it as Unicode (UTF-16).
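Why this check turns a saved "Ã©" into "é" can be illustrated with a small sketch. It assumes a simple "valid UTF-8, else ANSI" rule and Windows-1252 as the ANSI code page; the function `convert_if_utf8` is hypothetical and only stands in for whatever detection Total Commander actually performs.

```python
def convert_if_utf8(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Decode as UTF-8 when the bytes form valid UTF-8, otherwise as ANSI.

    Hypothetical stand-in for the check described above, not TC's real code.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(ansi_codepage)


# In an ANSI file, the saved search string "Ã©" is stored as the bytes C3 A9,
# which happen to be exactly the UTF-8 encoding of "é".
stored_search = "Ã©".encode("cp1252")
print(stored_search.hex(" "))            # the two stored bytes
print(convert_if_utf8(stored_search))    # read back as a single "é"

# A plain ANSI "é" (single byte E9) is not valid UTF-8, so it is left alone.
print(convert_if_utf8("é".encode("cp1252")))
```

Under these assumptions the two strings become indistinguishable after loading, which matches the reported symptom. In a UTF-16 file the ambiguity does not exist.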

Post by *Helix751 »

Thanks Christian. Is this v9.51 specific or does it come from versions before?

My wincmd.ini config file comes from my 1st installation (under Win 3.11) Wcmd 2.1 if I remember well, without any conversion in between (hadn't even realized this was an issue until now).

Looking at my file, it was indeed still stored/coded as ANSI
Image: https://imagizer.imageshack.com/img924/5462/ur3tEf.png

1st try:
Converted to UTF-8 No BOM: The search/replace fields in the Multi-rename tool now show the characters correctly and work for matching the searched chars in the files list, but previously saved config names now show "weird" double characters where international accented and ñ chars used to be.
Image: https://imagizer.imageshack.com/img923/6885/zBIOWp.png

2nd try:
Converted to UTF-8 BOM: TC doesn't recognize the file and starts as 1st time execution.

3rd try:
Converted to UTF-16 BE Big Endian: TC doesn't recognize the file and starts as 1st time execution.

4th try:
Converted to UTF-16 LE Little Endian: Both problems are now corrected automagically: Search/replace and saved config names in MRT.
Image: https://imagizer.imageshack.com/img921/6699/lSkvyV.png

Note: All tests were carried out starting with the same ANSI wincmd.ini file, TC not running and starting it up right after the new wincmd.ini was saved.
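The conversion that worked in the 4th try can also be scripted instead of done by hand in an editor. The following is a minimal sketch, assuming the original file is in Windows-1252 (pass your actual ANSI code page otherwise). The function names are made up for this example.

```python
import codecs
from pathlib import Path


def ansi_to_utf16le(raw: bytes, ansi_codepage: str = "cp1252") -> bytes:
    """Re-encode ANSI bytes as UTF-16 LE with a leading byte order mark."""
    text = raw.decode(ansi_codepage)
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def convert_file(src: Path, dst: Path, ansi_codepage: str = "cp1252") -> None:
    """Convert src to UTF-16 LE and write the result to dst.

    Everything is handled as bytes, so the existing line endings are kept.
    """
    dst.write_bytes(ansi_to_utf16le(src.read_bytes(), ansi_codepage))


# Example call (paths are placeholders; run it while TC is closed and keep
# the original file as a backup):
# convert_file(Path("wincmd.ini.bak"), Path("wincmd.ini"))
```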

I can now keep the latter. Thanks for your heads-up, although this issue was rather tricky and somehow obscure.

Suggestion:
How about a specific wincmd.ini option to force Unicode conversion. eg. 0: default; 1: force UTF-8; 2: Force UTF-16, maybe?

Or maybe an automatic conversion by TC of its main config files, or a warning at start-up to allow for this?

Thanks again.
Regards,
Sergio

Post by *ghisler(Author) »

Helix751 wrote: Converted to UTF-16 LE Little Endian
That's the native Windows Unicode format, and the only one supported by Windows for ini files.

Post by *Helix751 »

Thanks for all the (great) support Christian.

A suggestion then is that the Installer detects the existing wincmd.ini file coding and converts it to the required one. This would be necessary and useful for legacy installations (like mine).
Regards,
Sergio

Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Post by *Usher »

2Helix751
The conversion should not be done automatically even in 64-bit Windows 10. Some people use portable installation and run it also under Win9x.
Andrzej P. Wozniak
Polish subforum moderator
donector
Junior Member
Posts: 35
Joined: 2005-09-08, 22:07 UTC

Post by *donector »

Hi, I have been having a similar problem. It is not exactly the same, but it may be on the same track.

background:
  • Win10 Home
  • TC 9.21a
  • regional setting for non-Unicode apps: Español (Chile) (as well!)
In my case, instead of using the config files that 2Helix uses, I just use the "edit name" option, which, as we all know, opens a txt file to edit the file names (in Notepad).

If I enter in a given file name, for example, "qué" (do note the "é" in there),

I get the crazy ANSI(?) conversion for é: Ã©

But to make matters worse, I did convert the wincmd.ini into UTF-16 LE (when resaving the ini file using notepad as you mentioned earlier).

(ON A SIDE NOTE: Afterwards, I opened the newly encoded wincmd.ini file using Notepad++, and it says its encoding is UCS-2 LE BOM (?). Also, Notepad++ has no option to encode the file as UTF-16 (?x2).)

Then, after restarting TC and returning to the renaming attempt, I got the same bloody initial results!

.. to add to the weirdness..

Using the SAME setup as above, but on another PC and installation (running Win7 Pro though), the rename works flawlessly. And do note that the wincmd.ini was untouched, i.e. it remained in ANSI, and there was no harm whatsoever after giving some names characters with accents such as é.

any clue about this all?

thanks a lot

Post by *donector »

I realise this thread is about v9.5 ...I have just rechecked this issue with version 9.5 and get the same results...
gdpr deleted 6
Power Member
Posts: 872
Joined: 2013-09-04, 14:07 UTC

Post by *gdpr deleted 6 »

@donector,

Notepad++ supports UCS-2 (which you know). UCS-2 is a subset of UTF-16. Thus, a text file that is UCS-2 is by definition also a UTF-16 text file.

As a side note: Normally, with regard to file/directory path names, you won't get into a situation where you need the parts of UTF-16 that are not covered by UCS-2. If you want to rename a file using some UTF-16 codepoints that are not within the UCS-2 subset, be prepared that sooner or later you will probably experience a variety of software stumbling over such file/directory paths.
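To make the UCS-2/UTF-16 distinction concrete, here is a small sketch that counts the 16-bit code units a character needs. Characters inside the UCS-2 range take one unit, anything beyond it takes two (a surrogate pair). The helper names are made up for this illustration.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def fits_ucs2(name: str) -> bool:
    """True when every character of name is at or below U+FFFF."""
    return all(ord(c) <= 0xFFFF for c in name)


print(utf16_units("é"))                    # accented Latin letter: one unit
print(utf16_units("\U0001F600"))           # emoji beyond U+FFFF: two units
print(fits_ucs2("trésor.txt"))             # representable in UCS-2
print(fits_ucs2("smile\U0001F600.txt"))    # needs full UTF-16
```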


With regard to "é" in "qué" appearing as "Ã©":

Based on the "é" appearing as "Ã©", I can tell that you saved the text file in Notepad++ as UTF-8 without BOM. TC's multi-rename tool only detects UTF-8 reliably if a UTF-8 BOM is present. Without the BOM, TC will read the text file either based on your system code page or based on some ISO-8859 variant (I am not sure which, but it also doesn't really matter). When saving the file as UTF-8 or UCS-2/UTF-16, make sure you are saving it with BOM. This way TC will have no problems detecting the text encoding.
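The BOM-based detection described above can be sketched roughly as follows. This is only an illustration of the general technique, not TC's actual code. The function names and the Windows-1252 fallback are assumptions.

```python
import codecs


def detect_by_bom(raw: bytes, fallback: str = "cp1252") -> str:
    """Pick a Python codec name from the byte order mark, if there is one."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # decodes UTF-8 and drops the BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # reads the BOM to choose the byte order
    return fallback          # no BOM: assume the system code page


def read_names(raw: bytes) -> str:
    """Decode the edited name list using the detected encoding."""
    return raw.decode(detect_by_bom(raw))


without_bom = "qué".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom
print(read_names(without_bom))   # misread through the fallback code page
print(read_names(with_bom))      # decoded correctly thanks to the BOM
```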

Post by *donector »

Thanks Gonzo, all clear, but please note the temporary txt file used for renaming is not saved by me, as it is a temporary file created automatically by TC. So I wonder how to change that behaviour (TC saving the edited file names as UTF-8 without BOM)?

Post by *gdpr deleted 6 »

donector wrote: 2020-06-11, 21:07 UTC thanks Gonzo, all clear, but please note the temporary txt file used for renaming is not saved by me, as it is a temporary file created automatically by TC ; so then I wonder how to change that behaviour (TC saving the edited file names as UTF-8 without BOM)?
TC saves the text file either using the system code page or some ISO-8859 variant (I don't know with certainty which would be correct), or in Unicode/UCS-2. If all the file names in the list can be represented in your system code page/ISO-8859, then TC saves the file using the system code page/ISO-8859. Otherwise, it produces a Unicode/UCS-2 file.
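The "code page if possible, Unicode otherwise" rule described above could look roughly like this. It is a sketch under assumptions: Windows-1252 as the system code page, CRLF line endings, and a BOM on the Unicode output. None of these details are confirmed for TC, and `encode_name_list` is a made-up name.

```python
import codecs


def encode_name_list(names, ansi_codepage="cp1252"):
    """Encode the name list in the ANSI code page when every name fits,
    otherwise fall back to UTF-16 LE with a byte order mark."""
    text = "\r\n".join(names) + "\r\n"
    try:
        return text.encode(ansi_codepage)
    except UnicodeEncodeError:
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


print(encode_name_list(["qué.txt"]))      # fits the code page: one byte per char
print(encode_name_list(["日本.txt"]))      # does not fit: UTF-16 LE with BOM
```

Since "qué" fits the code page, such a list would be written as ANSI, and an editor that then saves it as UTF-8 without BOM produces the mismatch discussed earlier.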

You can force the MRT to always save the text file with the file names in UCS-2/Unicode by editing TC's INI file and adding/setting the option "RenameEditUnicode=1" in the "[Configuration]" section (for more info about all the possible settings in TC's INI file, open up TC's help. It lists and explains all INI settings...)
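If you prefer to script the change rather than edit the file by hand, a minimal sketch is shown below. The helper `set_ini_option` is hypothetical and works on the INI text only. Reading and writing the file in its correct encoding is left to the caller, and a backup is advisable.

```python
def set_ini_option(text: str, section: str, key: str, value: str) -> str:
    """Return INI text with key=value set inside [section].

    An existing key is replaced, a missing key is added at the end of the
    section, and a missing section is appended at the end of the text.
    Section and key names are compared case-insensitively.
    """
    lines = text.splitlines()
    header = f"[{section}]".lower()
    new_line = f"{key}={value}"
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_section:
                # The wanted section ended without the key: add it here.
                lines.insert(i, new_line)
                return "\r\n".join(lines) + "\r\n"
            in_section = stripped.lower() == header
        elif in_section and stripped.split("=", 1)[0].strip().lower() == key.lower():
            lines[i] = new_line
            return "\r\n".join(lines) + "\r\n"
    if not in_section:
        lines.append(f"[{section}]")
    lines.append(new_line)
    return "\r\n".join(lines) + "\r\n"


sample = "[Configuration]\r\nFoo=1\r\n[Other]\r\nBar=2\r\n"
print(set_ini_option(sample, "Configuration", "RenameEditUnicode", "1"))
```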

(You can also manually convert the text file from within Notepad++; see the "Encodings" menu in Notepad++. But you probably know that already...)

Anyways, let's stop talking about this here in this thread, or let's move this discussion to its own thread. The topic of this thread is a different issue with MRT (related to TC's INI file), and it doesn't make it easier for Ghisler and others wanting to follow up on the original issue discussed here if we are (ab)using this thread to discuss some other thing... ;)

Post by *donector »

elgonzo wrote: 2020-06-11, 22:32 UTC
You can force the MRT to always save the text file with the file names in UCS-2/Unicode by editing TC's INI file and adding/setting the option "RenameEditUnicode=1" option in the "[Configuration]" section
that was it! thanks a lot, luckily no need to exchange more messages (fingers crossed).
I did not have that setting, but once added ...all correct