Using Unicode directional formatting characters in file names

misvin
Member
Posts: 112
Joined: 2010-08-14, 11:25 UTC

Using Unicode directional formatting characters in file names

Post by *misvin »

The Unicode standard is updated regularly. The latest version, 13.0, was released in March 2020. Please note that new versions of the Unicode standard often not only add new characters but also change the various algorithms that are an integral part of the standard itself.

I want to get answers to the following questions:
1. Which version of the Unicode standard is used in Windows 10 21H1 and Total Commander 10.0?
2. Does Total Commander 10.0 on Windows 10 21H1 support the Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI), which appeared in version 6.3 of the Unicode standard?

The Unicode Bidirectional Algorithm recommends using the Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) instead of the Explicit Directional Embedding and Override Formatting Characters (LRE, RLE, LRO, RLO, PDF):
Although the term embedding is used for some explicit formatting characters, the text within the scope of the embedding formatting characters is not independent of the surrounding text. Characters within an embedding can affect the ordering of characters outside, and vice versa. This is not the case with the isolate formatting characters, however. Characters within an isolate cannot affect the ordering of characters outside it, or vice versa.
Explicit Directional Overrides
The following characters allow the bidirectional character types to be overridden when required for special cases, such as for part numbers. They are to be avoided wherever possible, because of security concerns.
Sometimes I use bidirectional file names: a mixture of right-to-left (Hebrew) and left-to-right (English) text.
I have no problems using the non-recommended Explicit Directional Embedding and Override Formatting Characters (LRE, RLE, LRO, RLO, PDF), but the recommended Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) don't affect the ordering of characters.
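
To make the two groups of characters concrete, here is a minimal Python sketch that lists their code points and wraps a Hebrew word once with the older embedding pair (RLE ... PDF) and once with the isolate pair (RLI ... PDI). The file name "report 2021.txt" and the helper names are only illustrations, not anything taken from Total Commander or Windows.

```python
import unicodedata

# Explicit directional formatting characters defined by the Unicode
# Bidirectional Algorithm (UAX #9).
EMBEDDINGS_AND_OVERRIDES = {
    "LRE": "\u202A", "RLE": "\u202B", "PDF": "\u202C",
    "LRO": "\u202D", "RLO": "\u202E",
}
ISOLATES = {"LRI": "\u2066", "RLI": "\u2067", "FSI": "\u2068", "PDI": "\u2069"}


def wrap_embedding_rtl(text):
    """Wrap text in RLE ... PDF (the older, non-recommended form)."""
    return EMBEDDINGS_AND_OVERRIDES["RLE"] + text + EMBEDDINGS_AND_OVERRIDES["PDF"]


def wrap_isolate_rtl(text):
    """Wrap text in RLI ... PDI (the form recommended since Unicode 6.3)."""
    return ISOLATES["RLI"] + text + ISOLATES["PDI"]


hebrew = "\u05E9\u05DC\u05D5\u05DD"  # the Hebrew word "שלום"

# Hypothetical mixed-direction file names, for illustration only.
name_embedding = wrap_embedding_rtl(hebrew) + " report 2021.txt"
name_isolate = wrap_isolate_rtl(hebrew) + " report 2021.txt"

# Print code point and official name of every formatting character.
for label, ch in {**EMBEDDINGS_AND_OVERRIDES, **ISOLATES}.items():
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Whether the second name is displayed any differently from an unwrapped one depends entirely on the text rendering API that shows it.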

Thanks.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: Using Unicode directional formatting characters in file names

Post by *milo1012 »

I think that basically Windows 10 is able to deal with Unicode 7.0 when it comes to "algorithms".
https://docs.microsoft.com/en-us/globalization/input/font-support#_Windows_10
And from what I have seen over the past years, the only major additions to the Unicode standard after version 7 (besides some added scripts) were more and more emojis with every version. I don't think that the basic bidi algorithm ("directional rules") has changed significantly since then (a significant change would be unlikely anyway, since it would break compatibility, and even the change from 6.3 to 7.0 was minor).
I did a basic check for Windows bidi capabilities some time ago, and even with default GDI they were quite "complete", already on Windows 7.

Concerning TC: Christian is probably able to answer this in more detail, but I'm pretty sure that TC does not have its own text rendering facility and relies entirely on the Windows API functions, i.e. either standard GDI or Uniscribe (not sure about DirectWrite on Windows 8+). So you can't say that TC follows a specific version of the Unicode standard; it depends on the API and therefore on the Windows version at hand.
In some places TC seems to use Uniscribe explicitly. See e.g. TC's history:
25.02.20 Added: Lister: Use Uniscribe for all languages, even when not needed, by setting wincmd.ini [Lister] Uniscribe=2 (32/64)
However, be aware that TC has some built-in checks for potentially dangerous bidi marker combinations in file names:
https://www.ghisler.ch/board/viewtopic.php?f=15&t=46465
TC plugins: PCREsearch and RegXtract
misvin
Member
Posts: 112
Joined: 2010-08-14, 11:25 UTC

Re: Using Unicode directional formatting characters in file names

Post by *misvin »

2milo1012

Thanks for the info about Script and Font Support in Windows.

However, this document does not contain any information on support for Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) which appeared in version 6.3 of the Unicode standard.
Have you tried using these characters in bidirectional file names on Windows 10 21H1?
Horst.Epp
Power Member
Posts: 6450
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: Using Unicode directional formatting characters in file names

Post by *Horst.Epp »

misvin wrote: 2021-08-14, 14:21 UTC 2milo1012

Thanks for the info about Script and Font Support in Windows.

However, this document does not contain any information on support for Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) which appeared in version 6.3 of the Unicode standard.
Have you tried using these characters in bidirectional file names on Windows 10 21H1?
Not many users will have the experience to answer these questions.
If you have a need for it, why don't you just try it out?
Windows 11 Home x64 Version 23H2 (OS Build 22631.3374)
TC 11.03 x64 / x86
Everything 1.5.0.1371a (x64), Everything Toolbar 1.3.2, Listary Pro 6.3.0.69
QAP 11.6.3.2 x64
misvin
Member
Posts: 112
Joined: 2010-08-14, 11:25 UTC

Re: Using Unicode directional formatting characters in file names

Post by *misvin »

Horst.Epp wrote: 2021-08-14, 15:13 UTC If you have a need for it, why don't you just try it out?
I have tried...
In my original post, I have described the problem:
misvin wrote: 2021-08-13, 19:47 UTC I have no problems using the non-recommended Explicit Directional Embedding and Override Formatting Characters (LRE, RLE, LRO, RLO, PDF), but the recommended Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) don't affect the ordering of characters.
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Using Unicode directional formatting characters in file names

Post by *ghisler(Author) »

There are some restrictions on right-to-left marks in file names, because they are abused by viruses which have the extension .exe but use right-to-left marks to look as if they had another extension, such as .jpg.

The restrictions are:
05.05.15 Added: Refuse to open files from file list, button bar, start menu, context menu when the name contains right to left mark (32/64)
05.05.15 Fixed: Remove right to left marks from file names (used almost exclusively by viruses/worms) before displaying them in file lists (32/64)
But there are some exceptions:
21.11.16 Fixed: Allow to open files even if they contain right to left marks, but only if they also contain real right to left text (e.g. Arabic) (32/64)
21.11.16 Fixed: Do not remove right to left marks from file names in list when using aligned extensions (separate column) (32/64)
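
The idea behind these rules can be sketched in a few lines of Python. This is only an illustration of the principle, not Total Commander's actual code: which characters count as "right to left marks" (here RLM, RLE, RLO and RLI) and the function names are assumptions.

```python
import unicodedata

# Assumption: this set is a guess at what counts as a "right to left mark";
# it is not Total Commander's real list.
RTL_CONTROLS = {"\u200F", "\u202B", "\u202E", "\u2067"}  # RLM, RLE, RLO, RLI


def has_rtl_control(name):
    """True if the name contains one of the assumed right-to-left controls."""
    return any(ch in RTL_CONTROLS for ch in name)


def has_real_rtl_text(name):
    """True if the name contains strong right-to-left letters.

    Bidi class 'R' covers e.g. Hebrew letters, 'AL' covers Arabic letters.
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in name)


def looks_suspicious(name):
    """Right-to-left controls without any real right-to-left text."""
    return has_rtl_control(name) and not has_real_rtl_text(name)


# Classic spoof: RLO reverses "gpj.exe", so a bidi-aware display shows
# something like "photoexe.jpg" although the real extension is .exe.
spoofed = "photo\u202Egpj.exe"
legitimate = "\u202B\u05E9\u05DC\u05D5\u05DD\u202C report.txt"

print(looks_suspicious(spoofed))     # True
print(looks_suspicious(legitimate))  # False
```

The second name contains real Hebrew text, so it falls under the kind of exception described in the 21.11.16 entries.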
Author of Total Commander
https://www.ghisler.com
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: Using Unicode directional formatting characters in file names

Post by *milo1012 »

misvin wrote: 2021-08-14, 14:21 UTC Have you tried using these characters in bidirectional file names on Windows 10 21H1?
I guess I didn't try these characters in my former tests, but rather the well-known embedding chars.

But I ran a test just now on a recent 21H1 VM: it seems that these chars are really not applied ("combined") as formatting characters. Instead you see the typical "box character", which is always a sign that a Unicode char is recognized but not "combined". I also did a test with Word 2016 in this VM. According to the docs, starting with Office 2013, Word, Excel et al. use DirectWrite by default as the main layout/glyph render API (plus it's hardware accelerated). But same here: no combining of chars, just the placeholder box.
See: http://wincmd.ru/files/9924355/lri.png

So I guess Windows 10 will never be able to handle these chars, since I doubt that MS will add this feature to Win 10 now that Windows 11 is in sight (BTW, someone could test it on the preview version).
But I'm curious: which OS does support these chars and actually combine them? The current macOS?
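
For anyone who wants to repeat the test on another OS or on the Windows 11 preview, here is a minimal Python sketch that creates three small files in a fresh temporary folder: one without formatting characters, one with the embedding pair and one with the isolate pair. The file names and the function name are made up for this test. Open the folder in TC or any other file manager and compare how the names are displayed.

```python
import os
import tempfile

RLE, PDF = "\u202B", "\u202C"  # embedding pair (older form)
RLI, PDI = "\u2067", "\u2069"  # isolate pair (Unicode 6.3 and later)
HEBREW = "\u05E9\u05DC\u05D5\u05DD"  # the Hebrew word "שלום"


def make_test_files(directory):
    """Create three test files in directory and return their names."""
    names = [
        "plain " + HEBREW + " 123.txt",
        "embed " + RLE + HEBREW + " 123" + PDF + ".txt",
        "isolate " + RLI + HEBREW + " 123" + PDI + ".txt",
    ]
    for name in names:
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("bidi rendering test\n")
    return names


if __name__ == "__main__":
    target = tempfile.mkdtemp(prefix="bidi_test_")
    make_test_files(target)
    print("Test files created in:", target)
```

If the isolate characters are not supported by the rendering API, the third name should show placeholder boxes like in the screenshot above.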
misvin
Member
Posts: 112
Joined: 2010-08-14, 11:25 UTC

Re: Using Unicode directional formatting characters in file names

Post by *misvin »

BTW, according to the results of this test, Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) are supported by Chrome, Edge and Firefox.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: Using Unicode directional formatting characters in file names

Post by *milo1012 »

misvin wrote: 2021-08-22, 06:39 UTC BTW, according to the results of this test, Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) are supported by Chrome, Edge and Firefox.
Sure, but this is HTML rendering, i.e. the browser's own glyph rendering engine. And look at the source of this:

<bdi>القاهرة</bdi>
So it seems we're not talking about these chars themselves, but about explicit HTML tags for this isolation. But whatever the case, embedding this in an HTML file should of course be no problem for modern browsers.
But the operating system's API for layout/glyph rendering, i.e. text boxes: that's a different story. Apples and oranges.
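
A small Python sketch of the difference: the HTML source contains only markup and no isolate code points at all, while a plain string such as a file name has to carry the characters themselves. As far as I know, UAX #9 lists the bdi element as the markup counterpart of FSI ... PDI, but treat that mapping as an assumption here. The helper name is made up.

```python
ISOLATE_CHARS = {
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}


def isolate_chars_in(text):
    """Return the names of all isolate formatting characters found in text."""
    return [ISOLATE_CHARS[ch] for ch in text if ch in ISOLATE_CHARS]


arabic = "القاهرة"

# What the browser test page contains: markup only.
html_markup = "<bdi>" + arabic + "</bdi>"

# What a file name would need: the formatting characters themselves.
plain_text = "\u2068" + arabic + "\u2069"

print(isolate_chars_in(html_markup))  # []
print(isolate_chars_in(plain_text))   # ['FSI', 'PDI']
```

So a passing browser test says nothing about whether GDI, Uniscribe or DirectWrite handle the raw characters.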

Anyway, I think this topic can be closed when it comes to TC, as it uses the Windows API only, from what I can see.