New content plugin cputil (chars from different codepage)

ghisler(Author)
Site Admin
Posts: 38160
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) » 2009-06-08, 15:50 UTC

Hi!

I have written a new small content plugin for TC 7.5 that lets you find file names containing characters from a different code page, and rename those files if needed.

You can download it here:
wdx_cputil.zip

and the source:
wdx_cputil_src.zip

It can be used like any other content plugin, e.g. in the search function, custom columns, hint fields, etc.

The fields:
  • hasunicode: true if the name contains Unicode characters from a different codepage. Mainly useful for searching.
  • namenounicode: the full name (without path) with those Unicode characters replaced. If possible, accented characters are replaced by their non-accented counterparts; all other characters are replaced by underscores '_'. Mainly useful for the multi-rename tool.
This plugin is useful only on NT-based systems (Windows NT, 2000, XP, Vista, 7) and requires Total Commander 7.5 beta.
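For illustration, a minimal Python sketch of what the two fields compute. It assumes the system's ANSI codepage is Windows-1252 (`cp1252`) and treats a character as "from a different codepage" when it cannot be encoded in it. This is not the plugin's source (the plugin relies on the Windows conversion functions); the accent stripping via Unicode NFKD decomposition is a stand-in technique, and the function names are hypothetical.

```python
import unicodedata

ANSI_CODEPAGE = "cp1252"  # assumption: Western European ANSI codepage


def fits(text, codepage=ANSI_CODEPAGE):
    """True if text can be encoded in the given codepage without loss."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def has_unicode(name, codepage=ANSI_CODEPAGE):
    """Hypothetical counterpart of the 'hasunicode' field."""
    return not fits(name, codepage)


def name_no_unicode(name, codepage=ANSI_CODEPAGE):
    """Hypothetical counterpart of 'namenounicode'."""
    out = []
    for ch in name:
        if fits(ch, codepage):
            out.append(ch)  # character exists in the codepage: keep it
            continue
        # Decompose (e.g. 'ą' -> 'a' + combining ogonek) and drop the accents
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        out.append(base if base and fits(base, codepage) else "_")
    return "".join(out)


print(has_unicode("Müller.txt"), name_no_unicode("Wąż.txt"))  # False Waz.txt
print(name_no_unicode("Привет.txt"))                            # ______.txt
```

Note that 'ü' survives unchanged here because it exists in cp1252; on a system with a different ANSI codepage the same name could count as "unicode".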
Author of Total Commander
http://www.ghisler.com

fenix_productions
Power Member
Posts: 1956
Joined: 2005-08-07, 13:23 UTC
Location: Poland

Post by *fenix_productions » 2009-06-08, 17:11 UTC

2ghisler(Author)
Suggestion: configurable characters replacement.

Currently, when I have files with Cyrillic characters in their names, the whole name is shown as "one big underscore". The same happens with Japanese names.

It would help to use a configuration file with content like:


м=m
ж=z
п=p
р=r
у=u
か=ka
き=ki
よ=yo
...
This plugin is useful for languages with Latin-based alphabets, but not for the "glyph" ones. IMHO the whole "Unicode stuff" matters more for the second group, so it should be supported better.
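A sketch of how such a file could be read and applied, in Python. Assumptions (not from the plugin): one `character=replacement` pair per line, lines without `=` (like `...`) are skipped, plain ASCII passes through unchanged, and any other unmapped character becomes `_`. `parse_replacements` and `transliterate` are hypothetical names.

```python
def parse_replacements(text):
    """Parse 'char=replacement' lines; skip blanks and lines like '...'."""
    table = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and len(key) == 1:  # only single-character keys are accepted
            table[key] = value
    return table


def transliterate(name, table, fallback="_"):
    out = []
    for ch in name:
        if ch in table:
            out.append(table[ch])
        elif ord(ch) < 128:
            out.append(ch)        # plain ASCII is kept as-is
        else:
            out.append(fallback)  # no mapping known for this character
    return "".join(out)


CONFIG = """\
м=m
ж=z
п=p
р=r
у=u
か=ka
き=ki
よ=yo
...
"""

table = parse_replacements(CONFIG)
print(transliterate("мурка.txt", table))  # mur__.txt
print(transliterate("きよか", table))      # kiyoka
```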
Last edited by fenix_productions on 2009-06-08, 17:15 UTC, edited 1 time in total.
"When we created the poke, we thought it would be cool to have a feature without any specific purpose." Facebook...


Postkutscher
Power Member
Posts: 556
Joined: 2006-04-01, 00:11 UTC

Post by *Postkutscher » 2009-06-08, 17:14 UTC

2ghisler(Author)
Thank you. It might be very useful.

And I support fenix_productions's suggestion.


Post by *ghisler(Author) » 2009-06-09, 09:44 UTC

2fenix_productions
Unfortunately it's not that simple. For example, the phonetic spelling of Chinese characters depends on the language: the same character is pronounced differently in China and Japan. Furthermore, most characters have multiple readings, both in Japanese (kun/on, e.g. 山 = Yama/San) and in Chinese Pinyin.

Any suggestions?

Post by *fenix_productions » 2009-06-09, 14:03 UTC

2ghisler(Author)
I'm sorry, I don't have a solution for the multiple-spellings problem. I don't use glyph-based scripts on a daily basis, so it is hard for me to figure something out.

I was already aware of this when I wrote my proposal*, but I still think that being able to show something more than an underscore is better. After all, you can still output _ when no replacement is available. It would be up to the user to decide what is best for them. I simply saw your plugin as a replacement for Translit.

I also know that in MRT (the Multi-Rename Tool) I can save my own patterns for all character replacements, but it would be easier to have a plugin do it.

*) I noticed the problem with the つ (tsu) character:
- the small tsu (called a sokuon) is used to double the following consonant (っか - kka),
- the standard tsu is used as a normal syllable (つか - tsuka).

The first case cannot be solved with single-character replacements.
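To make the footnote concrete: a small tsu has no sound of its own, so its output depends on the *next* character. This Python sketch keeps one character of state instead of replacing characters one by one. The kana table is tiny and hypothetical, the romanization is Hepburn-style, and nothing here is something cputil itself does.

```python
KANA = {"か": "ka", "き": "ki", "つ": "tsu", "よ": "yo"}  # hypothetical table
SOKUON = "っ"  # small tsu
VOWELS = "aeiou"


def romanize(text, table=KANA, fallback="_"):
    out = []
    pending = False  # True right after a small tsu
    for ch in text:
        if ch == SOKUON:
            if pending:
                out.append(fallback)  # previous small tsu had nothing to double
            pending = True
            continue
        roman = table.get(ch, ch if ord(ch) < 128 else fallback)
        if pending:
            # Double the first consonant of the next syllable: っか -> kka.
            # (Hepburn writes っち as "tchi"; that exception is ignored here.)
            if roman[:1].isalpha() and roman[0] not in VOWELS:
                roman = roman[0] + roman
            else:
                out.append(fallback)
            pending = False
        out.append(roman)
    if pending:
        out.append(fallback)  # small tsu at the very end
    return "".join(out)


print(romanize("っか"), romanize("つか"))  # kka tsuka
```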

Post by *Postkutscher » 2009-06-09, 17:10 UTC

2ghisler(Author)
Maybe use transliteration only in unambiguous (single-valued) cases, taking the country selected in the OS into account? And the rest would be underscores?
fenix_productions wrote:I simply saw your plugin as replacement for Translit
Yes, that's my idea too.
At first I thought these two plugins would complement each other, but after I saw your suggestion about transliteration, I've decided that wdx_cputil can replace translit.wdx completely.
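Postkutscher's proposal amounts to: pick a table for the user's language, and transliterate a character only if that table lists exactly one reading for it. A Python sketch; the tables are tiny and hypothetical, and the language code is passed in explicitly because how the plugin would read the OS setting is not specified in this thread.

```python
# Hypothetical per-language tables: character -> list of possible readings.
READINGS = {
    "ja": {"山": ["yama", "san"], "か": ["ka"], "き": ["ki"]},  # kun and on reading
    "zh": {"山": ["shan"]},                                    # Pinyin, tone mark dropped
}


def transliterate_unambiguous(name, language, fallback="_"):
    table = READINGS.get(language, {})
    out = []
    for ch in name:
        options = table.get(ch, [])
        if len(options) == 1:
            out.append(options[0])  # exactly one reading: safe to transliterate
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append(fallback)    # unknown or ambiguous: underscore
    return "".join(out)


print(transliterate_unambiguous("山.txt", "ja"))  # _.txt
print(transliterate_unambiguous("山.txt", "zh"))  # shan.txt
```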


Post by *fenix_productions » 2009-06-09, 18:00 UTC

2Postkutscher
I thought about replacing Translit for two reasons:
- 3rd-party plugins are sometimes not updated for a very long time,
- it isn't Unicode-aware, so you need either the Russian locale set in Windows or Microsoft's AppLocale tool.

Post by *ghisler(Author) » 2009-06-10, 15:21 UTC

Indeed, the two plugins cputil and translit are quite different: cputil is meant to detect and replace characters from a different codepage, while translit is meant to replace characters within the same codepage.

Therefore I think that there should be at least two different substitution fields:
1. namenounicode: Replaces characters from a different codepage with user-defined characters or underscores
2. namereplacenoenglish: Replaces all non-English characters with user-defined characters or underscores
and maybe also
3. namereplaceuser: Replaces only user-defined characters with others

What do you think?
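The three proposed fields differ only in *which* characters get replaced. A Python sketch, assuming `cp1252` as the ANSI codepage and a single user-defined table; the functions mirror the field names but are not the plugin's code, and they leave out accent folding for brevity.

```python
ANSI_CODEPAGE = "cp1252"  # assumption


def fits(ch, codepage=ANSI_CODEPAGE):
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def name_no_unicode(name, user):
    # 1. Only characters missing from the ANSI codepage are replaced.
    return "".join(ch if fits(ch) else user.get(ch, "_") for ch in name)


def name_replace_no_english(name, user):
    # 2. Every non-ASCII character is replaced.
    return "".join(ch if ord(ch) < 128 else user.get(ch, "_") for ch in name)


def name_replace_user(name, user):
    # 3. Only characters listed by the user are replaced; the rest stay.
    return "".join(user.get(ch, ch) for ch in name)


user = {"ä": "ae", "м": "m"}
for field in (name_no_unicode, name_replace_no_english, name_replace_user):
    print(field.__name__, field("Käse мир.txt", user))
# name_no_unicode Käse m__.txt
# name_replace_no_english Kaese m__.txt
# name_replace_user Kaese mир.txt
```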

Post by *Postkutscher » 2009-06-10, 16:25 UTC

ghisler(Author) wrote: Therefore I think that there should be at least two different substitution fields:
1. namenounicode: Replaces characters from a different codepage with user-defined characters or underscores
2. namereplacenoenglish: Replaces all non-English characters with user-defined characters or underscores
and maybe also
3. namereplaceuser: Replaces only user-defined characters with others

What do you think?
Shouldn't "two different substitution fields" be "three different substitution fields"? A typo?

Sounds absolutely consistent to me.

++


Post by *fenix_productions » 2009-06-10, 17:27 UTC

2ghisler(Author)
Those three points sound good to me :)

Post by *ghisler(Author) » 2009-06-14, 18:27 UTC

Version 1.1 of my plugin is now available! I tried to implement the suggestions above. Currently the plugin contains limited tables for German accents, Cyrillic, Greek, Hebrew, Arabic, Japanese full- and half-width Katakana, and Hiragana.

To support the latter, the plugin uses not only a table with one word (2 bytes) per character, but also one with 2-3 words (4-6 bytes) per character. These two tables (cputil1.tbl and cputil2.tbl) are strictly separated for efficiency reasons.

Now I need some help from native speakers of these languages: the tables cover only the basics, e.g. no vowel dots in Arabic, no hamza, etc. The tables are plain-text Unicode (UTF-16), so you can edit them with Notepad.

You can either post your additions/corrections here in the forum (which supports Unicode), or send them to me by e-mail in zipped form. Thanks!
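One way to picture the split between cputil1.tbl and cputil2.tbl described above: classify each replacement by its length in 16-bit words (UTF-16 code units), where 1 word = 2 bytes. This Python sketch works on an in-memory dict because the exact line syntax of the .tbl files isn't given in this thread; the function names are hypothetical.

```python
def utf16_words(text):
    """Number of 16-bit words (UTF-16 code units) needed to store text."""
    return len(text.encode("utf-16-le")) // 2


def split_tables(mapping):
    """Split char -> replacement pairs into 1-word and 2-3-word tables."""
    one_word, multi_word = {}, {}
    for ch, repl in mapping.items():
        n = utf16_words(repl)
        if n == 1:
            one_word[ch] = repl
        elif 2 <= n <= 3:
            multi_word[ch] = repl
        else:
            raise ValueError(f"replacement for {ch!r} is {n} words long")
    return one_word, multi_word


one, multi = split_tables({"м": "m", "か": "ka", "つ": "tsu", "ä": "ae"})
print(one)    # {'м': 'm'}
print(multi)  # {'か': 'ka', 'つ': 'tsu', 'ä': 'ae'}
```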

Post by *fenix_productions » 2009-06-14, 19:00 UTC

2ghisler(Author)
Nicely done!
Polish
ą=a
ć=c
ę=e
ł=l
ń=n
ó=o
ś=s
ź=z
ż=z
Ą=A
Ć=C
Ę=E
Ł=L
Ń=N
Ó=O
Ś=S
Ź=Z
Ż=Z

Post by *ghisler(Author) » 2009-06-14, 19:24 UTC

2fenix_productions
These should already be handled by MultiByteToWideChar, at least they work here! Just try this:
[=cputil.namereplacenoenglish]

Characters need to be added to cputil1.tbl only if you want a result _different_ from what Windows does by itself. For example, the German 'ä' is replaced by a simple 'a', but it's actually better to write it as 'ae'.
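The rule described here (a table entry is needed only to override what Windows would produce) can be sketched as a lookup order: ASCII is kept, a table entry wins next, otherwise a default conversion applies. In this Python sketch the default is Unicode NFKD decomposition with accents dropped; that is a stand-in for the Windows conversion, not the same function, and the names are hypothetical.

```python
import unicodedata


def default_fold(ch):
    """Stand-in for Windows' own conversion: strip diacritics via NFKD."""
    base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                   if not unicodedata.combining(c))
    return base if base and base.isascii() else "_"


def replace_no_english(name, overrides):
    out = []
    for ch in name:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in overrides:
            out.append(overrides[ch])  # table entry wins over the default
        else:
            out.append(default_fold(ch))
    return "".join(out)


print(replace_no_english("Käse.txt", {}))           # Kase.txt
print(replace_no_english("Käse.txt", {"ä": "ae"}))  # Kaese.txt
print(replace_no_english("Łódź.txt", {}))           # _odz.txt
```

Unlike the Windows conversion described above, NFKD does not decompose ł/Ł, so in this sketch those two letters would still need table entries.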

Post by *fenix_productions » 2009-06-14, 19:31 UTC

2ghisler(Author)
I saw that it is handled this way on my PC, but I don't know whether it will also work on, e.g., Chinese ones.

Post by *ghisler(Author) » 2009-06-14, 19:38 UTC

Yes it will, as long as the locale support for Central Europe is installed (which it usually is)!