[BUG] Lister: Automatic detection of OEM charset
Well, to make it short:
It simply doesn't work!
No matter which NFO/ASCII art files I display in Lister, Lister always starts in ANSI mode.
What exactly is that function trying to do on autodetection?
Icfu
This account is for sale
- XPEHOPE3KA
- Power Member
- Posts: 854
- Joined: 2006-03-03, 18:23 UTC
- Location: Saint-Petersburg, Russia
Yes, I can confirm it, and so can many other people, I think:
http://ghisler.ch/board/viewtopic.php?p=87726&highlight=#87726
http://ghisler.ch/board/viewtopic.php?t=4033#30659

...and DIZ files, and so on, not just NFO/ASCII.

I can read English, but... I write like Tarzan. (sorry)
It does not answer your question, but you could try http://www.totalcmd.net/plugring/nfoview.html
Be aware that there is one bug: if you select text and drag it to TC, it throws an exception.
Ambiguity succeeds where honesty dares not venture.
- ghisler(Author)
- Site Admin
- Posts: 50824
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
I'm sorry, but real ANSI/OEM detection in all languages is simply not possible or reliable. Lister tries to do what it can, but it is far from optimum.
Author of Total Commander
https://www.ghisler.com
I repeat my question: What exactly is that function trying to do on autodetection?
"Far from optimum" sounds rather euphemistic, considering that in all these years I have never found a single file that was detected properly.
You could autodetect REPETITION of characters, for example #248 (dec), the black square, which is very typical of NFO files. Or you could make it possible to define specific file types, like NFO, for which Lister starts with the OEM font.
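A minimal Python sketch of the repetition idea, assuming "repetition" means the same byte >= 0x80 occurring several times in a row. The function name `has_repeated_high_byte` and the default run length of 3 are illustrative assumptions, not anything Lister is known to do. The sample uses byte 0xDB (█ in the CP437/CP866 OEM code pages), because runs of block characters like that are typical of NFO art.

```python
def has_repeated_high_byte(data: bytes, min_run: int = 3) -> bool:
    """True if any byte >= 0x80 occurs at least `min_run` times in a row.

    Hypothetical heuristic: runs such as b'\\xdb\\xdb\\xdb' are common in NFO
    art, while a single accented letter such as 0xF8 ('ø' in Windows-1252)
    rarely repeats back to back in normal prose.
    """
    run_byte = None
    run_len = 0
    for b in data:
        if b >= 0x80 and b == run_byte:
            run_len += 1          # the same high byte continues the run
        else:
            run_byte = b          # start a new run with this byte
            run_len = 1
        if b >= 0x80 and run_len >= min_run:
            return True
    return False


if __name__ == "__main__":
    nfo_like = b"  \xdb\xdb\xdb\xdb  RELEASE  \xdb\xdb\xdb\xdb\r\n"
    danish = "Jeg spiser rødgrød med fløde".encode("cp1252")
    print(has_repeated_high_byte(nfo_like))  # True
    print(has_repeated_high_byte(danish))    # False
```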
Icfu
- XPEHOPE3KA
Indeed---
2icfu
Hi Jeff!
• Do I really need to add my confirmation? Indeed, it has never worked…
- I was thinking of something along those lines…
- However, maybe check for several specific characters, not only #248 (which in ANSI is ø, used a lot in Danish), and, as XPEHOPE3KA is wondering about, take into account the current language set in <wincmd.ini>…
…that is not absolutely perfect either:
- For example, this "ø" doesn't exist in French and many other languages,
but it is sometimes used as an abbreviation for "diameter" (I do that, and with #216 too…).
Right! "Far from optimum" is what we amusingly call «a sweet euphemism» in French.
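A short Python illustration of the ambiguity raised above: the same bytes stand for different characters depending on which code page is assumed, so a single code such as #248 or #216 cannot tell ANSI from OEM by itself. Here cp1252 and cp437 are used as representative ANSI and OEM code pages; the code pages Lister actually uses depend on the system locale.

```python
# The same two bytes (#248 and #216), read under two different code pages.
raw = bytes([248, 216])

ansi = raw.decode("cp1252")  # Windows "ANSI" code page for Western European languages
oem = raw.decode("cp437")    # original IBM PC "OEM" code page

print(ansi)  # øØ
print(oem)   # °╪
```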
VG
Claude
Clo

Last edited by Clo on 2006-07-14, 10:09 UTC, edited 3 times in total.
#31505 French translator of T•C | Help in French | French Tutorials | English Tutorials
- ghisler(Author)
Exchange ideas?
2ghisler(Author)
Good afternoon,
Personally, my programming skills are at "0", or close to it!
• But just an idea: our friend Alextp is working on the same problem, as he said in his to-do list,
for his new alternative Lister viewer tool…
Maybe you could exchange ideas?
VG
Claude
Clo

So do you know of any good detection routines?

@Clo and interested developers:

It IS perfect when, as I have proposed, you use REPETITION of characters. Are there any Danish words containing "øøø", for example?
I think that with that specific character appearance alone you can catch about 99% of all NFO files correctly. That is about a 10000000000% improvement over the present situation, roughly estimated of course.
Icfu
- XPEHOPE3KA
1. Heh, it (the current algorithm) seldom works.
2. And it only works for files WITHOUT pseudographic characters.
3. And only for files that begin with a sequence of codes > 128. However, many bilingual readmes begin in English but end in Russian (possibly in OEM).
So, Christian, you need to:
1. Analyze both the beginning and the end of the file, for a more accurate prediction.
2. Leave the current algorithm almost unchanged, but add some code that checks for repetition of pseudographic characters. This check should only be done for "horizontal" characters. Characters such as ─,┌,┐,└,┘,├,┤,┬,┴,┼,╓,╖,╙,╜,╥,╨,╫,╟,╢ (this is ONE group) should appear next to each other within a line. Likewise, ═,╒,╔,╕,╗,╘,╚,╛,╝,╞,╠,╡,╣,╤,╦,╧,╩,╬,╪ form the second group. The other pseudographic characters should be followed by themselves, like ▒▒▒.
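A rough Python sketch of the two suggestions above: sample both the head and the tail of the file, then count adjacent character pairs that look like deliberate line art (both from the single-horizontal group, both from the double-horizontal group, or the same "other" pseudographic character twice). It assumes CP866 as the OEM code page; the window size of 4096 bytes, the minimum of 3 pairs, and all function names are hypothetical choices for illustration only.

```python
# "Horizontal" box-drawing groups, exactly as listed in the post above.
SINGLE_H = set("─┌┐└┘├┤┬┴┼╓╖╙╜╥╨╫╟╢")
DOUBLE_H = set("═╒╔╕╗╘╚╛╝╞╠╡╣╤╦╧╩╬╪")
# All characters that bytes 0xB0..0xDF map to in CP866 (assumed OEM code page).
PSEUDO = set(bytes(range(0xB0, 0xE0)).decode("cp866"))
# Remaining pseudographics (░▒▓│║█▄▌▐▀): these must repeat themselves.
OTHER = PSEUDO - SINGLE_H - DOUBLE_H


def consistent_pairs(text: str) -> int:
    """Count adjacent pairs within a line that look like deliberate line art."""
    count = 0
    for line in text.splitlines():
        for a, b in zip(line, line[1:]):
            if a in SINGLE_H and b in SINGLE_H:
                count += 1
            elif a in DOUBLE_H and b in DOUBLE_H:
                count += 1
            elif a in OTHER and a == b:
                count += 1
    return count


def looks_like_oem(data: bytes, window: int = 4096, min_pairs: int = 3) -> bool:
    """Check both the beginning and the end of the file (hypothetical thresholds)."""
    head = data[:window]
    tail = data[-window:] if len(data) > window else b""
    for chunk in (head, tail):
        # CP866 is a single-byte code page, so any slice decodes cleanly.
        if consistent_pairs(chunk.decode("cp866")) >= min_pairs:
            return True
    return False


if __name__ == "__main__":
    box = "╔══════╗\r\n║ TEST ║\r\n╚══════╝".encode("cp866")
    print(looks_like_oem(box))                           # True
    print(looks_like_oem("привет мир".encode("cp1251")))  # False
```

One caveat: all-caps Russian text in cp1251 can still produce such pairs (for example "ВЕТ" reads as ┬┼╥ in CP866), so the threshold would need tuning on real files.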
If a stammerer…
2icfu
I don't know, but the writer could be a stammerer…
- Seriously: there is certainly a way to improve that feature.
Anyway, it was a nail that needed to be hammered again!
VG
Claude
Clo
- AlleyKat
- Senior Member
- Posts: 203
- Joined: 2003-06-15, 10:51 UTC
- Location: for personal info, see wiki
- Contact:
I admittedly often do write 'øøøh' and 'æææh' along with 'hmm' on my own forum...
nah just kidding, well almost anyway...
I don't see the big issue in having to press S (or whatever the right shortcut is), but honestly I feel that the UTF-8 problems in Compare are worse? I dunno...

Translate your favorite Mozilla Extension ~ Your Language Is Important Too.
#tcmd on irc.freenode.net - the place to idle
My suggestion for auto-detection.
Works well with all NFO files.
This may be used in the new version of ATViewer.
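```pascal
uses
  Windows;

{
  The idea of the detection is the following:
  NFO files contain pseudographics, i.e. chars between $B0 and $DF.
  In practice only 5-12 chars from this interval are used,
  so some char reaches a frequency of at least 15-20%.
  Normal ANSI texts will not be detected, because the maximal frequency of text
  letters is about 10-12% (and most letters are outside the interval $B0-$DF).
  Frequencies are counted relative to the number of bytes >= $80 only.
}
type
  TOemFreqTable = array[$80..$FF] of Integer;

// True if any pseudographic char ($B0..$DF) makes up at least 20% of the high bytes
function IsTableOem(const Table: TOemFreqTable): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := $B0 to $DF do
    if Table[i] >= 20 then
    begin
      Result := True;
      Break;
    end;
end;

// Fills Table with the percentage of each byte $80..$FF among all high bytes of the file
function CalcTable(const fn: string; var Table: TOemFreqTable): Boolean;
var
  h: THandle;
  Buffer: AnsiString; // one byte per char, also in Unicode versions of Delphi
  FSize, BytesRead: DWORD;
  i, n: Integer;
  TableSize: Integer;
begin
  Result := False;
  TableSize := 0;
  FillChar(Table, SizeOf(Table), 0);
  // Plain Win32 calls; the original post used FFileOpen/FFileSize,
  // which are not defined here (presumably ATViewer helper functions)
  h := CreateFile(PChar(fn), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if h = INVALID_HANDLE_VALUE then Exit;
  try
    FSize := GetFileSize(h, nil);
    if FSize = $FFFFFFFF then Exit; // GetFileSize failed
    if FSize > 0 then // Buffer[1] does not exist for an empty file
    begin
      SetLength(Buffer, FSize);
      if not ReadFile(h, Buffer[1], FSize, BytesRead, nil) then Exit;
      for i := 1 to BytesRead do
      begin
        n := Ord(Buffer[i]);
        if (n >= Low(Table)) and (n <= High(Table)) then
        begin
          Inc(TableSize);
          Inc(Table[n]);
        end;
      end;
    end;
  finally
    CloseHandle(h);
  end;
  // Convert absolute counts to percentages (Int64 avoids overflow on big files)
  for i := Low(Table) to High(Table) do
    if TableSize = 0 then
      Table[i] := 0
    else
      Table[i] := Int64(Table[i]) * 100 div TableSize;
  Result := True;
end;

// Debug helper: prints every non-zero percentage (console applications only)
procedure WriteTable(const Table: TOemFreqTable);
var
  i: Integer;
begin
  for i := Low(Table) to High(Table) do
    if Table[i] <> 0 then
      Writeln('"', Chr(i), '": ', Table[i]);
end;
```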
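To experiment with the same heuristic outside Delphi, here is a rough Python transcription of `CalcTable` and `IsTableOem`. It assumes the same rules as the Pascal code: percentages are counted only among bytes >= 0x80 (integer division), and a file is treated as OEM when any byte in 0xB0..0xDF reaches 20%. The names `oem_freq_table` and `is_table_oem` are hypothetical.

```python
from collections import Counter


def oem_freq_table(data: bytes) -> dict[int, int]:
    """Percentage of each byte 0x80..0xFF among all bytes >= 0x80 (like CalcTable)."""
    high = [b for b in data if b >= 0x80]
    counts = Counter(high)
    total = len(high)
    return {i: (counts[i] * 100 // total if total else 0) for i in range(0x80, 0x100)}


def is_table_oem(table: dict[int, int], threshold: int = 20) -> bool:
    """True if some pseudographic byte 0xB0..0xDF reaches `threshold` percent (like IsTableOem)."""
    return any(table[i] >= threshold for i in range(0xB0, 0xE0))


if __name__ == "__main__":
    nfo = "░▒▓███ GROUP ███▓▒░".encode("cp437")
    text = "Привет, как дела?".encode("cp1251")
    print(is_table_oem(oem_freq_table(nfo)))   # True
    print(is_table_oem(oem_freq_table(text)))  # False
```

Because frequencies are measured only among high bytes, a file with very few of them can still cross the threshold: a cp1251 text whose only byte above 0x7F is one capital letter in 0xC0..0xDF counts that letter as 100%.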