[BUG] Lister: Automatic detection of OEM charset

icfu
Power Member
Posts: 6052
Joined: 2003-09-10, 18:33 UTC

Post by *icfu »

Well, to make it short:
It simply doesn't work!

No matter which NFO/ASCII art files I display in Lister, it always starts in ANSI mode.

What exactly is that function trying to do on autodetection?

Icfu
This account is for sale
XPEHOPE3KA
Power Member
Posts: 854
Joined: 2006-03-03, 18:23 UTC
Location: Saint-Petersburg, Russia

Post by *XPEHOPE3KA »

I confirm it, and I confirm that it has never worked for me.
F6, Enter, Tab, F6, Enter, Tab, F6, Enter, Tab... - I like to move IT, move IT!..
Sombra
Power Member
Posts: 814
Joined: 2005-12-27, 22:23 UTC
Location: Zaragoza, Spain

Post by *Sombra »

Yes, it is confirmed for me, and for many other people too (I think) :roll:

http://ghisler.ch/board/viewtopic.php?p=87726&highlight=#87726
http://ghisler.ch/board/viewtopic.php?t=4033#30659
Quote: No matter which NFO/ASCII..
...and DIZ and bla, bla :D
I can read English, but... I write like Tarzan. (sorry)
frenky
Senior Member
Posts: 250
Joined: 2005-07-30, 19:36 UTC

Post by *frenky »

It does not answer your question, but you could try http://www.totalcmd.net/plugring/nfoview.html

Be aware that there is one bug: if you select text and drag it to TC, it throws an exception.
Ambiguity succeeds where honesty dares not venture.
ghisler(Author)
Site Admin
Posts: 50824
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

I'm sorry but real ANSI/OEM detection in all languages is simply not possible/reliable. Lister tries to do what it can, but it is far from optimum.
Author of Total Commander
https://www.ghisler.com
icfu
Power Member
Posts: 6052
Joined: 2003-09-10, 18:33 UTC

Post by *icfu »

Quote: Lister tries to do what it can, but it is far from optimum.
I repeat my question:
What exactly is that function trying to do on autodetection?

"Far from optimum" sounds rather euphemistic, given that in all these years I have never found a single file that was detected properly.

You could autodetect REPETITION of characters, for example: #248 (dec) is the black square, which is very specific to NFO files. Or you could make it possible to define specific file types, like NFO, for which Lister is started with the OEM font.

Icfu
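
A minimal sketch of the two ideas in this post (a repeated byte, and a list of file types that always open in OEM), written in Python for brevity rather than in Delphi. The function names, the run length of 3 and the extension list are assumptions made for illustration; nothing here is known to be what Lister does.

```python
import os

# Assumed defaults: byte 248 is the value named in the post,
# and a run of 3 is an arbitrary threshold chosen for this sketch.
def has_byte_run(data: bytes, byte_value: int = 248, min_run: int = 3) -> bool:
    """Return True if byte_value occurs at least min_run times in a row."""
    return bytes([byte_value]) * min_run in data

# Hypothetical user-defined list of file types that should open in OEM mode.
OEM_EXTENSIONS = {".nfo", ".diz"}

def guess_oem(filename: str, data: bytes) -> bool:
    """Treat a file as OEM if its extension is listed or it contains a byte run."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in OEM_EXTENSIONS or has_byte_run(data)
```

With these defaults, isolated occurrences of the byte, as in ordinary Danish ANSI words, do not trigger the check, while `guess_oem("file.nfo", ...)` is true regardless of content.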
XPEHOPE3KA
Power Member
Posts: 854
Joined: 2006-03-03, 18:23 UTC
Location: Saint-Petersburg, Russia

Post by *XPEHOPE3KA »

2ghisler(Author)
Is your algorithm language-dependent?

Could it be that the algorithm became dead code in your sources one happy day? :roll:
Clo
Moderator
Posts: 5731
Joined: 2003-12-02, 19:01 UTC
Location: Bordeaux, France

Indeed---

Post by *Clo »

2icfu

:) Hi Jeff !

• Is it necessary that I add my confirmation? Indeed, that never worked…
Quote: You could autodetect REPETITION of characters for example, #248 (dec)
- I was thinking of something along those lines…
- However, maybe it should look for several specific characters, not only #248 (which is normally used a lot in Danish with ANSI: ø), and, as XPEHOPE3KA is wondering, depend on the current language set in <wincmd.ini>…
…that is not absolutely perfect:
- For example, this "ø" doesn't exist in French and many other languages,
but sometimes it is used as an abbreviation for "diameter" (I do that, and with #216 too…).
Quote: …"Far from optimum" sounds rather euphemistic…
:D Right! In French, this is amusingly called «a sweet euphemism» :lol:

:mrgreen: VG
Claude
Clo
Last edited by Clo on 2006-07-14, 10:09 UTC, edited 3 times in total.
#31505 Traducteur Français de TC French translator Aide en Français Tutoriels Français English Tutorials
ghisler(Author)
Site Admin
Posts: 50824
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

So do you know of any good detection routines?
Clo
Moderator
Posts: 5731
Joined: 2003-12-02, 19:01 UTC
Location: Bordeaux, France

Exchange ideas ?

Post by *Clo »

2ghisler(Author)

:) Good afternoon,
Quote: So do you know of any good detection routines?
:? Personally, I know "0", or almost, about programming!

• But just an idea: our friend Alextp has to work on the same problem, as he said in the to-do list
for his new alternative Lister viewer tool…
Maybe you could exchange ideas? :roll:

:mrgreen:  VG
Claude
Clo
icfu
Power Member
Posts: 6052
Joined: 2003-09-10, 18:33 UTC

Post by *icfu »

@Clo and interested developers:
Quote: - However, maybe it should look for several specific characters, not only #248 (which is normally used a lot in Danish with ANSI: ø), and, as XPEHOPE3KA is wondering, depend on the current language set in <wincmd.ini>…
…that is not absolutely perfect:
It IS perfect when, as I proposed, you use REPETITION of characters. Are there any Danish words containing "øøø", for example? ;)

I think that the repeated appearance of that one character alone would catch about 99% of all NFO files correctly. That is about a 10000000000% increase over the present situation, roughly estimated of course.

Icfu
XPEHOPE3KA
Power Member
Posts: 854
Joined: 2006-03-03, 18:23 UTC
Location: Saint-Petersburg, Russia

Post by *XPEHOPE3KA »

1. Heh, it (the current algorithm) does work occasionally.
2. But only for files WITHOUT pseudographic characters.
3. And only for files that begin with a sequence of codes > 128. However, many bilingual readmes begin in English but end in Russian (possibly in OEM).

So, Christian, you need to:
1. Analyze both the beginning and the end of files, for a more accurate prediction.
2. Leave the current algorithm almost unchanged, but add some code that checks for repetition of pseudographic characters. This check should be done only for "horizontal" runs. Characters such as (this is ONE group) ─,┌,┐,└,┘,├,┤,┬,┴,┼,╓,╖,╙,╜,╥,╨,╫,╟,╢ should be adjacent to each other in a string. These: ═,╒,╔,╕,╗,╘,╚,╛,╝,╞,╠,╡,╣,╤,╦,╧,╩,╬,╪ are the second group. The other pseudographic characters should be followed by themselves, like ▒▒▒.
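
The two steps above could be sketched roughly as follows, in Python rather than Delphi. This is only an illustration under assumptions: the bytes are interpreted as code page 437 (the post only says "OEM"), and the function names, the 4096-byte window and the minimum run of 3 are invented for the example.

```python
# Assumption: OEM means code page 437 here; other OEM code pages differ.
GROUP_SINGLE = set("─┌┐└┘├┤┬┴┼╓╖╙╜╥╨╫╟╢")  # first group from the post
GROUP_DOUBLE = set("═╒╔╕╗╘╚╛╝╞╠╡╣╤╦╧╩╬╪")  # second group from the post
FIRST_GRAPHIC, LAST_GRAPHIC = 0xB0, 0xDF    # pseudographic byte range in CP437

def _same_group(a: str, b: str) -> bool:
    """Members of one group may follow each other; other chars must repeat."""
    if a in GROUP_SINGLE:
        return b in GROUP_SINGLE
    if a in GROUP_DOUBLE:
        return b in GROUP_DOUBLE
    return a == b

def longest_graphic_run(chunk: bytes) -> int:
    """Length of the longest run of adjacent pseudographics from one group."""
    text = chunk.decode("cp437")  # one byte maps to exactly one character
    best = run = 0
    prev = None
    for raw, ch in zip(chunk, text):
        if FIRST_GRAPHIC <= raw <= LAST_GRAPHIC:
            run = run + 1 if prev is not None and _same_group(prev, ch) else 1
            prev = ch
            best = max(best, run)
        else:
            run, prev = 0, None
    return best

def looks_oem(data: bytes, window: int = 4096, min_run: int = 3) -> bool:
    """Check the beginning and the end of the file, as suggested in step 1."""
    head, tail = data[:window], data[-window:]
    return max(longest_graphic_run(head), longest_graphic_run(tail)) >= min_run
```

One caveat worth testing before relying on such a rule: uppercase Cyrillic letters in Windows-1251 occupy the same $C0-$DF byte range, so short uppercase words can also form runs, and `min_run` would need tuning.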
Clo
Moderator
Posts: 5731
Joined: 2003-12-02, 19:01 UTC
Location: Bordeaux, France

If a stammerer…

Post by *Clo »

2icfu
Quote: Are there any Danish words containing "øøø" for example? ;)
:D I don't know, but the writer could be a stammerer… :lol:

- Seriously: there is certainly a way around this to improve that feature.
Anyway, it was a nail which needed to be hammered again! :P

:mrgreen: VG
Claude
Clo
AlleyKat
Senior Member
Posts: 203
Joined: 2003-06-15, 10:51 UTC
Location: for personal info, see wiki

Post by *AlleyKat »

Admittedly, I often do write 'øøøh' and 'æææh' along with 'hmm' on my own forum... :P nah, just kidding, well, almost anyway...

I don't see the big issue in having to press s (or whatever the right shortcut is), and honestly feel that the UTF-8 problems in Compare are worse. I dunno...
Translate your favorite Mozilla Extension ~ Your Language Is Important Too.
#tcmd on irc.freenode.net - the place to idle
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by *Alextp »

My suggestion for auto-detection.
It works well with all NFO files.
It may be used in the new version of ATViewer.

Code:

{
The idea of detection is the following:
NFO files contain pseudo-graphics, i.e. chars between $B0 and $DF.
In practice only 5-12 chars from this interval are used,
so at least one char with a frequency of at least 15-20% exists.
Normal ANSI texts will not be detected, because the maximal frequency of text
letters is about 10-12% (and most letters are outside the interval $B0-$DF).
}

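type
  // Frequency table for the upper half of the 8-bit range
  TOemFreqTable = array[$80..$FF] of integer;

// Expects a table of percentages as filled by CalcTable.
// Returns true if any pseudo-graphic char ($B0..$DF) reaches 20 percent.
function IsTableOem(const Table: TOemFreqTable): boolean;
var
  i: integer;
begin
  Result:= false;
  for i:= $B0 to $DF do
    if Table[i]>=20 then
      begin Result:= true; Break end;
end;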

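// Fills Table with percentages. Each percentage is relative to the number
// of bytes >= $80 in the file (TableSize), not to the total file size.
// FFileOpen and FFileSize are helper routines that are not shown in this post.
function CalcTable(const fn: string; var Table: TOemFreqTable): boolean;
var
  h: THandle;
  Buffer: AnsiString; // one byte per char, also in Unicode Delphi versions
  FSize, BytesRead: DWORD;
  i, n: integer;
  TableSize: integer;
begin
  Result:= false;
  TableSize:= 0;
  FillChar(Table, SizeOf(Table), 0);

  h:= FFileOpen(fn);
  if h=INVALID_HANDLE_VALUE then Exit;

  try
    // Read the whole file into memory
    FSize:= FFileSize(fn);
    SetLength(Buffer, FSize);
    if not ReadFile(h, Buffer[1], FSize, BytesRead, nil) then Exit;

    // Count every byte that falls into $80..$FF
    for i:= 1 to BytesRead do
      begin
      n:= Ord(Buffer[i]);
      if (n>=Low(Table)) and (n<=High(Table)) then
        begin
        Inc(TableSize);
        Inc(Table[n]);
        end;
      end;
  finally
    CloseHandle(h);
  end;

  // Convert the counts to integer percentages
  for i:= Low(Table) to High(Table) do
    begin
    if TableSize=0
      then Table[i]:= 0
      else Table[i]:= Table[i]*100 div TableSize;
    end;

  Result:= true;
end;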

procedure WriteTable(const Table: TOemFreqTable);
var
  i: integer;
begin
  for i:= Low(Table) to High(Table) do
    if Table[i]<>0 then
      Writeln('"', Chr(i), '": ', Table[i]);
end;
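
For quick experiments, the same frequency test can be sketched in a few lines of Python. This is a rough port under assumptions, not Alextp's code: the function name and parameter are invented, and it works on bytes already in memory instead of reading a file.

```python
from collections import Counter

def oem_by_frequency(data: bytes, threshold_percent: int = 20) -> bool:
    """Port of the idea above: flag OEM if one byte in $B0..$DF dominates.

    As in the Delphi code, percentages are taken over bytes >= $80 only
    and are truncated by integer division.
    """
    high = [b for b in data if b >= 0x80]
    if not high:
        return False  # pure 7-bit text: nothing to decide
    counts = Counter(high)
    return any(
        counts[b] * 100 // len(high) >= threshold_percent
        for b in range(0xB0, 0xE0)
    )
```

Because the percentages are relative to the bytes >= $80 only, a text containing a single such byte in the range, for example one "ß" ($DF in Windows-1252), already reaches 100% and is flagged. Dividing by the total number of bytes instead would be one possible refinement; that is a suggestion of this sketch, not part of the posted code.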