File Encoding Detection

raytc
Senior Member
Posts: 274
Joined: 2004-06-28, 11:03 UTC

Post by *raytc »

Is there a (64-bit) plugin, or another way to create a custom column that detects the encoding of a file?

For example: UTF-8, ANSI/Latin-1, KOI8-R, or whatever encoding.

I would also like to know whether a (text) file is a DOS or Unix file.
mrle
Junior Member
Posts: 58
Joined: 2005-04-25, 21:44 UTC

Post by *mrle »

milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: File Encoding Detection

Post by *milo1012 »

raytc wrote:ansi/latin1, KOI8-R or whatever encoding
I assume you're aware that there is no reliable way to tell different ANSI code pages and OEM/DOS text apart?
The existing methods rely on statistical guesswork and fail quite often.
The only encodings that can be detected reliably are UTF-16 and UTF-8.
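
To illustrate why UTF-8 and UTF-16 are the easy cases, here is a minimal Python sketch. It is only an assumption about how such a check could look, not how PCREsearch or any other plugin works internally, and the function name `guess_unicode_encoding` is made up for this example. It looks for a byte order mark (BOM) first and otherwise tests whether the bytes are valid UTF-8. Anything else is reported as unknown, because ANSI/OEM code pages cannot be told apart this way.

```python
import codecs
from typing import Optional


def guess_unicode_encoding(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a Unicode encoding name, or None if unknown."""
    # A BOM at the start of the file is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Limitation: a UTF-32 LE BOM also starts with the UTF-16 LE BOM bytes.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Pure 7-bit text is valid in nearly every code page, so report it separately.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # UTF-8 has strict byte-sequence rules; text in an ANSI/OEM code page
    # with non-ASCII characters rarely passes this check by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None  # some 8-bit code page (ANSI, OEM, KOI8-R, ...): cannot tell which
```

UTF-16 files without a BOM are not covered by this sketch; detecting them needs heuristics again.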

You may also use PCREsearch for these encodings, though it doesn't distinguish between OEM/ANSI for the mentioned reason.
raytc wrote:I would like to know also if a (text) file is a dos or unix file.
You mean the different line endings, i.e. CRLF / LF?
You could also use PCREsearch for this, by counting the occurrences of CRLF / LF.
Create two columns with the Reg. Expressions:

Code:

1st: \r\n
2nd: \n
Then set regex<N>type to 1 for both columns and compare the two counts:
- the number in the LF column exceeds the number in the CRLF column: the file is probably Unix text (or binary)
- the LF and CRLF columns show the same number: the file is probably Windows text, because every LF is part of a CRLF.
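
The same comparison can be sketched outside Total Commander in a few lines of Python. This is only an illustration of the counting idea under my own assumptions; the function name `classify_line_endings` and the returned labels are made up, and the "unknown" case for files without any LF is an addition not covered by the two rules above.

```python
import re


def classify_line_endings(data: bytes) -> str:
    """Hypothetical helper: classify a file by comparing CRLF and LF counts."""
    crlf_count = len(re.findall(rb"\r\n", data))  # same pattern as the 1st column
    lf_count = len(re.findall(rb"\n", data))      # same pattern as the 2nd column
    if lf_count == 0:
        return "unknown"   # no line breaks found at all (assumed extra case)
    if lf_count == crlf_count:
        return "windows"   # every LF is preceded by a CR
    return "unix"          # more LF than CRLF: Unix text, or possibly binary
```

Usage would be something like `classify_line_endings(open(path, "rb").read())`, reading the file in binary mode so that Python does not translate the line endings itself.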
raytc
Senior Member
Posts: 274
Joined: 2004-06-28, 11:03 UTC

Post by *raytc »

Thank you milo1012 and mrle for your replies.

I like EncInfo.

I wish the plugin also detected Latin1 and ANSI 1252 :)