"This text does not use UTF-8 encoding, it's simple ASCII."
Reason: apparently Lister tries several methods to determine the encoding. Besides the two "standard" ones (a UTF-8 byte order mark, and UTF-8-encoded multibyte sequences somewhere in the text), it also looks for the substring "UTF-8", which is common in HTML files, for example:
content="text/html; charset=utf-8"
While this helps to detect the encoding of HTML code, the presence of the substring "UTF-8" near the beginning of a plain text file does not by itself make the file UTF-8 in any usual sense. I would suggest applying this heuristic very carefully when the file is plain text rather than HTML or XML.
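To make the suggestion concrete, here is a minimal Python sketch contrasting the two behaviours. It is purely hypothetical: the function names, the 4096-byte window and the "looks like markup" test are my own assumptions, not how Lister actually works. The idea is that the BOM and valid-multibyte checks are always trusted, while a charset declaration only counts when the file starts like HTML or XML.

```python
import codecs
import re

# Hypothetical sketch, NOT Lister's real implementation.
# Matches declarations like charset=utf-8 or charset="UTF-8".
_CHARSET_DECL = re.compile(rb'charset\s*=\s*["\']?utf-8', re.IGNORECASE)
# Matches an XML prolog such as <?xml version="1.0" encoding="UTF-8"?>.
_XML_DECL = re.compile(rb'<\?xml[^>]*encoding\s*=\s*["\']utf-8', re.IGNORECASE)
# Assumed markup test: file starts with <!DOCTYPE, <html or <?xml.
_MARKUP_START = re.compile(rb'\s*<(?:!doctype|html|\?xml)', re.IGNORECASE)


def has_utf8_bom(data: bytes) -> bool:
    """Standard check 1: UTF-8 byte order mark at the start."""
    return data.startswith(codecs.BOM_UTF8)


def has_utf8_multibyte(data: bytes) -> bool:
    """Standard check 2: non-ASCII bytes that decode cleanly as UTF-8."""
    if data.isascii():
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_like_markup(head: bytes) -> bool:
    return _MARKUP_START.match(head) is not None


def naive_is_utf8(data: bytes, window: int = 4096) -> bool:
    """Also trusts a bare 'UTF-8' substring anywhere near the start."""
    return (has_utf8_bom(data) or has_utf8_multibyte(data)
            or b"utf-8" in data[:window].lower())


def careful_is_utf8(data: bytes, window: int = 4096) -> bool:
    """Trusts a charset declaration only if the file looks like HTML/XML."""
    if has_utf8_bom(data) or has_utf8_multibyte(data):
        return True
    head = data[:window]
    if looks_like_markup(head):
        return bool(_CHARSET_DECL.search(head) or _XML_DECL.search(head))
    return False
```

With the sample line from above, `naive_is_utf8(b"This text does not use UTF-8 encoding, it's simple ASCII.")` returns `True` (the false positive described here), while `careful_is_utf8` returns `False`. An HTML file with `content="text/html; charset=utf-8"` is still detected by the careful version.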