myfreexp wrote:
As I said in my initial bug post, there are clearly defined rules for what is and what is not a valid UTF-8 byte sequence. Based on these rules, I wrote such a function myself years ago, and I'm happy to contribute it. Please advise. You would just need to translate it from Turbo Pascal to Delphi.

Just found the code in my sources:
Code:
{ my: check for valid UTF-8 according to "The Unicode Standard, Version 4.0" }
{ (see <http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf>, 3.9) }
{ and RFC3629 (November 2003, supersedes RFC2279) }
{ Note: checks one multi-octet sequence; it does not verify that the number }
{ of continuation octets matches the lead octet. }
function valid_UTF8(const cvt:string):boolean;
var ok : boolean;
    sp : byte;
begin
  { guard against an empty string; relies on short-circuit evaluation ($B-) }
  ok:=(length(cvt)>0) and (byte(cvt[1]) in [$c2..$f4]);
  sp:=1;
  while ok and (sp<length(cvt)) do
  begin
    inc(sp);
    if sp=2 then  { check the 2nd octet! }
    begin
      case byte(cvt[1]) of
        $e0 : ok:=byte(cvt[sp]) in [$a0..$bf];  { 3-octet sequences starting with E0 }
        $ed : ok:=byte(cvt[sp]) in [$80..$9f];  { 3-octet sequences starting with ED }
        $f0 : ok:=byte(cvt[sp]) in [$90..$bf];  { 4-octet sequences starting with F0 }
        $f4 : ok:=byte(cvt[sp]) in [$80..$8f];  { 4-octet sequences starting with F4 }
      else
        ok:=byte(cvt[sp]) in [$80..$bf];        { all other sequences }
      end;
    end
    else ok:=byte(cvt[sp]) in [$80..$bf];       { check the 3rd and 4th octets }
  end;
  valid_UTF8:=ok;
end;
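
For anyone who wants to cross-check a Delphi translation, here is a rough sketch of the same byte-range rules in Python. It is not the Pascal routine above: the names `expected_length`, `second_byte_range` and `valid_utf8_sequence` are made up for illustration. Unlike the original, it also accepts single ASCII octets and requires the sequence length to match the lead octet. Treat it as an assumption-laden reference, not a drop-in replacement.

```python
# Sketch of the well-formed UTF-8 octet ranges (Unicode Standard ch. 3.9, RFC 3629).
# All function names here are hypothetical helpers, not part of any library.

def expected_length(lead: int) -> int:
    """Total sequence length implied by the lead octet, or 0 if it is invalid."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # 0x80..0xC1 and 0xF5..0xFF can never start a sequence


def second_byte_range(lead: int) -> tuple:
    """Allowed (low, high) range for the octet that follows `lead`."""
    special = {
        0xE0: (0xA0, 0xBF),  # excludes overlong 3-octet forms
        0xED: (0x80, 0x9F),  # excludes UTF-16 surrogates
        0xF0: (0x90, 0xBF),  # excludes overlong 4-octet forms
        0xF4: (0x80, 0x8F),  # excludes code points above U+10FFFF
    }
    return special.get(lead, (0x80, 0xBF))


def valid_utf8_sequence(seq: bytes) -> bool:
    """True if `seq` is exactly one well-formed UTF-8 sequence."""
    if not seq:
        return False
    lead = seq[0]
    if len(seq) != expected_length(lead):
        return False
    if len(seq) == 1:
        return True
    low, high = second_byte_range(lead)
    if not low <= seq[1] <= high:
        return False
    # Any remaining octets are plain continuation octets.
    return all(0x80 <= b <= 0xBF for b in seq[2:])
```

Feeding the same octet sequences to this sketch and to the translated Delphi function should show quickly whether the special cases for E0, ED, F0 and F4 survived the translation. Differences on ASCII input and on truncated or overlong input are expected, for the reasons given above.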