registrationsucks 
July 08, 2008 10:03AM

<p> This should work: <pre><![CDATA[ <a href="]]>http://żółw.pl/<![CDATA[">test</a> ]]></pre> <p> but the filter gets seriously confused with this. <p>For added awesomeness you could convert IDN to Punycode.

<p>BTW: the forum seems to have problems with UTF-8. The URL above, without workaround, is displayed as http://?รณ?w.pl/

July 08, 2008 08:39PM

Hello, you're the very first person to complain about the lack of IDN support. :-) We dropped it a while back to tighten up our domain-name validation algorithms, and never got around to implementing it properly.

Right now, with work and such, I don't have the time to implement a feature like this properly (I could hack something out, but since you want IDNs you probably want them correct). I can, however, point you in the right direction, and perhaps we'll get a usable patch out of this.

HTMLPurifier/AttrDef/URI/Host.php is the culprit, and uses the URI specification to define valid host names. The appropriate RFC is RFC 3490, but it's not immediately clear what a substitute production would be, or what valid characters would be. Percent-encoded characters inside the host must also be accounted for; HTML Purifier may do funky things to them.

I consider implementations of ToASCII and ToUnicode essential for a complete implementation; basically, HTML Purifier must output ASCII URIs by default. This is to ensure compatibility with Internet Explorer 6, which does not natively support IDNs. In addition, the Punycode form is what most browsers display in the window. %AutoFormat.Linkify will also need to be updated appropriately.
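
To make the ToASCII/ToUnicode idea concrete, here is a rough sketch of the round trip. It is written in Python purely for illustration, since its built-in "idna" codec follows RFC 3490; it is not HTML Purifier code. The helper names (host_to_ascii, host_to_unicode, uri_to_ascii) are made up for this example, and the sketch ignores percent-encoded hosts and userinfo.

```python
from urllib.parse import urlsplit, urlunsplit


def host_to_ascii(host: str) -> str:
    """ToASCII: convert each label of a host name to its ASCII (Punycode) form.

    Relies on Python's built-in "idna" codec (RFC 3490).
    """
    return host.encode("idna").decode("ascii")


def host_to_unicode(host: str) -> str:
    """ToUnicode: convert an ASCII (Punycode) host name back to Unicode."""
    return host.encode("ascii").decode("idna")


def uri_to_ascii(uri: str) -> str:
    """Rewrite a URI so that its host is in ASCII form.

    Hypothetical helper: userinfo is dropped and percent-encoded
    hosts are not handled, to keep the sketch short.
    """
    parts = urlsplit(uri)
    if parts.hostname is None:
        return uri  # nothing to convert (e.g. a relative URI)
    netloc = host_to_ascii(parts.hostname)
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    ascii_uri = uri_to_ascii("http://żółw.pl/")
    print(ascii_uri)
    print(host_to_unicode(urlsplit(ascii_uri).hostname))
```

A real patch would presumably perform the equivalent conversion inside the host validator, so that every URI leaving the filter already carries an ASCII host.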

As for the forum messing up UTF-8, I will certainly take a look.

