HTTP Protocol Removal Full Disclosure

An error in HTMLPurifier_URI->validate() allowed an attacker to craft a specially formed URI that, once processed by HTML Purifier, became an active JavaScript URI. If a user clicked on the malicious link, or used a browser that automatically evaluates JavaScript URIs in image tags, the attacker could execute arbitrary JavaScript in the context of the website that served the HTML.

This vulnerability was reported via full disclosure by Gareth Heyes, and brought to the attention of the vendor by CrYpTiC_MauleR. No active exploits are currently known.


This vulnerability was fixed in HTML Purifier 3.1.0 and 2.1.4. No hot-patch is currently available.


According to RFC 3986, a relative URI that carries the same scheme name as the base URI is discouraged, but allowed for backwards compatibility. Because HTML Purifier aims to produce standards-compliant output in all respects, it converts such URIs to their correct form by removing the scheme. Thus, http:dir/dir2 becomes dir/dir2.

Doing this bypasses HTML Purifier's safeguards against JavaScript URIs. Normally, a URI is parsed once and its scheme is extracted from the original string. Thus, a plain javascript:xss() is identified as having a javascript scheme and is removed. Common bypasses such as java\nscript are also stopped, because HTML Purifier does not find the resulting scheme in its list of allowed schemes. However, once parsing and this initial scheme check have been performed, the URI is not parsed again.

Removal of the scheme causes a URI like http:javascript:xss() to become javascript:xss(), and now javascript is the new scheme, although in the original, javascript:xss() was the path.
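The sequence described above can be reproduced with a minimal sketch. This is not HTML Purifier's code: the names `parse`, `naive_filter`, and `ALLOWED_SCHEMES` are hypothetical, the allowlist is illustrative, and the splitting regex is the generic one from RFC 3986, Appendix B. It only shows how checking the scheme once and then dropping it lets the old path become the new scheme.

```python
import re

# Generic URI-splitting regex from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Illustrative allowlist; an assumption of this sketch.
ALLOWED_SCHEMES = {"http", "https", "mailto", "ftp"}


def parse(uri):
    """Split a URI reference into (scheme, authority, path)."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return m.group(2), m.group(4), m.group(5)


def naive_filter(uri, base_scheme="http"):
    """Check the scheme once, then drop a redundant scheme without re-parsing.

    Query and fragment are ignored for brevity.
    """
    scheme, authority, path = parse(uri)
    if scheme is not None and scheme.lower() not in ALLOWED_SCHEMES:
        return None  # rejected, e.g. javascript:xss()
    if scheme == base_scheme and authority is None:
        return path  # BUG: the path is emitted verbatim and never re-checked
    return uri


print(naive_filter("javascript:xss()"))       # rejected by the scheme check
print(naive_filter("http:dir/dir2"))          # harmless normalization
out = naive_filter("http:javascript:xss()")   # passes the check as "http"
print(out, "-> scheme on re-parse:", parse(out)[0])
```

The last call passes the allowlist check because its scheme is http, yet the string it returns parses with javascript as its scheme when a browser reads it.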


The appropriate fix can be determined by figuring out how to convert the last column into a URI that will be parsed into the same form. Obviously, simple concatenation doesn't work; the key is percent-encoding the path. Instead of javascript:xss(), javascript%3Axss() should be used.

HTML Purifier's fix also percent-encodes any other reserved character in each segment of a URI. This was actually a previously identified section of relaxed standards compliance, and strictly enforcing the rules eliminated the vulnerability.
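As a rough illustration of why percent-encoding the path closes the hole, the sketch below encodes each path segment and then checks what scheme a parser would see. It is an assumption-laden example rather than HTML Purifier's implementation: `encode_path`, `scheme_of`, and the `SEGMENT_SAFE` character set are hypothetical choices made here, and the encoding is done with Python's standard `urllib.parse.quote`.

```python
import re
from urllib.parse import quote

# Characters left literal inside a segment: the RFC 3986 sub-delims plus "@".
# "%" is also kept so existing escapes are not double-encoded (an assumption
# of this sketch; malformed escapes are not repaired). ":" is deliberately
# absent, so it is always encoded.
SEGMENT_SAFE = "!$&'()*+,;=@%"


def encode_path(path):
    """Percent-encode each "/"-separated segment of a path."""
    return "/".join(quote(seg, safe=SEGMENT_SAFE) for seg in path.split("/"))


def scheme_of(uri):
    """Return the scheme a parser would extract, or None if there is none."""
    m = re.match(r"^([^:/?#]+):", uri)
    return m.group(1) if m else None


raw = "javascript:xss()"        # the path left over once "http:" is removed
fixed = encode_path(raw)

print(scheme_of(raw))           # the unencoded path parses with a scheme
print(fixed, scheme_of(fixed))  # the encoded path has no scheme at all
```

Because the colon is encoded, the output no longer contains anything that can be read as a scheme, so it parses back to the same path that was checked in the first place.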


The vulnerability was reported on March 25, 2008, although not directly to the vendor. A patch was committed to the public repository on May 13, 2008, ostensibly as a “revamp [of] URI handling of percent encoding and validation.” HTML Purifier 3.1.0 was released on May 18, 2008. This was the first security vulnerability in HTML Purifier's core, and the second in all of HTML Purifier's history.

We would have strongly preferred that Gareth Heyes contact us through private channels before publicly disclosing the vulnerability. We did not realize that the post illustrated a vulnerability in HTML Purifier until CrYpTiC_MauleR asked why the exploit worked on May 13, 2008 (an http:javascript: URI doesn't work by itself; HTML Purifier must munge off the http scheme to activate the attack). This accounts in part for the large gap between the first disclosure and the committing of a fix. Still, we greatly appreciate Gareth Heyes' report and sincerely hope that he will continue to help weed out bugs in HTML Purifier. We apologize for not crediting him immediately in the changelog.

Full disclosure is generally a good idea, just not before the vendor has had a chance to release a fix (please don't be afraid to use it to light a fire under our butts and get a security bug fixed). We have therefore released this document along with the next point release of HTML Purifier, hopefully having given projects and end users enough time to upgrade their installations. We hope to do this for all future vulnerabilities in HTML Purifier, especially for the two that were fixed in the most recent point release.