A difference between the behavior of iconv (the utility HTML Purifier
uses to transform character encodings) and browsers allowed an attacker
to use the Yen character (5C in Shift_JIS) to trick
HTML Purifier into outputting a byte-sequence most browsers would
interpret as a backslash. This could then be used to execute arbitrary
JavaScript from CSS.
This vulnerability was reported privately to the vendor by Takeshi Terada. No active exploits are currently known.
Fix
This vulnerability was fixed in HTML Purifier 3.1.1 and 2.1.5.
Details
The vast majority of character sets in the world are equivalent
to US-ASCII in the 7-bit domain. Shift_JIS (as well as Johab) is a
notable exception, redefining the two single-byte values 5C
and 7E as different characters. In Shift_JIS:
| Bytes | ASCII | Shift_JIS |
|---|---|---|
| 5C | \ | ¥ |
| 7E | ~ | ‾ |
This is highly unusual, and it puts users of Shift_JIS in a difficult position: they have no legitimate way to express a backslash or tilde. Consequently, browsers treat the byte 5C as equivalent to a backslash, even if it renders as a Yen.
Iconv, on the other hand, transforms the 5C byte
to Unicode U+00A5 (in UTF-8, this is C2 A5), the
correct character for Yen. Although faithful to the character set,
this does not match how browsers treat the byte, and it leads
to the security vulnerability: HTML Purifier thinks that the backslash
is actually a Yen, and does not take any appropriate security
measures. Then, when the Yen is converted back to 5C,
it gains backslash behavior and can be used to break out of a
quoted CSS string. Furthermore, buggy behavior
will be observed if a backslash is somehow introduced to the
HTML during processing, as iconv does not know how to convert
a backslash in UTF-8 back to a backslash in Shift_JIS (hint: it's
impossible without changing the font).
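The round trip can be illustrated with a small Python sketch. It does not call iconv: `decode_sjis_7bit` and `encode_sjis_7bit` are hypothetical stand-ins that model only the single-byte mapping from the table above, and `contains_backslash` is a made-up check, not HTML Purifier's actual filtering logic.

```python
# Toy model of the 5C round trip. These helpers are illustrative stand-ins,
# not real iconv or HTML Purifier APIs.

# Assumed iconv-style mapping for the two redefined single bytes.
SJIS_TO_UNICODE = {0x5C: "\u00a5", 0x7E: "\u203e"}
UNICODE_TO_SJIS = {char: byte for byte, char in SJIS_TO_UNICODE.items()}


def decode_sjis_7bit(data: bytes) -> str:
    """Decode 7-bit Shift_JIS bytes, turning 5C into U+00A5 (Yen)."""
    return "".join(SJIS_TO_UNICODE.get(byte, chr(byte)) for byte in data)


def encode_sjis_7bit(text: str) -> bytes:
    """Encode back to Shift_JIS; a real backslash or tilde has no target byte."""
    out = bytearray()
    for char in text:
        if char in ("\\", "~"):
            raise ValueError(f"no Shift_JIS byte for {char!r} in this model")
        out.append(UNICODE_TO_SJIS.get(char, ord(char)))
    return bytes(out)


def contains_backslash(text: str) -> bool:
    """Hypothetical filter step that looks for a CSS escape character."""
    return "\\" in text


raw = b'font-family: "a\x5c"'             # attacker-controlled Shift_JIS bytes
decoded = decode_sjis_7bit(raw)            # 5C has become U+00A5
flagged = contains_backslash(decoded)      # the filter sees no backslash
reencoded = encode_sjis_7bit(decoded)      # U+00A5 is written back as 5C
```

In this model the output is byte-for-byte identical to the input, so a browser that treats 5C as a backslash would see an escape character that the filter never saw.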
The fix involves undoing the unnecessary transformation that iconv
performs. HTML Purifier generalizes the fix to all character
encodings with
HTMLPurifier_Encoder->testEncodingSupportsASCII()
by iterating through all printable 7-bit byte sequences and checking
if conversion to UTF-8 causes a change, in which case appropriate
measures should be taken. We do not know of any widely used character
encodings besides Shift_JIS, however, that would be affected by this
behavior.
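The idea behind that check can be sketched in Python as follows. This is not the PHP implementation of `testEncodingSupportsASCII()`; the function name `find_non_ascii_mappings`, its return shape, and the toy Shift_JIS decoder are assumptions made for illustration only.

```python
from typing import Callable, Dict


def find_non_ascii_mappings(decode: Callable[[bytes], str]) -> Dict[int, str]:
    """Return each printable 7-bit byte whose decoded form differs from ASCII."""
    changed = {}
    for byte in range(0x20, 0x7F):  # printable ASCII: 0x20 through 0x7E
        decoded = decode(bytes([byte]))
        if decoded != chr(byte):
            changed[byte] = decoded
    return changed


def toy_shift_jis_decode(data: bytes) -> str:
    """Hypothetical decoder modelling only the two redefined single bytes."""
    overrides = {0x5C: "\u00a5", 0x7E: "\u203e"}
    return "".join(overrides.get(byte, chr(byte)) for byte in data)


# An encoding that is ASCII-compatible in the 7-bit range reports no changes.
latin1_changes = find_non_ascii_mappings(lambda data: data.decode("latin-1"))

# The toy Shift_JIS decoder reports exactly the two bytes from the table.
sjis_changes = find_non_ascii_mappings(toy_shift_jis_decode)
```

An empty result means the encoding can be treated as ASCII-compatible. A non-empty result lists the bytes that need special handling, which for the toy decoder are 5C and 7E.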
History
The vulnerability was reported on May 24, 2008 via email, as a follow-up to another, unrelated vulnerability in CSS handling. A patch was committed to the public repository on May 25, 2008, with the summary: “Fix Shift_JIS encoding wonkiness with yen symbols and whatnot.” HTML Purifier 3.1.1 was released on June 19, 2008.