Shift_JIS Full Disclosure

A difference between the behavior of iconv (the utility HTML Purifier uses to transform character encodings) and browsers allowed an attacker to use the Yen character (5C in Shift_JIS) to trick HTML Purifier into outputting a byte sequence most browsers would interpret as a backslash. This could then be used to execute arbitrary JavaScript from CSS.

This vulnerability was reported privately to the vendor by Takeshi Terada. No active exploits are currently known.

Fix

This vulnerability was fixed in HTML Purifier 3.1.1 and 2.1.5.

Details

The vast majority of character sets in the world are equivalent to US-ASCII in the 7-bit domain. Shift_JIS (as well as Johab) is a notable exception, redefining the two bytes 5C and 7E to be different characters. In Shift_JIS:

Bytes  ASCII  Shift_JIS
5C     \      ¥
7E     ~      ‾

This is quite exceptional, and puts users of Shift_JIS in a hard place because they have no way of expressing the backslash or tilde legitimately. Consequently, browsers treat the byte 5C as equivalent to a backslash, even if it renders as a Yen.

Iconv, on the other hand, transforms the byte 5C to Unicode U+00A5 (in UTF-8, this is C2 A5), the Unicode character for Yen. Although this follows the character set definition, it is incorrect behavior in practice because it disagrees with browsers, and it leads to the security vulnerability: HTML Purifier thinks that the backslash is actually a Yen, and does not take any appropriate security measures. Then, when the Yen is converted back to 5C, it gains backslash behavior and can be used to break out of a quoted CSS string. Furthermore, buggy behavior will be observed if a backslash is somehow introduced to the HTML during processing, as iconv does not know how to convert a backslash in UTF-8 back to a backslash in Shift_JIS (hint: it's impossible without changing the font).
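The round trip described above can be illustrated with a small Python sketch. The functions strict_decode and strict_encode are hypothetical stand-ins for a converter that maps 5C to U+00A5; they are not iconv or HTML Purifier code and only handle 7-bit input. The function contains_backslash is likewise an assumed placeholder for any check that looks for backslashes in the decoded text.

```python
# Hypothetical stand-ins for a strict Shift_JIS <-> Unicode converter.
# Only single-byte 7-bit input is handled, which is enough for the illustration.
STRICT_MAP = {0x5C: "\u00a5", 0x7E: "\u203e"}  # yen sign, overline
REVERSE_MAP = {ch: byte for byte, ch in STRICT_MAP.items()}


def strict_decode(data: bytes) -> str:
    """Decode 7-bit bytes, mapping 5C and 7E the way a strict converter would."""
    return "".join(STRICT_MAP.get(b, chr(b)) for b in data)


def strict_encode(text: str) -> bytes:
    """Encode back to bytes; yen and overline return to 5C and 7E."""
    return bytes(REVERSE_MAP.get(ch, ord(ch)) for ch in text)


def contains_backslash(text: str) -> bool:
    """Placeholder for a filter that looks for backslashes after decoding."""
    return "\\" in text


raw = b"a\\b"                                       # the bytes 61 5C 62 as received
decoded = strict_decode(raw)                        # 5C has become a yen sign
filter_saw_backslash = contains_backslash(decoded)  # the filter finds nothing to escape
output = strict_encode(decoded)                     # 5C is back in the output bytes
browser_sees_backslash = b"\\" in output            # a browser would read 5C as a backslash
```

In this sketch the filter never sees a backslash, yet the output bytes are identical to the input, so the byte 5C reaches the browser untouched.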

The fix involves undoing the unnecessary transformation that iconv performs. HTML Purifier generalizes the fix to all character encodings with HTMLPurifier_Encoder->testEncodingSupportsASCII() by iterating through all printable 7-bit byte sequences and checking if conversion to UTF-8 causes a change, in which case appropriate measures should be taken. We do not know of any widely used character encodings besides Shift_JIS, however, that would be affected by this behavior.
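The detection idea can be sketched in Python as follows. This is an illustration of the technique only, not the actual body of testEncodingSupportsASCII(): the names find_ascii_mismatches, undo_mismatches, and simulated_strict_decode are hypothetical, and the simulated decoder merely imitates a converter that maps 5C and 7E to Yen and overline.

```python
from typing import Callable, Dict


def find_ascii_mismatches(decode: Callable[[bytes], str]) -> Dict[str, str]:
    """Map each decoded character that differs from ASCII to the ASCII character."""
    mismatches = {}
    for byte in range(0x20, 0x7F):  # printable 7-bit bytes, 0x20 through 0x7E
        decoded = decode(bytes([byte]))
        expected = chr(byte)
        # Skip empty results: there is nothing to map back from in that case.
        if decoded and decoded != expected:
            mismatches[decoded] = expected
    return mismatches


def undo_mismatches(text: str, mismatches: Dict[str, str]) -> str:
    """Restore the ASCII characters that the converter replaced."""
    for decoded, ascii_char in mismatches.items():
        text = text.replace(decoded, ascii_char)
    return text


def simulated_strict_decode(data: bytes) -> str:
    """Hypothetical decoder imitating a strict Shift_JIS conversion of 7-bit bytes."""
    table = {0x5C: "\u00a5", 0x7E: "\u203e"}  # yen sign, overline
    return "".join(table.get(b, chr(b)) for b in data)


mismatches = find_ascii_mismatches(simulated_strict_decode)
restored = undo_mismatches(simulated_strict_decode(b"a\\b~"), mismatches)
ascii_mismatches = find_ascii_mismatches(lambda data: data.decode("ascii"))
```

With the simulated decoder, the mismatch table contains exactly the Yen and overline characters, and applying it after conversion turns them back into a backslash and a tilde so that later filtering sees what a browser would see. An encoding that agrees with ASCII yields an empty table and needs no correction.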

History

The vulnerability was reported on May 24, 2008 via email, as a follow-up to another unrelated vulnerability in CSS handling. A patch was committed to the public repository on May 25, 2008, with the summary: “Fix Shift_JIS encoding wonkiness with yen symbols and whatnot.” HTML Purifier 3.1.1 was released on June 19, 2008.