
How do I get HTML Purifier to ignore the content in a pre tag?

Posted by Roly 
How do I get HTML Purifier to ignore the content in a pre tag?
June 21, 2009 07:25PM

Inside a pre tag I sometimes want to post code that contains HTML, and that HTML may include tags I allow in the rest of the post, such as anchors. So I want HTML Purifier to ignore everything inside the pre tag. I'm thinking of doing something like:

$text = preg_replace_callback('/<pre>(.*)<\/pre>/ismU', 'store', $text);

In the function store I'll save the content of every pre tag into a global array. Then, after $text goes through HTML Purifier, I'll call a similar function to restore the content of all the pre tags. Your input?

Re: How do I get HTML Purifier to ignore the content in a pre tag?
June 22, 2009 11:54AM

If you implement this exactly as you described, you will have a security vulnerability. Assuming that you plan on escaping the data before sticking it back in the pre, you run into a problem where the input is ambiguous: did the user actually mean to have a character entity reference such as &lt;, or did they want HTML Purifier to escape it again? I prefer to tell users about the CDATA escaping mechanism, which works nicely and is standard XML (HTML Purifier transparently converts it into HTML).
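To make the ambiguity concrete, here is a minimal sketch in plain PHP. It assumes the restore step escapes the stashed pre content with htmlspecialchars; the helper name escape_pre_content is hypothetical and not part of HTML Purifier.

```php
<?php
// Hypothetical helper: escape stashed <pre> content before putting it back.
function escape_pre_content(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// User A pasted raw markup and expects it to be shown literally.
$rawMarkup = '<b>hi</b>';

// User B already escaped the markup by hand.
$preEscaped = '&lt;b&gt;hi&lt;/b&gt;';

echo escape_pre_content($rawMarkup), "\n";
// prints: &lt;b&gt;hi&lt;/b&gt;

echo escape_pre_content($preEscaped), "\n";
// prints: &amp;lt;b&amp;gt;hi&amp;lt;/b&amp;gt;
```

User A gets what they wanted, but user B's hand-escaped input is escaped a second time, so the browser shows the literal text &lt;b&gt; instead of <b>. The code has no way to tell which of the two the user meant.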

Is this also a vulnerability?

$string = str_replace(array("<pre>", "</pre>"), array("<pre><![CDATA[", "]]></pre>"), $string);

This way your users don't need to worry about adding the CDATA tags themselves; it's done silently.

Re: How do I get HTML Purifier to ignore the content in a pre tag?
July 14, 2009 03:53PM

Not a vulnerability, but it hurts users who do the right thing.
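A rough sketch of how that can go wrong, in plain PHP with no HTML Purifier involved. Both function names are hypothetical, and the second function is only one possible workaround, not something suggested in this thread.

```php
<?php
// The silent wrapping from the previous post, as a function.
function wrap_pre_in_cdata(string $html): string
{
    return str_replace(
        array('<pre>', '</pre>'),
        array('<pre><![CDATA[', ']]></pre>'),
        $html
    );
}

// Possible workaround (an assumption): skip blocks that already start with CDATA.
function wrap_pre_unless_cdata(string $html): string
{
    $result = preg_replace_callback(
        '~<pre>(.*?)</pre>~is',
        function (array $m): string {
            if (strpos(ltrim($m[1]), '<![CDATA[') === 0) {
                return $m[0]; // already wrapped by the user, leave untouched
            }
            return '<pre><![CDATA[' . $m[1] . ']]></pre>';
        },
        $html
    );
    return $result ?? $html; // fall back to the input if the regex engine fails
}

// A user who already followed the CDATA advice.
$careful = '<pre><![CDATA[<b>bold</b>]]></pre>';

echo wrap_pre_in_cdata($careful), "\n";
// prints: <pre><![CDATA[<![CDATA[<b>bold</b>]]>]]></pre>

echo wrap_pre_unless_cdata($careful), "\n";
// prints: <pre><![CDATA[<b>bold</b>]]></pre>
```

In XML a CDATA section ends at the first ]]>, so in the double-wrapped output the user's own <![CDATA[ marker becomes part of the displayed text and the final ]]> is left over after the section. The workaround does not help users who escaped by hand with &lt; and &gt;, since entities are not decoded inside CDATA, and it still breaks if the code being posted contains ]]> itself.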
