
Should Purifier do a double run?

Posted by TRiG 
Should Purifier do a double run?
May 26, 2010 01:16PM

With HTML Purifier set to remove empty elements and to remove spans without attributes, this input:

<pre><![CDATA[

<p><span style="font-family: "> <p align="left">Installation and Testing of the Electrical & Instrumentation</p> <p align="left">Works</p> <p align="left">• Installation of Primary & Secondary Containment</p> </span></p> <p> </p> <p> </p>

]]></pre>

produces the following purified output:

<pre><![CDATA[

<p> </p><p align="left">Installation and Testing of the Electrical & Instrumentation</p> <p align="left">Works</p> <p align="left">• Installation of Primary & Secondary Containment</p>

]]></pre>

The purified output is valid, of course, but it still contains an empty element.

If you run that through again, it's further purified, to remove the empty paragraph:

<pre><![CDATA[

<p align="left">Installation and Testing of the Electrical & Instrumentation</p> <p align="left">Works</p> <p align="left">• Installation of Primary & Secondary Containment</p>

]]></pre>

Should Purifier run in a loop, repeatedly purifying until no changes are made to the HTML string?

TRiG.

Re: Should Purifier do a double run?
May 26, 2010 01:19PM

It seems a bit overkill, and while HTML Purifier tries to be as idempotent as possible, there's no guarantee that you wouldn't hit a cycle.
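
For anyone who wants to experiment with the loop anyway, here is a minimal sketch of "purify until nothing changes" that guards against the cycle risk mentioned above. It is written in Python against a hypothetical `purify` callable that stands in for whatever purifier you call; it is not HTML Purifier's API. The names `purify_until_stable` and `max_passes` are made up for illustration.

```python
from typing import Callable, Tuple


def purify_until_stable(
    html: str,
    purify: Callable[[str], str],  # hypothetical stand-in for the real purifier
    max_passes: int = 5,           # assumed safety cap, tune to taste
) -> Tuple[str, int]:
    """Re-run `purify` until the output stops changing.

    Stops early if a previously seen string comes back (a cycle),
    and never runs more than `max_passes` passes.
    Returns (final_html, passes_run).
    """
    seen = {html}
    current = html
    for passes in range(1, max_passes + 1):
        result = purify(current)
        if result == current:
            # Fixed point: another pass would change nothing.
            return result, passes
        if result in seen:
            # Cycle: the purifier is oscillating between states, so give up.
            return result, passes
        seen.add(result)
        current = result
    # Cap reached without settling; return the latest output.
    return current, max_passes
```

Note that confirming a fixed point always costs one extra pass, so input that settles after two rounds of changes needs three calls. Whether that cost is worth it is the judgment call raised above.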

Re: Should Purifier do a double run?
May 26, 2010 01:32PM

Fair enough. That's a good warning. I won't risk it, so. I suppose it would be caught next time the page was opened and saved, anyway.

TRiG.

Re: Should Purifier do a double run?
May 26, 2010 01:39PM

Well, I might imagine you'd want to keep the original HTML around, in case HTML Purifier chomped some text unexpectedly.
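
One way to read that suggestion, as a rough sketch only: store the submitted HTML untouched and treat the purified version as a derived copy that can be regenerated. The `StoredPost` record and the `purify` callable below are hypothetical names for illustration and are not part of HTML Purifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StoredPost:
    raw_html: str       # exactly what the user submitted, never overwritten
    purified_html: str  # derived copy, safe to display


def save_post(raw_html: str, purify: Callable[[str], str]) -> StoredPost:
    """Keep the original next to the purified output."""
    return StoredPost(raw_html=raw_html, purified_html=purify(raw_html))


def repurify(post: StoredPost, purify: Callable[[str], str]) -> StoredPost:
    """Regenerate the purified copy from the original, e.g. after a config change."""
    return StoredPost(raw_html=post.raw_html, purified_html=purify(post.raw_html))
```

Because the purified copy is always rebuilt from `raw_html`, text lost in one purification pass can be recovered by fixing the configuration and purifying again.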
