
converting ErrorCollector output suitable for log file (stripping HTML)

Posted by sukibabee 
December 21, 2008 04:30PM

HTMLPurifier lets you grab its error output (what it stripped and sometimes why), and I wanted to log it. But as far as I know, I can only get that output formatted as HTML, which is hard to read in a plain-text log file. So I use this code to strip out the HTML and make it suitable for a log file. Maybe it's useful to someone else:

require_once 'HTMLPurifier.auto.php';               // adjust the path to your HTML Purifier install

$dirty_html = '<img src="javascript:evil();" onload="evil();" />hello<img src="/s.gif">';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'CollectErrors', true);        // needed to collect errors (4.x spells it $config->set('Core.CollectErrors', true))
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

$e = $purifier->context->get('ErrorCollector');     // grab the error collector
if ($e->getRaw())                                   // a non-empty array means errors were present
{
  $str = $e->getHTMLFormatted($config);             // errors in HTML format (hard to read in a log file)

  // --------- interesting code starts here ------------------------------------
  $str = str_replace('<li>', "\n", $str);           // start a new line at every <li>
  $str = preg_replace('/<.*>/Us', '', $str);        // remove all other HTML tags (U = ungreedy, s = dot also matches newlines)
  $str = trim(htmlspecialchars_decode($str));       // turn &gt; back into '>' etc., then trim spaces and the leading newline

  // at this point $str is plain text suitable for logging. To display it in an HTML page:
  echo "<pre>";
  echo htmlspecialchars($str);
  echo "</pre>";
}
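If you want to reuse the stripping steps, they can be wrapped in a small function. This is only a sketch: `error_html_to_text()` is a made-up name, it swaps the `preg_replace()` regex for PHP's built-in `strip_tags()`, and the sample input is invented, so real `getHTMLFormatted()` output may look different.

```php
<?php
// Hypothetical helper: converts HTML-formatted error output into plain text.
// Same steps as the snippet above, but strip_tags() replaces the regex.
function error_html_to_text(string $html): string
{
    // Start a new line at every list item so each error ends up on its own line.
    $text = str_replace('<li>', "\n", $html);

    // Remove the remaining tags. This has to happen *before* decoding entities,
    // otherwise an escaped "&lt;" in a message would become "<" and get stripped.
    $text = strip_tags($text);

    // Turn &gt;, &lt;, &amp;, &quot; back into plain characters, then drop the
    // leading newline produced by the first <li>.
    return trim(htmlspecialchars_decode($text));
}

// Made-up sample input; not real HTML Purifier output.
$sample = '<ul><li>Error: foo &gt; bar</li><li>Warning: baz</li></ul>';
echo error_html_to_text($sample), "\n";
```

The order matters: decoding first and stripping second would eat any `<...>` that appears inside an error message.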
Re: converting ErrorCollector output suitable for log file (stripping HTML)
December 21, 2008 04:46PM

You're looking for HTMLPurifier_ErrorCollector->getRaw(). It returns an array of "raw" error messages, which you can concatenate into a text log.

Re: converting ErrorCollector output suitable for log file (stripping HTML)
December 21, 2008 04:51PM

DOH! Totally missed that. Thank you. :)

(For others:) I notice each entry in the error array is laid out like this: 0 = line, 1 = severity, 2 = message, 3 = array of children (??).
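A minimal sketch of turning that raw array into log lines, assuming the index layout described above and assuming the severity is one of PHP's `E_ERROR`/`E_WARNING`/`E_NOTICE` constants (my understanding of how HTML Purifier reports it, not something confirmed in this thread). `raw_errors_to_text()` is a hypothetical name, the children at index 3 are ignored because their format isn't documented here, and the sample data is made up.

```php
<?php
// Sketch: format the array returned by getRaw() as plain-text log lines.
// Assumed entry layout: [0] line (may be null), [1] severity, [2] message, [3] children.
// Assumed severities: PHP's E_ERROR / E_WARNING / E_NOTICE constants.
function raw_errors_to_text(array $raw): string
{
    $names = [E_ERROR => 'Error', E_WARNING => 'Warning', E_NOTICE => 'Notice'];
    $lines = [];
    foreach ($raw as $error) {
        $line     = $error[0];
        $severity = $error[1];
        $message  = $error[2];
        // Fall back to the numeric value for any severity without a known name.
        $label = $names[$severity] ?? (string) $severity;
        // Be defensive about entries that carry no line number.
        $where = ($line === null) ? 'line ?' : "line $line";
        $lines[] = "$label ($where): $message";
    }
    return implode("\n", $lines);
}

// Hypothetical usage with the objects from the first post:
// error_log(raw_errors_to_text($purifier->context->get('ErrorCollector')->getRaw()));

// Made-up sample data in the assumed shape:
$sample = [
    [1, E_ERROR, 'Attribute removed', []],
    [null, E_WARNING, 'Unclosed tag fixed', []],
];
echo raw_errors_to_text($sample), "\n";
```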

It is missing the column number (which probably isn't that useful). A lot of logic goes into generating the HTML-formatted error string, and that output includes a bit more information. So I think stripping the HTML is the way to go if you want those extra details, though it is more likely to break with future releases.
