converting ErrorCollector output suitable for log file (stripping HTML)

Posted by sukibabee 
December 21, 2008 04:30PM

HTML Purifier lets you collect error output (what it stripped and maybe why), and I wanted to log it. But as far as I know, the error output is only available as formatted HTML, which is hard to read in a text log file. So I use this code to strip the HTML and make the output suitable for a log file. Maybe it is useful to someone else:

$dirty_html = '<img src="javascript:evil();" onload="evil();" />hello<img src="/s.gif">';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'CollectErrors', true);      // this is needed to collect errors
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

$e = $purifier->context->get('ErrorCollector');   // grab errors
if ($e->getRaw())                                 // errors were present
{
  $str = $e->getHTMLFormatted($config);           // get errors in html format (not so good for log file)

  // --------- interesting code starts here ------------------------------------
  $str = str_replace('<li>', "\n", $str);         // replace each <li> with a newline
  $str = preg_replace('/\<.*\>/Us', '', $str);    // remove all other html tags  U=ungreedy, s=(. matches newline too)
  $str = trim(htmlspecialchars_decode($str));     // replace &gt; with '>' etc - and trim spaces and the preceding newline

  // at this point $str is a text string suitable for logging.  You can display in an html page like this:
  echo "<pre>";
  echo htmlspecialchars($str);
  echo "</pre>";
}
Re: converting ErrorCollector output suitable for log file (stripping HTML)
December 21, 2008 04:46PM

You're looking for HTMLPurifier_ErrorCollector->getRaw(): it returns an array of "raw" error messages, which you can concatenate together into a text log.

Re: converting ErrorCollector output suitable for log file (stripping HTML)
December 21, 2008 04:51PM

DOH! Totally missed that. Thank you. :)

(For others:) I notice each entry in the error array contains: 0 = line, 1 = severity, 2 = msg, 3 = array of children (??).

It is missing the column number (which isn't that useful, I guess). There is a lot of logic involved in generating the HTML-formatted error string (which includes a bit more info). I think HTML stripping is the way to go if those extra bits are wanted, though it is more likely to break with future releases.
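
For anyone who wants the getRaw() route spelled out, here is a minimal sketch of turning the raw array into plain-text log lines. It assumes each entry has the shape described above (0 = line, 1 = severity, 2 = msg, 3 = children) and that the severity values are PHP's E_* constants; both are assumptions worth checking with var_dump against your version. format_raw_errors() is a hypothetical helper name, not part of HTML Purifier, and the sample data is made up for illustration.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): format raw error entries as text.
// Assumed entry shape: array(0 => line, 1 => severity, 2 => message, 3 => children).
// Element 3 (children) is ignored here because its structure is unclear.
function format_raw_errors(array $errors)
{
    // Assumption: severities are PHP's E_* constants.
    $labels = array(E_ERROR => 'Error', E_WARNING => 'Warning', E_NOTICE => 'Notice');
    $lines = array();
    foreach ($errors as $error) {
        $line     = isset($error[0]) ? $error[0] : '?';
        $severity = isset($error[1]) ? $error[1] : 0;
        $msg      = isset($error[2]) ? $error[2] : '';
        // Fall back to the numeric value when the severity is not one we know.
        $label = isset($labels[$severity]) ? $labels[$severity] : 'Severity ' . $severity;
        $lines[] = sprintf('line %s: %s: %s', $line, $label, $msg);
    }
    return implode("\n", $lines);
}

// Made-up sample data in the assumed shape (not real HTML Purifier output).
$sample = array(
    array(1, E_ERROR, 'example message one', array()),
    array(3, E_WARNING, 'example message two', array()),
);
$log_text = format_raw_errors($sample);
echo $log_text, "\n";
```

With the collector from the first post, the call would be something like error_log(format_raw_errors($e->getRaw())); this avoids parsing the HTML output, at the cost of the extra details that only the HTML formatter adds.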

I do not want to display the error messages I get from $config->set('Core', 'CollectErrors', true); what should I do?

Re: Do not want to display error message get from CollectErrors
May 09, 2014 03:14PM

Hasib: If you don't want to display it... turn off the config option?

How can I use HTMLPurifier_ErrorCollector->getRaw()?

Re: converting ErrorCollector output suitable for log file (stripping HTML)
March 02, 2016 04:07AM

There are not very many good docs, but I recommend looking at http://repo.or.cz/htmlpurifier-web.git/blob/refs/heads/master:/demo.php for some ideas on how to use it; also, just var_dump the variable and see what the format looks like!