converting ErrorCollector output suitable for log file (stripping HTML) December 21, 2008 04:30PM |
Registered: 4 years ago Posts: 2 |
HTMLPurifier lets you collect error output (what it stripped, and sometimes why), and I wanted to log it. As far as I know, though, the error output is only available as formatted HTML, which is hard to read in a plain-text log file. So I use the code below to strip the HTML out and leave text suitable for a log file. Maybe this is useful to someone else:
$dirty_html = '<img src="javascript:evil();" onload="evil();" />hello<img src="/s.gif">';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'CollectErrors', true); // needed to collect errors (newer releases take the single key 'Core.CollectErrors')
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
$e = $purifier->context->get('ErrorCollector'); // grab errors
if ($e->getRaw()) // errors were present
{
    $str = $e->getHTMLFormatted($config); // get errors in HTML format (not so good for a log file)
    // --------- interesting code starts here ------------------------------------
    $str = str_replace('<li>', "\n", $str); // replace each <li> with a newline, so every error starts on its own line
    $str = preg_replace('/\<.*\>/Us', '', $str); // remove all other HTML tags; U = ungreedy, s = "." matches newlines too
    $str = trim(htmlspecialchars_decode($str)); // turn &gt; back into '>' etc., then trim spaces and the leading newline
    // at this point $str is a text string suitable for logging. You can display it in an HTML page like this:
    echo "<pre>";
    echo htmlspecialchars($str);
    echo "</pre>";
}
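For anyone who wants the stripping step as a reusable function, here is a minimal sketch. The function name `error_html_to_log_text` is made up for illustration and is not part of HTML Purifier, and the sample string is invented input that only roughly resembles the formatted error list. It swaps the regex for PHP's built-in `strip_tags()`, which is a substitution on my part rather than what the snippet above does.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): turn an HTML error list
// into plain text with one error per line.
function error_html_to_log_text(string $html): string
{
    // Start each list item on its own line (also matches <li class="...">).
    $text = preg_replace('/<li\b[^>]*>/i', "\n", $html);
    // Drop every remaining tag. This must happen before decoding entities,
    // otherwise a decoded "<img>" inside a message would be stripped as well.
    $text = strip_tags($text);
    // Turn entities such as &gt; back into literal characters.
    $text = htmlspecialchars_decode($text, ENT_QUOTES);
    // Collapse runs of spaces and tabs, trim each line, and drop empty lines.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $lines = array_map('trim', explode("\n", $text));
    return implode("\n", array_filter($lines, 'strlen'));
}

// Invented sample input, only for demonstration.
$sample = '<ul><li><strong>Warning</strong> <em>Line 1:</em> Removed &lt;img&gt; attribute</li>'
        . '<li><strong>Error</strong> <em>Line 2:</em> Unknown tag</li></ul>';
echo error_html_to_log_text($sample), "\n";
```

Like the original snippet, this still depends on the shape of the HTML output, so it may need adjusting if that output changes.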
Re: converting ErrorCollector output suitable for log file (stripping HTML) December 21, 2008 04:46PM |
Admin Registered: 6 years ago Posts: 2,636 |
Re: converting ErrorCollector output suitable for log file (stripping HTML) December 21, 2008 04:51PM |
Registered: 4 years ago Posts: 2 |
DOH! Totally missed that. Thank you. :)
(For others:) I notice each entry in the error array contains: 0 = line, 1 = severity, 2 = message, 3 = array of children (??).
It is missing the column number (which isn't that useful, I guess). A lot of logic goes into generating the HTML-formatted error string, which includes a bit more info. I think stripping the HTML is the way to go if those extra bits are wanted, though it is more likely to break with future releases.
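If the line, severity and message are enough, the raw array can be turned into log lines directly. This is only a sketch under assumptions: the function name `raw_errors_to_log_lines` is hypothetical, the entry layout is the 0/1/2/3 one described above, severity is assumed to be one of PHP's `E_ERROR` / `E_WARNING` / `E_NOTICE` constants, and children are assumed to be nested entries of the same shape (which I have not verified). The sample data is invented.

```php
<?php
// Hypothetical helper: format raw error entries as indented log lines.
// Assumes each entry is [0 => line, 1 => severity, 2 => message, 3 => children].
function raw_errors_to_log_lines(array $errors, int $depth = 0): array
{
    // Assumption: severity uses PHP's error-level constants.
    $labels = [E_ERROR => 'Error', E_WARNING => 'Warning', E_NOTICE => 'Notice'];
    $lines = [];
    foreach ($errors as $error) {
        $line     = $error[0] ?? null;
        $severity = $error[1] ?? null;
        $message  = $error[2] ?? '';
        $children = $error[3] ?? [];

        $label = (is_int($severity) && isset($labels[$severity])) ? $labels[$severity] : 'Unknown';
        $where = ($line === null) ? '' : 'line ' . $line . ': ';
        $lines[] = str_repeat('  ', $depth) . '[' . $label . '] ' . $where . $message;

        // Assumption: children are entries of the same shape; indent them one level.
        if (is_array($children) && $children) {
            $lines = array_merge($lines, raw_errors_to_log_lines($children, $depth + 1));
        }
    }
    return $lines;
}

// Invented sample data in the assumed shape.
$raw = [
    [1, E_WARNING, 'Attribute removed', [[1, E_NOTICE, 'Nested detail', []]]],
    [2, E_ERROR, 'Unknown tag', []],
];
echo implode("\n", raw_errors_to_log_lines($raw)), "\n";
```

With real data you would pass in whatever `$e->getRaw()` returns. If the children turn out to have a different structure, the recursive part is the piece to change.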