<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>converting ErrorCollector output suitable for log file (stripping HTML)</title>
        <description>HTMLPurifier allows you to grab error output (what it stripped and maybe why), and I wanted to log this... but (AFAIK) I can only get HTML-formatted error output.  This is hard to read in a text log file.  So I use this code to strip out the HTML and leave text suitable for a log file.  Maybe this is useful to someone else:


$dirty_html = '&amp;lt;img src=&quot;javascript:evil();&quot; onload=&quot;evil();&quot; /&amp;gt;hello&amp;lt;img src=&quot;/s.gif&quot;&amp;gt;';
$config = HTMLPurifier_Config::createDefault();
$config-&amp;gt;set('Core', 'CollectErrors', true);      // this is needed to collect errors
$purifier = new HTMLPurifier($config);
$clean_html = $purifier-&amp;gt;purify($dirty_html);

$e = $purifier-&amp;gt;context-&amp;gt;get('ErrorCollector');   // grab errors
if ($e-&amp;gt;getRaw())                                 // errors were present
{
  $str = $e-&amp;gt;getHTMLFormatted($config);           // get errors in html format (not so good for log file)

  // --------- interesting code starts here ------------------------------------
  $str = str_replace('&amp;lt;li&amp;gt;', &quot;\n&quot;, $str);         // replace each &amp;lt;li&amp;gt; with a newline
  $str = preg_replace('/\&amp;lt;.*\&amp;gt;/Us', '', $str);    // remove all other HTML tags (U = ungreedy, s = dot also matches newlines)
  $str = trim(htmlspecialchars_decode($str));     // decode entities (&amp;amp;gt; becomes '&amp;gt;' etc.), then trim surrounding whitespace, including the leading newline

  // at this point $str is a text string suitable for logging.  You can display in an html page like this:
  echo &quot;&amp;lt;pre&amp;gt;&quot;;
  echo htmlspecialchars($str);
  echo &quot;&amp;lt;/pre&amp;gt;&quot;;
}
</description>
        <link>http://htmlpurifier.org/phorum/read.php?2,2797,2797#msg-2797</link>
        <lastBuildDate>Fri, 24 May 2013 02:46:16 -0400</lastBuildDate>
        <generator>Phorum 5.2.18</generator>
        <item>
            <guid>http://htmlpurifier.org/phorum/read.php?2,2797,2800#msg-2800</guid>
            <title>Re: converting ErrorCollector output suitable for log file (stripping HTML)</title>
            <link>http://htmlpurifier.org/phorum/read.php?2,2797,2800#msg-2800</link>
            <description><![CDATA[<p>DOH!  Totally missed that.  Thank you.  :)</p>

<p>(For others:) I notice each entry in the error array contains: index 0 is the line, 1 the severity, 2 the message, and 3 an array of children (??).</p>
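
<p>A minimal sketch of turning that raw array into log lines, assuming the 0/1/2/3 layout above holds and that the severity is one of PHP's E_ERROR / E_WARNING / E_NOTICE constants (my assumption).  format_purifier_errors() is a made-up helper name, not part of HTMLPurifier, the sample entry is invented, and children are ignored:</p>

<pre>
// Hypothetical helper: flatten the getRaw() array into plain-text log lines.
// Assumes each entry is array(0 =&gt; line, 1 =&gt; severity, 2 =&gt; message, 3 =&gt; children).
function format_purifier_errors(array $errors)
{
  // Assumed mapping from PHP error constants to readable names
  $names = array(E_ERROR =&gt; 'Error', E_WARNING =&gt; 'Warning', E_NOTICE =&gt; 'Notice');
  $lines = array();
  foreach ($errors as $error) {
    $line     = isset($error[0]) ? $error[0] : '?';   // line may be missing
    $severity = isset($names[$error[1]]) ? $names[$error[1]] : 'Severity ' . $error[1];
    $lines[]  = "Line $line: $severity: " . $error[2];
  }
  return implode("\n", $lines);                       // children (index 3) are not included
}

// Invented sample entry, only to show the output shape
$sample = array(array(1, E_ERROR, 'Example message', array()));
echo format_purifier_errors($sample);                 // prints: Line 1: Error: Example message

// With a real purifier it would be something like:
//   error_log(format_purifier_errors($e-&gt;getRaw()));
</pre>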

<p>It is missing the column number (which isn't that useful, I guess).  There is a lot of logic involved in generating the HTML-formatted error string (which includes a bit more info).  I think HTML stripping is the way to go if those extra bits are wanted, though it is more likely to break with future releases.</p>]]></description>
            <dc:creator>sukibabee</dc:creator>
            <category>General</category>
            <pubDate>Sun, 21 Dec 2008 16:51:25 -0500</pubDate>
        </item>
        <item>
            <guid>http://htmlpurifier.org/phorum/read.php?2,2797,2799#msg-2799</guid>
            <title>Re: converting ErrorCollector output suitable for log file (stripping HTML)</title>
            <link>http://htmlpurifier.org/phorum/read.php?2,2797,2799#msg-2799</link>
            <description><![CDATA[<p>You're looking for HTMLPurifier_ErrorCollector-&gt;getRaw().  The output of that function is an array of "raw" error messages, which you can concatenate into a text log.</p>]]></description>
            <dc:creator>Ambush Commander</dc:creator>
            <category>General</category>
            <pubDate>Sun, 21 Dec 2008 16:46:27 -0500</pubDate>
        </item>
        <item>
            <guid>http://htmlpurifier.org/phorum/read.php?2,2797,2797#msg-2797</guid>
            <title>converting ErrorCollector output suitable for log file (stripping HTML)</title>
            <link>http://htmlpurifier.org/phorum/read.php?2,2797,2797#msg-2797</link>
            <description><![CDATA[<p>HTMLPurifier allows you to grab error output (what it stripped and maybe why), and I wanted to log this... but (AFAIK) I can only get HTML-formatted error output.  This is hard to read in a text log file.  So I use this code to strip out the HTML and leave text suitable for a log file.  Maybe this is useful to someone else:</p>

<pre>
$dirty_html = '&lt;img src="javascript:evil();" onload="evil();" /&gt;hello&lt;img src="/s.gif"&gt;';
$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('Core', 'CollectErrors', true);      // this is needed to collect errors
$purifier = new HTMLPurifier($config);
$clean_html = $purifier-&gt;purify($dirty_html);

$e = $purifier-&gt;context-&gt;get('ErrorCollector');   // grab errors
if ($e-&gt;getRaw())                                 // errors were present
{
  $str = $e-&gt;getHTMLFormatted($config);           // get errors in html format (not so good for log file)

  // --------- interesting code starts here ------------------------------------
  $str = str_replace('&lt;li&gt;', "\n", $str);         // replace each &lt;li&gt; with a newline
  $str = preg_replace('/\&lt;.*\&gt;/Us', '', $str);    // remove all other HTML tags (U = ungreedy, s = dot also matches newlines)
  $str = trim(htmlspecialchars_decode($str));     // decode entities (&amp;gt; becomes '&gt;' etc.), then trim surrounding whitespace, including the leading newline

  // at this point $str is a text string suitable for logging.  You can display in an html page like this:
  echo "&lt;pre&gt;";
  echo htmlspecialchars($str);
  echo "&lt;/pre&gt;";
}
</pre>]]></description>
            <dc:creator>sukibabee</dc:creator>
            <category>General</category>
            <pubDate>Sun, 21 Dec 2008 16:30:39 -0500</pubDate>
        </item>
    </channel>
</rss>
