Upgrade to 4.0.0 issues

Posted by Jochem 
January 26, 2010 04:30AM

Hello,

My goal is to upgrade HTML Purifier from 3.1.1 to the latest version, 4.0.0.

The requirements are:
* secure against XSS
* don't load external content (with a boolean to override this option)
* all links must have target="_blank"
* extract CSS

old code:


require_once('HTMLPurifier.auto.php');
require_once('class.csstidy.php');

class HTMLPurifier_AttrTransform_ForceValue extends HTMLPurifier_AttrTransform
{
  var $name, $value;
  function HTMLPurifier_AttrTransform_ForceValue($name, $value) {
    $this->name  = $name;
    $this->value = $value;
  }
  function transform($attr, $config, $context) {
    $attr[$this->name] = $this->value;
    return $attr;
  }
}

$oConfig = HTMLPurifier_Config::createDefault();
$oConfig->set('Filter', 'ExtractStyleBlocks', true);

$oConfig->set('HTML', 'DefinitionID', '1');
$def =& $oConfig->getHTMLDefinition(true);
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
$def->info['a']->attr_transform_post['target'] = new HTMLPurifier_AttrTransform_ForceValue('target', '_blank');

$purifier = new HTMLPurifier($oConfig);

$sContent = $purifier->purify( $sContent );
$aCssStyles = $purifier->context->get('StyleBlocks');

# fill the style tag in the head of the template
if (is_array($aCssStyles))
{
  foreach ($aCssStyles as $sCssStyle)
  {
    echo $sCssStyle . "\n";
  }
}

# fill the body tag in the body of the template
echo $sContent;

New code

I found similar questions on the forum regarding the rel attribute, so I tried fixing the target="_blank" issue myself. I followed the Customize tutorial http://htmlpurifier.org/phorum/posting.php?3 and the comments in thread http://htmlpurifier.org/phorum/read.php?3,3216


ini_set('display_errors', 'true');

error_reporting(E_ALL);
set_magic_quotes_runtime(0);
date_default_timezone_set('Europe/Amsterdam');

class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
{

  public function transform($attr, $config, $context) {
    // Abort early if we're using relaxed definition of name
    //        if ($config->get('HTML.Attr.Name.UseCDATA')) return $attr;
    //        if (!isset($attr['name'])) return $attr;
    //        $id = $this->confiscateAttr($attr, 'name');
    //        if ( isset($attr['id']))   return $attr;
    //        $attr['id'] = $id;
    return $attr['target'] = '_blank';
  }

}

$config = HTMLPurifier_Config::createDefault();
$config->set('Filter.ExtractStyleBlocks', true);
$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
$config->attr_transform_pre['a'] = new HTMLPurifier_AttrTransform_Target();

$def = & $config->getHTMLDefinition(true);
$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_target','_top')
));

$def->attr_transform_pre['a'] = new HTMLPurifier_AttrTransform_Target();

$html = '<html><body bgcolor="red"><a href="http://www.google.nl" style=" color="font-color: red">Google</a></body></html>';

$purifier = new HTMLPurifier($config);
$purifier->attr_transform_pre['a'] = new HTMLPurifier_AttrTransform_Target();
//echo '<pre>';print_r($o);echo '</pre>';
echo $purifier->purify($html);

$aCssStyles = $purifier->context->get('StyleBlocks');
if (is_array($aCssStyles))
{
  foreach ($aCssStyles as $sCssStyle)
  {
    echo $sCssStyle . "\n";
  }
}

Outputs:

<a href="http://www.google.nl">Google</a>

It does not seem to work: the target attribute is missing from the output. Can you point me to the solution?

Re: Upgrade to 4.0.0 issues
January 26, 2010 02:51PM

In your sample code you are referring to class definitions that don't exist in the core, for example HTMLPurifier_AttrTransform_Target and HTMLPurifier_AttrTransform_ForceValue. Could you post the code for these? Also, I highly recommend turning on error reporting.

Re: Upgrade to 4.0.0 issues
January 27, 2010 02:34AM

Whoops... I have added the missing classes to the first post. Please don't focus too much on the old code ;) Error reporting was already on.

Re: Upgrade to 4.0.0 issues
January 28, 2010 11:49AM

There are multiple errors in your code. I'll just paste a working version.

<?php
ini_set('display_errors', 'true');

require 'library/HTMLPurifier.auto.php';

error_reporting(E_ALL);
set_magic_quotes_runtime(0);
date_default_timezone_set('Europe/Amsterdam');

class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
{

  public function transform($attr, $config, $context) {
    // Force every link to open in a new window, overriding any existing target.
    $attr['target'] = '_blank';
    return $attr;
  }

}

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
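// Whitelist target="_blank"; otherwise the attribute is stripped during validation.
$config->set('Attr.AllowedFrameTargets', array('_blank'));

$def = & $config->getHTMLDefinition(true);
// Register the transform on <a> so it runs before attribute validation.
$a = $def->addBlankElement('a');
$a->attr_transform_pre[] = new HTMLPurifier_AttrTransform_Target();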

$html = '<html><body bgcolor="red"><a href="http://www.google.nl" style=" color="font-color: red">Google</a></body></html>';

$purifier = new HTMLPurifier($config);
//echo '<pre>';print_r($o);echo '</pre>';
echo $purifier->purify($html);
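
One of the differences between the two versions of transform() can be reproduced without HTML Purifier at all. In PHP an assignment is an expression that evaluates to the assigned value, so "return $attr['target'] = '_blank';" hands back the string '_blank' instead of the attribute array. The sketch below is only an illustration: transform_broken() and transform_fixed() are hypothetical stand-alone functions, not part of the library, and they ignore the $config and $context arguments.

```php
<?php
// Hypothetical stand-ins for the two transform() bodies; HTML Purifier is not needed.

// Mirrors the original post: returns the value of the assignment expression.
function transform_broken(array $attr) {
    return $attr['target'] = '_blank';
}

// Mirrors the working version: modify the array first, then return all of it.
function transform_fixed(array $attr) {
    $attr['target'] = '_blank';
    return $attr;
}

$attr = array('href' => 'http://www.google.nl');

var_dump(is_array(transform_broken($attr))); // bool(false): a string comes back
var_dump(is_array(transform_fixed($attr)));  // bool(true): href and target are both kept
```

This only covers the return value. The working version above also whitelists '_blank' through Attr.AllowedFrameTargets and attaches the transform to the element returned by addBlankElement('a').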