HTML Purifier 3.1.0 released

HTML Purifier 3.1 represents a major shift from a PHP 4-centric codebase to a PHP 5 one, whereas HTML Purifier 3.0 was merely an E_STRICT compliance release. As such, 3.1 raises some migration concerns that should be addressed, most prominently HTML Purifier's new use of the autoload system.


Autoloading is the single largest architectural change in HTML Purifier, and under certain circumstances the new loading options can give you a hefty performance boost too (not through the autoloader itself, but hold onto that thought for a moment). Previously, HTML Purifier loaded everything it needed from HTMLPurifier.php; that is no longer the case. I've investigated this thoroughly, and the following cases will require some user intervention:

You're a PEAR user

Previously, I told you to use this code:

require_once 'HTMLPurifier.php';

This will no longer be sufficient, because it doesn't register HTML Purifier's autoloader. Replace the line with:

require_once '';

You included HTMLPurifier.php directly

Follow the same instructions as a PEAR user.

You are already using autoloading, and are on a version of PHP earlier than 5.1.2

In early versions of PHP 5, there was no way to register multiple autoload handlers (with spl_autoload_register). You will need to manually modify your autoloader to get HTML Purifier to play nice with it.

Suppose your autoload function looks like this:

function __autoload($class) {
  // Map Foo_Bar to Foo/Bar.php
  require str_replace('_', '/', $class) . '.php';
  return true;
}
A modified version with HTML Purifier would look like this:

function __autoload($class) {
  // Let HTML Purifier claim its own classes first
  if (HTMLPurifier_Bootstrap::autoload($class)) return true;
  // Everything else: Foo_Bar maps to Foo/Bar.php
  require str_replace('_', '/', $class) . '.php';
  return true;
}
Make sure you call HTMLPurifier_Bootstrap::autoload() first. It ignores class names that aren't prefixed with HTMLPurifier (returning false for them), so every other class still falls through to your own loading code.
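The dispatch order in the modified autoloader can be sketched as pure functions, which keeps it runnable on modern PHP (where __autoload itself was removed in PHP 8). The directory names `lib/htmlpurifier` and `app` and the helper names `library_path`, `app_path` and `resolve_path` are hypothetical; the prefix test only mimics the behavior described above and is not HTML Purifier's implementation.

```php
<?php
// Hypothetical sketch: "prefix-aware loader first, generic loader second".
// Paths are computed but never required, so this runs without any library.

// Stand-in for a library loader that only claims classes with its own prefix.
function library_path(string $class)
{
    if (strncmp($class, 'HTMLPurifier', 12) !== 0) {
        return false; // not ours: decline so the next loader can try
    }
    return 'lib/htmlpurifier/' . str_replace('_', '/', $class) . '.php';
}

// Stand-in for a generic application loader that claims every class.
function app_path(string $class): string
{
    return 'app/' . str_replace('_', '/', $class) . '.php';
}

// Chained resolution in the recommended order.
function resolve_path(string $class): string
{
    $path = library_path($class);
    if ($path !== false) {
        return $path;
    }
    return app_path($class);
}

echo resolve_path('HTMLPurifier_Config'), "\n"; // lib/htmlpurifier/HTMLPurifier/Config.php
echo resolve_path('Zend_Db_Table'), "\n";       // app/Zend/Db/Table.php
```

In this hypothetical layout, swapping the two calls would let the generic loader claim `HTMLPurifier_Config` as `app/HTMLPurifier/Config.php`, a file that doesn't exist there.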

You are already using autoloading, and are on PHP 5.1.2+

Congratulations; you probably won't need to make any modifications. However, it's worth checking whether you are using __autoload or spl_autoload_register. If it's the former, you may want to consider adding this line of code to your application:


This is a good idea because spl_autoload_register overrides any __autoload function, so if a misbehaving library (not HTML Purifier, of course!) registers its own autoloader function, yours will mysteriously stop working. You are required to do this if your autoloader is defined after HTML Purifier's autoloader is called.
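How the SPL autoload stack treats several loaders can be sketched in a self-contained way: each registered loader is consulted in registration order until the class exists. Here a named function, `legacy_autoload` (a hypothetical stand-in for the body of an old __autoload function), is registered explicitly alongside a closure standing in for a prefix-aware library loader. Neither loader actually defines classes; they only log which one was asked, via a hypothetical `LoadLog` class.

```php
<?php
// Hypothetical sketch of the SPL autoload stack; no real library is involved.

final class LoadLog
{
    /** @var string[] */
    public static array $attempts = [];
}

// Stand-in for a library loader that only answers for its own prefix.
spl_autoload_register(function (string $class): void {
    if (strncmp($class, 'HTMLPurifier', 12) === 0) {
        LoadLog::$attempts[] = "library:$class";
    }
});

// Stand-in for the body of an old __autoload function, under a new name.
function legacy_autoload(string $class): void
{
    LoadLog::$attempts[] = "legacy:$class";
}

// Register the legacy loader explicitly so it joins the stack
// instead of being silently bypassed.
spl_autoload_register('legacy_autoload');

// Neither loader defines the class, so class_exists() returns false,
// but both loaders get a chance, in registration order.
$found = class_exists('HTMLPurifier_Foo', true);

var_dump($found);          // bool(false)
print_r(LoadLog::$attempts);
```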

Some extra notes

With those modifications, your HTML Purifier installation should no longer die with fatal errors. If it does, please post in the Support forums and I'll try to help figure it out.

If you've got things working, and would like to try some of the newest features out, check out the following files:

This is the performance-friendly file I was talking about earlier. If you use it, you don't need the autoloader at all; just swap 'auto' for 'includes' in the filename you require. The downside is that if you are using any non-standard classes, you'll need to include them manually.
At the prompting of Lukasz Pilorz, I wrote a little wrapper for HTML Purifier that uses the kses interface. It's pretty neat and works with kses's configuration parameters, so check it out if you've got some legacy code you want to migrate.
This is the not-so-performance-friendly counterpart of HTMLPurifier.includes.php. On the plus side, however, it doesn't need autoload, and it can be included from anywhere with impunity.


The interface for registering filters changed slightly. You may have noticed some E_USER_WARNINGs being emitted by code that looks like:

$purifier = new HTMLPurifier();
require_once 'HTMLPurifier/Filter/YouTube.php';
$purifier->addFilter(new HTMLPurifier_Filter_YouTube());

We've replaced addFilter() with some new configuration directives. Combined with autoloading, the above code turns into:

$config = HTMLPurifier_Config::createDefault();
$config->set('Filter', 'YouTube', true);
$purifier = new HTMLPurifier($config);

If you're using a custom filter, you'll need some slightly different code:

$config = HTMLPurifier_Config::createDefault();
// %Filter.Custom takes an array of filter objects
$config->set('Filter', 'Custom', array(
    new YourCustomFilter()
));
$purifier = new HTMLPurifier($config);
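A custom filter for %Filter.Custom might look like the sketch below. It assumes the 3.x HTMLPurifier_Filter base class exposes `preFilter($html, $config, $context)` and `postFilter($html, $config, $context)`, each returning the (possibly modified) HTML; check the Filter class shipped with your copy before relying on that. So the sketch runs without the library, a minimal stand-in base class is declared only when the real one can't be loaded, and the line-ending normalization is just an illustrative job for the filter.

```php
<?php
// Sketch only: assumes HTMLPurifier_Filter offers preFilter()/postFilter()
// with the ($html, $config, $context) signature. If the real class can't be
// loaded, declare a minimal stand-in so this file runs on its own.
if (!class_exists('HTMLPurifier_Filter')) {
    class HTMLPurifier_Filter
    {
        public $name;

        public function preFilter($html, $config, $context)
        {
            return $html;
        }

        public function postFilter($html, $config, $context)
        {
            return $html;
        }
    }
}

// Illustrative filter: normalizes Windows line endings before purification.
class YourCustomFilter extends HTMLPurifier_Filter
{
    public $name = 'YourCustomFilter';

    public function preFilter($html, $config, $context)
    {
        return str_replace("\r\n", "\n", $html);
    }
}

$filter = new YourCustomFilter();
echo json_encode($filter->preFilter("<p>a</p>\r\n<p>b</p>", null, null)), "\n";
```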

Everything else...

Configuration aliases

There may be a few miscellaneous warnings left. If your error-reporting level includes notices, you might see HTML Purifier complaining about the use of deprecated aliases. Don't worry: I'm not going to remove those aliases, but from a performance standpoint it's a good idea to switch from the old directive names to the new ones.

tag.attr to tag@attr

If you were using %HTML.AllowedAttributes, it is recommended that you upgrade your syntax from tag.attr to tag@attr. While the two are functionally equivalent, and the dot-syntax will not be deprecated any time soon, this modification is made with an eye towards future compatibility with XML: XML permits tag names to have periods. %HTML.ForbiddenAttributes will only allow the at-sign-syntax, and will output an informative error message if you do otherwise.
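If you have many attribute lists to migrate, a small helper can rewrite the comma-separated dot syntax mechanically. The function `to_at_syntax` below is hypothetical and not part of HTML Purifier; it treats only the first period in each entry as the element/attribute separator and leaves entries that already use `@` alone.

```php
<?php
// Hypothetical migration helper: "a.href, img.src" -> "a@href,img@src".
// Only the first "." in each comma-separated entry is treated as the
// element/attribute separator; entries already using "@" are untouched.
function to_at_syntax(string $list): string
{
    $converted = [];
    foreach (array_map('trim', explode(',', $list)) as $entry) {
        if ($entry === '') {
            continue; // skip empty items from stray commas
        }
        if (strpos($entry, '@') === false) {
            $pos = strpos($entry, '.');
            if ($pos !== false) {
                $entry = substr($entry, 0, $pos) . '@' . substr($entry, $pos + 1);
            }
        }
        $converted[] = $entry;
    }
    return implode(',', $converted);
}

echo to_at_syntax('a.href, img.src, *.id, abbr@title'), "\n";
// a@href,img@src,*@id,abbr@title
```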


From there, it gets highly internal. If you've been making custom modules for yourself, please note that the signature of HTMLPurifier_HTMLModule->addElement() has changed; there is no more $safe parameter. However, there was no $safe parameter to begin with in HTMLPurifier_HTMLDefinition->addElement(), so users of that method don't have to worry about this change. For the curious, this change is indicative of the shift from element-based safety to module-based safety. Once I implement more elements and attributes for trusted mode, there will be more documentation for this.


The static methods in HTMLPurifier_ConfigSchema were deprecated. They probably still work, although they're no longer being actively tested. If you need to add custom configuration to HTML Purifier, retrieve the schema using HTMLPurifier_ConfigSchema::instance() and then operate on it using the add*() methods. Some of the method signatures have changed; most notably, there's an extra $allowsNull parameter after $type in add(). Extensible configuration is still somewhat of an unknown, so if you have definitive use-cases you'd like to share with me to influence its architecture, please say so. Please do not add your own files to the schema/ directory unless you plan on submitting your changes for incorporation into the core. For information on how this subsystem works, check out the documentation on Config Schema.

Return by reference

A number of methods that returned explicit references to objects now simply return objects. Under PHP 5's object model, a variable holds a handle to an object rather than a copy of it, so the ampersand is unnecessary. If you have code that does this:

$def =& $config->getHTMLDefinition();

it will now trigger an E_STRICT error. The fix is:

$def = $config->getHTMLDefinition();
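Why the ampersand is redundant can be shown with two hypothetical stand-in classes, `Config` and `Definition` (not HTML Purifier's classes): the getter returns an object without `&`, yet changes made through the returned variable are visible to the owner, because both variables hold the same object handle.

```php
<?php
// Hypothetical stand-ins; not HTML Purifier's classes.
class Definition
{
    public $info = [];
}

class Config
{
    private $definition;

    public function __construct()
    {
        $this->definition = new Definition();
    }

    // Returns the object normally, with no "&" in the signature.
    public function getDefinition(): Definition
    {
        return $this->definition;
    }
}

$config = new Config();
$def = $config->getDefinition();   // plain assignment, no "=&"
$def->info['custom'] = true;       // modify through the returned handle

// Both variables refer to the same Definition object.
var_dump($config->getDefinition()->info['custom']); // bool(true)
var_dump($def === $config->getDefinition());        // bool(true)
```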


HTMLPurifier_Printer_ConfigForm::getCSS() and HTMLPurifier_Printer_ConfigForm::getJavascript() should be called statically, not from an instance variable. Change:

$css = $form->getCSS();

to:

$css = HTMLPurifier_Printer_ConfigForm::getCSS();

and likewise for getJavascript().

New features!

Thanks for putting up with all that backwards-compatibility documentation! Now we get to the fun stuff: new features. The new features are mostly all configuration directives:

HTML Purifier 3.1.0 also boasts a far more robust URI handling system. URIs containing non-ASCII characters, such as 首頁, are now converted into percent-encoded form (previously, they were incorrectly left in IRI form).
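The IRI-to-URI mapping essentially amounts to percent-encoding the UTF-8 bytes of non-ASCII characters. The `iri_to_uri` helper below is a hypothetical sketch of that idea, not HTML Purifier's code (which processes each URI component separately); it assumes the input is valid UTF-8 and leaves ASCII characters, including reserved delimiters, untouched. `example.com` is a placeholder host.

```php
<?php
// Hypothetical sketch: convert an IRI to a URI by percent-encoding every
// byte outside the ASCII range. Assumes the input string is UTF-8.
function iri_to_uri(string $iri): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        function (array $m): string {
            return sprintf('%%%02X', ord($m[0]));
        },
        $iri
    );
}

echo iri_to_uri('http://example.com/首頁'), "\n";
// http://example.com/%E9%A6%96%E9%A0%81
```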

As usual, see the NEWS for a full list of enhancements and bugfixes.