HTML Purifier 2.0 is the culmination of two major architectural changes. The first is Tidy, which enables HTML Purifier both to support deprecated elements natively and to convert them to standards-compliant alternatives. The second is the Advanced API, which enables users to create new elements and attributes with ease. In keeping with a commitment to high quality, there are also five esoteric bug fixes and a plethora of subtle improvements that enhance the library.
Download
What is HTML Purifier?
HTML Purifier is a standards-compliant HTML filter written in PHP. Because it uses whitelists and a comprehensive knowledge of the HTML specification, it is bullet-proof against XSS, fixes malformed input rather than rejecting it, and is open and extensible. Don't take my word for it: try the demo or read how HTML Purifier compares to other libraries.
What is Tidy?
While Tidy may remind you of HTML Tidy, our Tidy has nothing to do with Dave Raggett's library. Previously, HTML Purifier was really fussy about deprecated elements and always tried to convert them to standards-compliant alternatives. Now, you can pick: stay with the deprecated (but valid) elements or clean them up! Read more about it in the Tidy documentation.
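As a rough sketch of what that choice might look like in code: the snippet below assumes a %HTML.TidyLevel directive with levels such as 'none' and 'heavy', as described in the Tidy documentation, and a hypothetical include path. Treat both as assumptions and check them against your installation.

```php
<?php
// Assumption: adjust this path to wherever the library lives on your server.
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Assumed directive and values: 'none' asks Tidy to leave deprecated
// (but valid) elements alone; 'heavy' asks it to convert as many of them
// as it can to standards-compliant alternatives.
$config->set('HTML', 'TidyLevel', 'heavy');

$purifier = new HTMLPurifier($config);

// <center> is a deprecated element, so it is a candidate for conversion.
echo $purifier->purify('<center>Hello</center>');
```

Switching the level to 'none' and re-running the same input is a quick way to see the difference between the two modes for your doctype.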
What is the Advanced API?
The Advanced API is a powerful new interface for customizing HTML Purifier with your own attributes and elements. Read more about it in the customization documentation.
Backwards-incompatible changes
There are a few behavioral changes that may break code written for esoteric features of the previous versions:
- Previous customizations to HTMLDefinition will now throw fatal errors. This is easy to fix: set $config->set('HTML', 'DefinitionID', 'your-name-here') and then read the documentation on the Advanced API to convert your code to the brand new features. (Trust me: it's a lot easier to write.)
- Configuration objects are finalized when used. This means you cannot set another configuration value after you have already used the object to purify some text. Set the autoFinalize member variable to false to work around this, or try to restructure your code so that it is not necessary.
- Interface for HTMLPurifier_Lexer::create() changed. This factory method no longer accepts a prototype as a parameter: instead, it requires a configuration object. To replace the Lexer with your own custom one, set $config->set('Core', 'LexerImpl', $lexer). Note, however, that the lexer selection has gotten a bit smarter, so you may want to just let HTML Purifier do its thing.
- Caching for HTMLDefinition added; please ensure the cache output directory is writeable. While this change won't break anything per se, you'll be missing out on a tremendous speed increase if you don't ensure that library/HTMLPurifier/DefinitionCache/Serializer is writeable by PHP. If this is not possible, you can change the cache output directory using $config->set('Cache', 'SerializerPath', $path); (please use absolute paths).
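Pulled together, the settings from the list above might look something like the following. This is only a sketch: the include path and the cache directory are hypothetical placeholders, and 'your-name-here' should be replaced with an identifier of your own.

```php
<?php
// Assumption: adjust this path to wherever the library lives on your server.
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Give your customized HTMLDefinition an ID ('your-name-here' is a placeholder).
$config->set('HTML', 'DefinitionID', 'your-name-here');

// Move the definition cache to a directory PHP can write to.
// Hypothetical location; use an absolute path.
$config->set('Cache', 'SerializerPath', '/var/cache/htmlpurifier');

// Opt out of automatic finalization, so the configuration object is not
// locked the first time it is used to purify text.
$config->autoFinalize = false;

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify('<b>Some text</b>');
```

The lexer override is left out on purpose, since the default selection is usually the better choice.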
I take backwards-compatibility very seriously, so if you have any problems, pop on over to the forums and I'll do whatever I can to help you.
What is new?
Tidy and the Advanced API are all fine and dandy, but they're aimed towards the advanced user. HTML Purifier 2.0.0 also has a number of extra features that target the common guy too! Here are the more notable ones:
- New %HTML.Allowed configuration directive lets you set allowed attributes and elements in one go! Use a TinyMCE style format: "a[href|title],b,i"
- The configuration object gives friendlier error messages when things go wrong.
- HTML Purifier works in PHP 4.3.2. That's pretty ancient, but it's good to know that you can still use HTML Purifier on those crappy webhosts that refuse to upgrade.
- When running in Transitional mode (HTML 4.01 Transitional or XHTML 1.0 Transitional), HTML Purifier will be as lazy as possible when fixing things up: this means that deprecated elements will be preserved in these doctypes.
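To make the %HTML.Allowed item above a little more concrete, here is a minimal sketch using the same three-argument set() form as the rest of this announcement. The include path and the sample input are assumptions made up for illustration.

```php
<?php
// Assumption: adjust this path to wherever the library lives on your server.
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Allow <a> with href and title attributes, plus <b> and <i>.
$config->set('HTML', 'Allowed', 'a[href|title],b,i');

$purifier = new HTMLPurifier($config);

// Made-up input: the onclick attribute and the <u> element are not on
// the allowed list, so they should not survive purification.
$dirty = '<a href="http://example.com/" onclick="evil()">link</a> <u>text</u>';
echo $purifier->purify($dirty);
```

Keeping the whole whitelist in one string makes it easy to review exactly which elements and attributes a given form or comment box accepts.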
As usual, you can see a full list of changes, bugfixes and other miscellanea in News.
Spread the word!
Used HTML Purifier and liked it? Interested but will investigate later? Disbelieving at the prospect of bullet-proof XSS protection? Whatever your thoughts, help spread the word!