HTML Purifier

Standards-Compliant HTML Filtering

Summary

Safe

HTML Purifier defeats XSS with an audited whitelist

Clean

HTML Purifier ensures standards-compliant output

Open

HTML Purifier is open-source and highly customizable

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!

I'd just like to say we use HTML Purifier in IRIS for filtering emails against XSS attacks and we've been more than impressed.
— Chris Corbyn, Senior IRIS Developer

Background

There are a number of open-source HTML filtering solutions out there on the web already. What sets HTML Purifier apart from them? Aren't all of these choices “secure”?

When it comes to HTML, attention to detail is key. Does the filter work from a whitelist rather than an out-of-date blacklist? Does it filter every attribute in the document? Does it actually understand HTML?

Know thy enemy. Hackers have a huge arsenal of XSS vectors hidden within the depths of the HTML specification. HTML Purifier is effective because it decomposes the whole document into tokens, removes non-whitelisted elements, checks the well-formedness and nesting of tags, and validates all attributes according to their RFCs. HTML Purifier's comprehensive algorithms are complemented by a breadth of knowledge, ensuring that richly formatted documents pass through unstripped.
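To make the pipeline described above concrete, here is a minimal sketch of whitelist-based filtering written in Python with only the standard library. It is not HTML Purifier's implementation or API: the whitelist contents, the names `ALLOWED`, `DROP_CONTENT`, `SAFE_SCHEMES`, `WhitelistFilter` and `purify`, and the simplified URL check are all assumptions made for illustration. It tokenizes the input, drops elements and attributes that are not whitelisted, rejects URLs with unexpected schemes, and closes tags so that nesting stays well formed.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, for illustration only: element -> allowed attributes.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
# Elements whose text content is dropped together with the tag itself.
DROP_CONTENT = {"script", "style"}
SAFE_SCHEMES = {"http", "https", "mailto"}
SCHEME = re.compile(r"([a-zA-Z][a-zA-Z0-9+.-]*):")


def is_safe_url(value):
    """Accept relative URLs and absolute URLs with a whitelisted scheme."""
    # Remove whitespace and control characters, which can disguise a scheme.
    cleaned = "".join(ch for ch in value if ch > " ")
    match = SCHEME.match(cleaned)
    return match is None or match.group(1).lower() in SAFE_SCHEMES


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments
        self.open_tags = []  # stack of whitelisted tags currently open
        self.suppressed = 0  # non-zero while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.suppressed += 1
            return
        if self.suppressed or tag not in ALLOWED:
            return  # non-whitelisted element: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # non-whitelisted attribute (e.g. onclick)
            if name == "href" and not is_safe_url(value):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.suppressed = max(0, self.suppressed - 1)
            return
        if self.suppressed or tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        # Close anything opened after this tag so nesting stays well formed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.suppressed:
            self.out.append(escape(data, quote=False))


def purify(html):
    """Return html reduced to the whitelisted elements and attributes."""
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    while parser.open_tags:  # close whatever the input left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(purify('<p onclick="x()">Hi <script>alert(1)</script><b>there</p>'))
# prints: <p>Hi <b>there</b></p>
```

Because anything not explicitly listed is discarded, a newly discovered attack vector has to get past the whitelist rather than merely be absent from a blacklist. A real filter also needs per-element nesting rules, CSS validation, and far more thorough URI handling than this sketch attempts.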

To my knowledge, there is nothing else in the wild that offers protection from XSS, standards compliance, and corrective processing of poorly formed HTML. But don't take my word for it: do your research and try out the demo.

To find out more, you can read the Comparison for an analysis of HTML Purifier and the other major filters.

[Y]ou save my day by allowing me not to write another damned HTML parser.
— Joseph Halter, Technical Director at Akira Web

Recent News

HTML Purifier 4.0 released

Posted 7:46 PM EDT on Wednesday, July 8, 2009

HTML Purifier 4.0 is a major feature release focused on configuration. It deprecates the $config->set('Ns', 'Directive', $value) syntax in favor of $config->set('Ns.Directive', $value); both syntaxes work, but the former will throw errors. There are also some new features: robust support for name/id, configuration inheritance, removal of nbsp in the RemoveEmpty autoformatter, userland configuration directives, and configuration serialization.

You can find full information on how to perform the migration at dev-config-bcbreaks.txt, although the transforms are very simple and the error messages should tell you what you need to do.

Having not performed an HTML Purifier release in so long, I have unfortunately forgotten the passphrase on my original private key. Furthermore, you may have noticed that commit messages are now showing up as ezyang@mit.edu instead of edwardzyang@thewritingpot.com. While not intentional, this is a good time to switch my GnuPG signing key. The new key you should verify against is 0x1E1C674B. Those of you who are paranoid should directly use the Git repository, which is tagged with the correct key (yes, muscle memory worked once, and then fled from me), although all future releases will be tagged with the new key. The key is also locally stored on htmlpurifier.org.

See NEWS for a complete changelog.

Update: I have remembered my passphrase and have re-signed all of the releases with the old key. I still plan on going forward with the transition to the new GnuPG signing key (as it has a much larger key size and should be resilient in the face of nascent attacks against SHA-1). Check the download page for more information.

Read earlier news...

Plugins

HTML Purifier is a great library to integrate with existing CMSes and other applications or WYSIWYG editors. Currently, we have plugins for these applications:

HTML Purifier is also now in print! Martin Brampton's new book PHP 5 CMS Framework Development includes a discussion of using HTML Purifier in your content management system. Go check it out!

Notice: Plugins provided by third parties have not been vetted by us: use them at your own risk. If you are having a problem with a plugin, please consult the plugin author before asking for help here (we'll be more than happy to help, but it might be a problem with the plugin rather than HTML Purifier).

This plugin is on top of my favorite list[.] I am going to heavily depend on it since my clients insist on having WYSIWYG and I insist on having pages that validate and are semantically sound.
— David Molliere, MODx Marketing & Design Team

Plugins for other major applications gladly accepted!

Users

Here are some open-source applications that use HTML Purifier:

Lilina News Aggregator (4.0.0)
Yii (4.0.0)
PDF Newspaper (3.3.0)
TikiWiki (3.3.0)
NoseRub (3.3.0)
ImpressCMS (3.1.1)
Jibberbook (3.1.1)
Mia (3.1.1)
Midgard (via PEAR)
BitWeaver (via PEAR, see install_checks.php)
Project Babel (via PEAR and Midgard)
PHP Atompub Server (via download)

If I've forgotten anyone, drop me a line with a link to both your application and the use of HTML Purifier in your code repository, and I'll add your application to this list.

Hall of Shame

The following projects package HTML Purifier with their software, but are not up-to-date. They are putting their userbase at risk of security attacks by not keeping HTML Purifier updated. If you're a user or developer for these projects, please raise your voice and help to get them fixed!

WPIDS (3.0.0)
Lichen Webmail (2.1.4, see ticket #79)
XOOPS Cube BRASIL (2.1.3)
XDForum (1.3.2)

Spread the Word!

Help spread awareness about HTML Purifier by: