HTML Purifier

Standards-Compliant HTML Filtering

Summary

Safe

HTML Purifier defeats XSS with an audited whitelist

Clean

HTML Purifier ensures standards-compliant output

Open

HTML Purifier is open-source and highly customizable

The most recent release is a security update. Please upgrade to HTML Purifier 3.1.1 or 2.1.5 as soon as possible.

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, but it will also make sure your documents are standards-compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!

I'd just like to say we use HTML Purifier in IRIS for filtering emails against XSS attacks and we've been more than impressed.
— Chris Corbyn, Senior IRIS Developer

Background

There are a number of open-source HTML filtering solutions out there on the web already (e.g. PEAR's HTML_Safe, kses and SafeHtmlChecker.class.php). What sets HTML Purifier apart from them? Aren't all of these choices secure?

When it comes to HTML, attention to detail is key. Does the library demonstrate an in-depth knowledge of the DTD that defines HTML? Does it perform its filtering off a robust whitelist rather than a usually outdated blacklist? Does it take the care to check every single attribute in the document for validity? Does it actually understand tag markup, or does it pay lip service with a series of deficient regexes and str_replace calls?

Somewhere along the way, all of HTML Purifier's predecessors fall flat. HTML_Safe dooms itself to attacks of the future by using a blacklist. Configurable filters like kses and PHP Input Filter still cannot validate the contents inside attributes. With all these gaps in coverage, none of the usual libraries come close to achieving standards compliance. There is a user-unfriendly, draconian XML-based filter called Safe HTML Checker, but even it forgets that <a> tags cannot be nested within each other!
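To make the blacklist problem concrete, here is a minimal sketch in Python (not PHP, and not code from HTML Purifier or any of the libraries named above). The attribute sets and function names are hypothetical, chosen only for illustration. It shows how a blacklist passes any attribute its author did not think of, while a whitelist rejects it by default.

```python
# Hypothetical lists for illustration only; real filters use far larger ones.
BLACKLISTED_ATTRS = {"onclick", "onload", "onerror", "onmouseover"}
ALLOWED_ATTRS = {"href", "title", "alt", "src"}


def blacklist_filter(attrs):
    """Keep every attribute that is not explicitly known to be bad."""
    return {k: v for k, v in attrs.items() if k.lower() not in BLACKLISTED_ATTRS}


def whitelist_filter(attrs):
    """Keep only attributes that are explicitly known to be acceptable."""
    return {k: v for k, v in attrs.items() if k.lower() in ALLOWED_ATTRS}


# An event handler the blacklist author never listed.
sample = {"href": "/home", "onpointerdown": "steal()"}

print(blacklist_filter(sample))  # the unlisted handler survives
print(whitelist_filter(sample))  # only href survives
```

The blacklist has to be updated every time a new dangerous attribute appears; the whitelist only has to be updated when you want to allow something new.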

Know thy enemy. Wily hackers have a huge arsenal of XSS vectors hidden within the depths of the HTML specification. HTML Purifier is effective because it decomposes the whole document into tokens and rigorously processes them: removing non-whitelisted elements, transforming bad-practice tags like font into span, properly checking the nesting of tags and their children, and validating all attributes according to their RFCs. HTML Purifier's comprehensive algorithms are complemented by a breadth of knowledge, ensuring that richly formatted documents pass through unstripped.
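The token-based approach described above can be sketched roughly as follows. This is a toy illustration written in Python with the standard library's html.parser, not HTML Purifier's actual implementation or API. The whitelist, the rename table, the scheme check, and the names WhitelistFilter and purify are all assumptions made for the example, and the nesting logic is far simpler than what a real filter needs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: element name -> attributes allowed on it.
ALLOWED = {"a": {"href", "title"}, "b": set(), "em": set(), "p": set(), "span": set()}
RENAME = {"font": "span"}              # bad-practice tag -> replacement
DROP_WITH_CONTENT = {"script", "style"}
SAFE_SCHEMES = ("http://", "https://", "mailto:")  # relative URLs are dropped too


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments
        self.open_tags = []  # stack of elements we have emitted and not closed
        self.skip = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
            return
        tag = RENAME.get(tag, tag)
        if tag not in ALLOWED:
            return  # non-whitelisted element: drop the tag, keep its text
        if tag == "a" and "a" in self.open_tags:
            return  # <a> may not be nested inside another <a>
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.open_tags.append(tag)
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        tag = RENAME.get(tag, tag)
        if tag in self.open_tags:
            # Close anything still open inside the element being closed.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def purify(dirty):
    f = WhitelistFilter()
    f.feed(dirty)
    return f.result()


print(purify('<font color="red" onclick="evil()">hi</font><script>alert(1)</script>'))
print(purify('<a href="javascript:alert(1)" title="x">go</a>'))
print(purify("<b>bold <em>both"))
```

Even this toy version shows why working on tokens matters: the filter never pattern-matches raw strings, so it can rename elements, drop individual attributes, and close unbalanced tags while leaving the legitimate formatting intact.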

To my knowledge, there is nothing else in the wild that offers protection from XSS, standards-compliance, and the corrective processing of poorly formed HTML simultaneously. Don't take my word for it though: do your research. Investigate the other libraries, and decide for yourself who you would prefer to be the gatekeeper to your system.

To find out more, you can read the Comparison for a play-by-play analysis of the major filter libraries currently out there.

[Y]ou save my day by allowing me not to write another damned HTML parser.
— Joseph Halter, Technical Director at Akira Web

Plugins

HTML Purifier is a great library to integrate with existing CMSes and other applications or WYSIWYG editors. Currently, we have plugins for these applications:

Notice: Plugins provided by third parties have not been vetted by us: use them at your own risk. If you are having a problem with a plugin, please consult the plugin author before asking for help here (we'll be more than happy to help, but it might be a problem with the plugin rather than HTML Purifier).

This plugin is on top of my favorite list[.] I am going to heavily depend on it since my clients insist on having WYSIWYG and I insist on having pages that validate and are semantically sound.
— David Molliere, MODx Marketing & Design Team

Plugins for other major applications gladly accepted!

Users

Here are some open-source applications that use HTML Purifier:

Aliro: 3.1.0
Jibberbook: 3.1.0
Mia: 3.1.0
Kohana: 3.1.0
Midgard: via PEAR
BitWeaver: via PEAR, see install_checks.php
Project Babel: via PEAR and Midgard
PHP Atompub Server: via download

If I've forgotten anyone, drop me a line with a link to both your application and the use of HTML Purifier in your code repository, and I'll add your application to this list.

Hall of Limbo: PHP4

The following applications are using HTML Purifier 2.1, for PHP4 compatibility. While this is fine, I would much rather they go PHP5!

There are currently no applications using an up-to-date version of HTML Purifier 2.1.

Hall of the Past

The following projects package HTML Purifier with their software, but are not up-to-date. They are putting their userbase at risk of security attacks by not keeping HTML Purifier updated. If you're a user or developer for these projects, please raise your voice and help to get them fixed!

WPIDS: 3.0.0
NoseRub: 3.0.0
Lilina News Aggregator: 2.1.3
TikiWiki: 2.1.3
XOOPS Cube BRASIL: 2.1.3
Lichen Webmail: 2.0.1, see ticket #79
PHProjekt: 1.6.0
XDForum: 1.3.2

Spread the Word!

Help spread awareness about HTML Purifier by:

  • Bookmarking this website on your del.icio.us account, and/or
  • Including this little label on your website: Powered by HTML Purifier, with this code:
    <a href="http://htmlpurifier.org/"><img
    src="http://htmlpurifier.org/live/art/powered.png"
    alt="Powered by HTML Purifier" border="0" /></a>