HTML Purifier's documentation is organized by topic. New users should read the INSTALL file that comes with the HTML Purifier download. Any questions about HTML Purifier can be asked at the support forums (no registration required!).
For First-Time Users
The basic code for getting HTML Purifier set up is very simple:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
Replace $dirty_html with the HTML you want to purify and use $clean_html instead. While HTML Purifier has a lot of configuration knobs, the default configuration is quite safe and should work for many users.
It's highly recommended to take a look at the full install documentation for more information, as it gives advice on how to make sure HTML Purifier's output matches your page's character encoding.
For Advanced Users
- End-User Documentation: In-depth documents on how to get the most out of HTML Purifier. These are located in the docs/ folder of your HTML Purifier installation.
- Configuration documentation: These are the various configuration directives that can be used to customize HTML Purifier's behavior.
- Doxygen-generated Documentation: No class left undocumented! Cross-referenced code! A must-read for any prospective HTML Purifier hacker.
- Print Definition: If you want to actually see what HTML Purifier's filtering rules are, look no further than this page. You can even experiment with the configuration to see how things respond to different directives.
P.S. HTML Purifier's source code is well documented and very readable. If a question of yours isn't answered by any of the above resources, go to the source! (Or ask in the forums.)
For Contributors
As with any open source project, HTML Purifier is always looking for developers, writers and other folks willing to lend a hand. There are any number of things to work on! Please take a moment to find out how you can help out this project.
Frequently Asked Questions
What does %HTML.Allowed mean?
The percent-dot format is a shorthand for HTML Purifier's configuration directives. It takes the form of %Namespace.Directive. For practical purposes, %HTML.Allowed translates into the following PHP code:
$config->set('HTML', 'Allowed', $value);
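As a small illustration, the percent-dot shorthand can be split mechanically into its namespace and directive parts. The helper name `splitDirective` below is hypothetical and is not part of HTML Purifier; the sketch only shows how the notation maps onto the arguments of `$config->set()` as used above.

```php
<?php
/**
 * Split a percent-dot directive such as "%HTML.Allowed" into
 * array(namespace, directive). Hypothetical helper for illustration;
 * HTML Purifier itself does not provide this function.
 */
function splitDirective($shorthand)
{
    // Drop the leading percent sign, if present.
    $name = ltrim($shorthand, '%');
    // Split on the first dot only: namespace before, directive after.
    $parts = explode('.', $name, 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException("Not a %Namespace.Directive name: $shorthand");
    }
    return $parts;
}

list($namespace, $directive) = splitDirective('%Attr.AllowedFrameTargets');
// $namespace is "Attr" and $directive is "AllowedFrameTargets", so the
// matching call would be $config->set($namespace, $directive, $value).
```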
My attributes are mysteriously disappearing!
You've probably got magic quotes turned on, which is interfering with the single and double quotes in HTML attributes. The usual way to fix this is with some runtime code or an ini tweak. Be sure not to introduce any SQL injection vulnerabilities!
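The runtime fix usually amounts to stripping the added slashes from request data before it reaches HTML Purifier. The following is a minimal sketch, not an official recipe: `stripSlashesDeep` is a hypothetical helper name, and the guard assumes `get_magic_quotes_gpc()` may be missing, as it is on PHP versions that have removed magic quotes.

```php
<?php
/**
 * Recursively remove the backslashes that magic quotes added.
 * Hypothetical helper; arrays are walked so nested form fields work too.
 */
function stripSlashesDeep($value)
{
    return is_array($value)
        ? array_map('stripSlashesDeep', $value)
        : stripslashes($value);
}

// Only undo the escaping when magic quotes are actually active.
// Where the function no longer exists, this branch is skipped.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $_GET = stripSlashesDeep($_GET);
    $_POST = stripSlashesDeep($_POST);
    $_COOKIE = stripSlashesDeep($_COOKIE);
}
```

Because this removes the escaping that magic quotes provided, any database queries built from the same input need their own proper escaping or parameter binding.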
How do I prevent foreign characters like ä from turning into Ã¤?
This usually means that HTML Purifier is parsing your code as UTF-8, but your output encoding is something else. Read this document on UTF-8 to learn how to fix this. (Short answer: use %Core.Encoding or switch to UTF-8.)
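To make the failure mode concrete, the sketch below reproduces the garbling without HTML Purifier: the two UTF-8 bytes of ä, when read as ISO-8859-1, turn into two separate characters. ISO-8859-1 is an assumed page encoding chosen for illustration, and the commented configuration line at the end simply follows the call form of the %HTML.Allowed example above.

```php
<?php
// "ä" encoded as UTF-8 is the two bytes 0xC3 0xA4.
$utf8 = "\xC3\xA4";

// Reinterpret those bytes as ISO-8859-1 and convert to UTF-8 for display.
// Each byte becomes its own character, which is the classic garbling.
$garbled = iconv('ISO-8859-1', 'UTF-8', $utf8);

// Going the right way: convert UTF-8 text to the page's real encoding.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);   // single byte 0xE4

// With HTML Purifier loaded, the corresponding setting would be
// (assumed encoding, for illustration only):
// $config->set('Core', 'Encoding', 'ISO-8859-1');
```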
I can't use the target or name attribute in my a tags!
The target attribute has been deprecated for a long time, so I highly recommend you look at other ways of, say, opening new windows when you click a link (my favorites are “Don't do it!” or, if you must, JavaScript). If you really need it, the %Attr.AllowedFrameTargets directive is what you are looking for.
The name attribute is dependent on IDs being enabled. See this document on enabling user IDs for more information.
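A configuration along these lines might look as follows. This is a sketch under assumptions: the `_blank` target list and the sample link are illustrative, and the `Attr.EnableID` directive name is not taken from the text above, so check it against the configuration documentation for your version.

```php
<?php
require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Permit target="_blank" on links (the target list is an illustrative choice).
$config->set('Attr', 'AllowedFrameTargets', array('_blank'));
// Enable id attributes, which the name attribute depends on
// (directive name assumed; see the document on enabling user IDs).
$config->set('Attr', 'EnableID', true);

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify(
    '<a href="http://example.com/" target="_blank" name="top">link</a>'
);
```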
Is HTML Purifier slow?
HTML Purifier isn't exactly light or speedy; this is a tradeoff for the power and security the library affords. You can combat this by reading Speeding up HTML Purifier or using the standalone version.
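One general way to soften the cost, independent of whatever the linked document recommends, is to purify each piece of input once and reuse the result. The sketch below assumes an in-memory array as the cache and a hypothetical helper named `purifyCached`; a real application would more likely store the cleaned HTML alongside the original in its database.

```php
<?php
/**
 * Return purified HTML, reusing an earlier result when the same input
 * has been seen before. $purify is any callable that cleans a string,
 * for example array($purifier, 'purify'). Hypothetical helper.
 */
function purifyCached($dirty_html, $purify, array &$cache)
{
    $key = md5($dirty_html);
    if (!isset($cache[$key])) {
        $cache[$key] = call_user_func($purify, $dirty_html);
    }
    return $cache[$key];
}

// Stand-in for $purifier->purify() so the sketch runs without the library.
// strip_tags() is NOT a safe substitute for HTML Purifier.
$calls = 0;
$fakePurify = function ($html) use (&$calls) {
    $calls++;
    return strip_tags($html);
};

$cache = array();
$first = purifyCached('<b>hi</b>', $fakePurify, $cache);
$second = purifyCached('<b>hi</b>', $fakePurify, $cache);
// The second call is served from $cache, so $fakePurify ran only once.
```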
Miscellaneous
- XSS Attacks Smoketest: Tests how well HTML Purifier fares against RSnake's famous cheatsheet of XSS attacks.
- Roadmap: Subject to lots of delays, but it's a glimpse of the future.
- Artwork: Extra media goodies.