Contribute

The very first question to ask yourself before reading this page is this:

Why contribute?

Because HTML Purifier is open-source software, you are not legally obligated to give anything back to the community. In that sense, HTML Purifier is our gift to you, and you very well can run away and never be heard from again.

We hope, however, that this lack of a legal obligation doesn't prevent you from contributing back to our project. We poured many hours into this project, and doubtless, this project has saved many hours on your behalf. If HTML Purifier saved you 200 hours of work (the actual figure might be more, might be less), even if you contribute ten hours back to the project, you still come out ahead 190 hours.

Additionally, your use of this library requires a substantial investment on your part. You had to learn the APIs, read the documentation, tweak things so that they worked with your application, et cetera. Contributing back means making good use of this investment: not only will your expertise and knowledge be fed back into HTML Purifier, but you might also learn a thing or two from the internals that you didn't know before.

If I've convinced you, read on! It's quite easy to get started...

What can you do?

Contributions can come in many forms. Documentation, code, even evangelism, can all help a project. One of the things we've noticed, however, is that many contributions come from people helping themselves. They have an itch, a special requirement, and they help the project out in that area.

What might that itch be? Over the years, we've accumulated many feature requests in our TODO file. There are also tasty tidbits in the proposal section of our documentation. You might have an idea for a new AutoFormatter, or maybe you would like to implement an HTMLModule for a set of elements that HTML Purifier doesn't support yet. Maybe you want a demo page built into the library so that you can easily test things out without using HTML Purifier's demo page. Code something that interests you.

Coding standards

As a general rule of thumb, make sure your code looks like the code around it. Probably the biggest thing to remember is four spaces, no tabs (if you perpetually forget, get your text editor to make whitespace visible). There are a number of other formatting subtleties, but suffice it to say that consistency is the order of the day in this project. You're not going to read YACS anyway.
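
If you would rather have the machine check for you, something along these lines can flag stray tabs. This is only a sketch: find_tabs is a hypothetical helper name, not something shipped with HTML Purifier, and it assumes a POSIX shell with find and grep available.

```shell
# Hypothetical helper (not part of HTML Purifier): print the name of every
# .php file under the given directories that contains a tab character.
find_tabs() {
    tab=$(printf '\t')
    # grep -l prints only the names of files with at least one match.
    find "$@" -name '*.php' -exec grep -l "$tab" {} +
}

# Usage, from the root of your working copy:
#   find_tabs library tests
```

No output means no tabs were found in the directories you named.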

The code you write must be PHP 5.0.5 compatible, so avoid later features like magic methods. The code you write also must have unit tests, which reside in the tests/ directory. The workflow for your feature should be along the lines of:

  1. Write unit tests
  2. Hack hack hack
  3. Run php tests/index.php
  4. If failures, go back to 1 or 2
  5. Commit and submit patch

HTML Purifier prides itself on having an evergreen test suite, so if your change breaks other tests, it probably won't be accepted.

Getting set up

You already know how to use HTML Purifier. But do you know how to develop it?

Git

HTML Purifier's repository is hosted via Git. If you've used Git before, you can skip this section: you already know what the workflow is for working with Git, so just clone from git://repo.or.cz/htmlpurifier.git and get going. Otherwise, read on.

In order to hack on HTML Purifier's source tree, you will first need to make sure Git is installed on your system. Type the following command in your prompt:

git --version

And you should get something along the lines of “git version 1.5.6”. Otherwise:

You use Linux:
Grab Git from your friendly neighborhood package manager, or compile from source using the package provided at git.or.cz. Either should be relatively simple.
You use Windows:
Download and install msysgit. Then, for all of the following commands we discuss, enter them in the console provided by Git Bash. If you have Cygwin, you can also use setup.exe to install Git.
You use a Mac:
There are binaries available from various sources; I haven't tried them so your mileage may vary. Since Mac is a BSD-like system, you can also compile from source.

Run the earlier command again to make sure the installation went smoothly. Now run this command:

git clone git://repo.or.cz/htmlpurifier.git

This will copy the HTML Purifier codebase into the htmlpurifier folder.

You will want to configure the Git installation with your name and email address. You can do this with these two commands:

git config --global user.name "Bob Doe"
git config --global user.email bob@example.com

Let us fast forward for a moment and imagine that we have already made our changes and would now like to send them to HTML Purifier for review. You will need to execute these commands:

git status

This command will give you a quick rundown about all the files Git knows about. If you have any “Untracked files”, you will need to add them with:

git add $filename

(You can also add “Changed but not updated” files, but because we will be using the -a option this is strictly unnecessary.)

Now, you will want to commit your changes. Users of centralized version control systems, beware: this does not push it to a remote repository, or anything like that. It simply records the change in your local repository. Doing so is as simple as:

git commit -as

The “a” flag tells Git to commit all modified files, even if you didn't git add them. The “s” flag tells Git to sign off your commit message with your name and email.

You will then have a screen brought up to enter a commit message. If this screen is vim (you can tell if your command line window transmuted into something you've never seen before), type i (--INSERT-- mode), write your commit message, type ESC, and then type :wq ENTER (write and quit).

A quick note about commit messages: there is a very specific format for them. They should look something like this:

Concise one-line statement describing change

Full explanation for the change. If you fixed a bug, make
sure you describe what was wrong, how you fixed it, and
what the behavior is now. If it was a feature, describe
why the feature is useful, how you use it, and any tricky
implementation details.

In short, the body of the commit message (which can span multiple
paragraphs) should, along with the code diff, be
self-explanatory and not require any email introduction. At the
same time, your commit message will be immortalized and
should be in third-person and formal.

Signed-off-by: Edward Z. Yang <edwardzyang@thewritingpot.com>

Finally, after the commit has been recorded, you will want to make a patch to distribute to other people to review and test. Doing so is as simple as:

git format-patch -1

You can replace -1 with -#, where # is the number of most recent commits you would like to write patches for. You can also specify a commit hash ID.

A file named roughly 0001-Short-description.patch will be created, with the complete contents of your change.
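
To see how the count works, here is a sketch you can run in a scratch directory. The repository, file names and commit messages are all made up for the demonstration, and -m is used only to supply the message on the command line so that no editor opens.

```shell
# Throwaway repository; every name in here is made up for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bob Doe"
git config user.email bob@example.com

# Record three small commits. -m supplies the message inline, -s signs off.
for n in 1 2 3; do
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -s -m "Add file$n"
done

# One patch file per commit, covering the two most recent commits.
git format-patch -2
```

This should leave 0001-Add-file2.patch and 0002-Add-file3.patch in the scratch directory, each carrying a Signed-off-by line at the end of its message.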

In summary:

git clone git://repo.or.cz/htmlpurifier.git
git config --global user.name "Bob Doe"
git config --global user.email bob@example.com
cd htmlpurifier
# hack hack hack
git status
git add newfile1.txt subdir/newfile2.txt
git commit -as
git format-patch -1
# send patch off

Two quick notes before we go on to some HTML Purifier specific instructions:

  1. If you are posting the patch on the forum, be sure to copy-paste it in-between <pre><![CDATA[ and ]]></pre>. If you are emailing the patch, we prefer that you send it inline in a text email. (Be sure to configure your mail client not to wrap lines; check out the SubmittingPatches guidelines from the Git project for more details.)

  2. In all probability, there have been changes to the HTML Purifier codebase since you made your patch. As part of your duties as a patch-maker, you should ensure that your patch remains based on the HEAD of our master branch. You can do so with the command:

    git pull --rebase

    You may also find it useful to perform your development in a topic branch. You can do this using:

    git checkout -b branchname

    The benefit of a setup like this is that you can now do a regular git pull on the master branch, and then use git rebase master on your own branch to keep it up to date. This can be useful if your patch produces a conflict. (One quick note: you switch between branches using git checkout branchname. The -b flag creates a new branch.)

    The default behavior of git pull in such a case is to merge your branch. If you were a release maintainer, this is what you would want to do, since your history was public and rewriting history could be disruptive. With private, local changes, however, performing the merge makes the history needlessly complicated.
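
Putting the topic-branch advice from the second note together, the sketch below plays the whole dance out in two throwaway repositories, so you can try it without touching your real working copy. Everything named here (the directories, my-feature, the file names and the commit messages) is invented for the demonstration.

```shell
# Demo only: two throwaway repositories stand in for upstream and your clone.
export GIT_AUTHOR_NAME="Bob Doe" GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME="Bob Doe" GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)

# A stand-in upstream with a single commit on master.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Your clone, with the work done on a topic branch.
git clone -q "$work/upstream" "$work/mine"
cd "$work/mine"
git checkout -q -b my-feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, upstream gains a commit.
(cd "$work/upstream" && echo more > more.txt && git add more.txt && git commit -q -m "Upstream change")

# Bring master up to date, then replay the topic branch on top of it.
git checkout -q master
git pull -q
git checkout -q my-feature
git rebase -q master

# Show the resulting history, newest commit first.
git log --format=%s
```

The final git log should list Add feature on top of Upstream change and Base, with no merge commit in between.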

SimpleTest

As mentioned before, one of the keys to successfully developing a new feature on HTML Purifier is a comprehensive set of unit tests. However, unit tests do you no good if you can't run them.

The first step in getting unit tests running on HTML Purifier is downloading SimpleTest, our unit testing framework. However, the public 1.0.1 release won't work with HTML Purifier, as it is still PHP4 compatible and will give off spurious errors. You need to use the trunk version of SimpleTest. This version can be checked out using Subversion with this command:

svn co https://simpletest.svn.sourceforge.net/svnroot/simpletest/simpletest/trunk simpletest

The next step is to tell HTML Purifier about the SimpleTest installation. You can do this by copying the test-settings.sample.php file to test-settings.php and configuring it according to the instructions inside. The only variable you must edit is $simpletest_location.
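
On the command line, the copy step might look like the sketch below. It assumes both files live in the current directory; the guard against overwriting an existing test-settings.php is my own addition, and setup_test_settings is a hypothetical name.

```shell
# Hypothetical helper: create test-settings.php from the sample, but never
# overwrite a copy you have already edited.
setup_test_settings() {
    if [ -f test-settings.php ]; then
        echo "test-settings.php already exists; leaving it alone"
    else
        cp test-settings.sample.php test-settings.php
        echo "created test-settings.php; now edit \$simpletest_location in it"
    fi
}
```

Call setup_test_settings once, then open test-settings.php in your editor and follow the instructions inside.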

At the moment, it is somewhat difficult to get the optional parameters set up properly. If you feel adventurous, try the instructions; they should work, but might be a little complicated or sparser than usual.

Now, check if everything is running by typing php tests/index.php --flush from the root of your HTML Purifier working copy. You should get a full complement of passing tests. Congratulations!

Workflow

After identifying what changes you would like to make to HTML Purifier, you will need to code appropriate unit tests for them. (If you are of the code first, test later mentality, that is fine too; just make sure the tests are 1. written and 2. comprehensive.) If you modify the file library/HTMLPurifier/ConfigSchema.php, chances are the corresponding tests are in tests/HTMLPurifier/ConfigSchemaTest.php (i.e. replace library with tests and append Test to the filename.)
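
The naming rule is mechanical enough to script. Here is a minimal sketch using plain shell parameter expansion; test_path_for is a hypothetical helper name, and the rule is only the rule of thumb described above, so the file it names is not guaranteed to exist.

```shell
# Hypothetical helper: print the conventional test path for a library file.
test_path_for() {
    path=${1#library/}                          # strip the leading "library/"
    printf 'tests/%sTest.php\n' "${path%.php}"  # add "Test" before the extension
}

test_path_for library/HTMLPurifier/ConfigSchema.php
# prints tests/HTMLPurifier/ConfigSchemaTest.php
```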

We prefer, first and foremost, unit tests; that is, a test should not depend on any other objects, and if it does, those dependencies should be filled in using SimpleTest's excellent mock object support. We also believe strongly in integration tests, which take the form of htmlt files and test HTML Purifier as a whole with your modifications. An htmlt file looks like this:

--INI--
%HTML.Allowed = "b,i,u,p"
--HTML--
<b>Foo<a id="asdf">bar</a></b>
--EXPECT--
<b>Foobar</b>

The --INI-- section indicates the configuration directives that should be used with this test (if you added a new feature, you will most probably be using this section to activate it). The --HTML-- section indicates the input, and the --EXPECT-- indicates the expected output. Be sure to include a trailing newline. You can place these files in the tests/HTMLPurifier/HTMLT directory; give them a descriptive filename.
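
One way to create such a file without fighting your editor over the trailing newline is a quoted here-document, sketched below with the example from above. The filename is hypothetical, and the mkdir -p is there only so the sketch runs anywhere; in a real working copy the directory already exists.

```shell
# Run from the root of your working copy. The filename is made up.
mkdir -p tests/HTMLPurifier/HTMLT

# Quoting 'EOF' stops the shell from expanding anything inside the body,
# and a here-document always ends its last line with a newline.
cat > tests/HTMLPurifier/HTMLT/strip-disallowed-anchor.htmlt <<'EOF'
--INI--
%HTML.Allowed = "b,i,u,p"
--HTML--
<b>Foo<a id="asdf">bar</a></b>
--EXPECT--
<b>Foobar</b>
EOF
```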

It is my hope that you find the HTML Purifier core code a joy (or at least, not painful) to work with; every class and method has a docblock that doesn't merely reiterate what you can find inside its body, but also explains how the component fits into HTML Purifier as a whole. If you find any section of code that is missing documentation or is poorly documented, please notify us and we will correct it immediately. (Remember, git pull --rebase to update your branch!)

There are, however, some architectural features that are not immediately evident from mere source-code browsing. In this case, you are encouraged to check out the documentation in the docs/ folder (web accessible at the same location.) “Flushing the Purifier” and “Config Schema” in the Development center are particularly notable: in all likelihood you will need this knowledge in order to get HTML Purifier working the way you want it to.

Debugging

Your debugging skills are as good as mine, but there are a few things that are helpful to keep in mind: