Posted by Ambush Commander 
September 02, 2008 02:09PM

As HTML5 continues its march to completion and tutorials for the language start popping up on the internet, we may want to consider its role in the HTML Purifier scheme of things. There are several ways we can treat it:

  • HTML5 as a full-fledged document language. This means we treat it as if people were actually serving their pages as HTML5. This is not immediately useful, but will be once browsers start adding support for HTML5.
  • HTML5 as a meta-language that can be converted to HTML4. The premise behind this is we can turn HTML5 into HTML4 + CSS which will look the same as it would look in a browser that supports HTML5. So a user can write <l>Line</l> and get back the valid HTML 4.01 code <div class="html5-l">Line</div>; the former is much shorter.
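As a rough illustration of the second option, here is a minimal Python sketch of the tag rewriting step. It is not HTML Purifier code. The mapping table, the `Html5Downgrader` class, and the `downgrade` function are hypothetical names made up for this example, and only the `html5-` class prefix comes from the post above. It drops comments and does no sanitizing, so it shows the conversion idea only.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping for illustration only: HTML5 element -> HTML 4.01 stand-in.
HTML5_TO_HTML4 = {
    "l": "div",
    "section": "div",
    "article": "div",
    "aside": "div",
    "header": "div",
    "footer": "div",
    "nav": "div",
}


class Html5Downgrader(HTMLParser):
    """Rewrites mapped HTML5 tags as HTML4 tags carrying an 'html5-<tag>' class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open_tag(self, tag, attrs, self_closing):
        attrs = list(attrs)
        if tag in HTML5_TO_HTML4:
            # Merge the marker class with any class the author already supplied.
            existing = [v for k, v in attrs if k == "class" and v]
            attrs = [(k, v) for k, v in attrs if k != "class"]
            attrs.insert(0, ("class", " ".join(["html5-" + tag] + existing)))
            tag = HTML5_TO_HTML4[tag]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{HTML5_TO_HTML4.get(tag, tag)}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))


def downgrade(html5_fragment):
    """Return the fragment with mapped HTML5 elements rewritten for HTML 4.01."""
    parser = Html5Downgrader()
    parser.feed(html5_fragment)
    parser.close()
    return "".join(parser.out)


print(downgrade("<l>Line</l>"))
```

A stylesheet keyed on the `html5-*` classes would then have to supply whatever default rendering the HTML5 element would have had.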

The downside of doing any of these early-bird implementations is that we will have to be very careful to keep them in sync with the specification.

Are there any comments?

February 14, 2010 04:21PM

I am currently developing an HTML 5 CMS.

I am currently using HTML Purifier for the blog part of the CMS and that does not need any special HTML stuff (I can even allow audio/video tags by running a script on the output of purifier to change that part of the DOM).

However, for non-blog content there are basically three models:

  • site owner provided xml file consisting of content div
  • site owner provided php file that generates content div
  • rich text editor for generating content div

Since the blog uses XML internally, I'm currently using tidy to clean the content and ensure it is proper XML before sucking it into the DOM. While I can teach tidy about new tags without patching the source, I cannot teach it about new attributes on old tags, so I cannot have it enforce attribute sanity, nor can I teach it the scope of where tags are allowed, etc.

I'm thinking that HTML Purifier might be a better tool for that than tidy, since teaching tidy to do it would require a modified binary library and I really would rather not go there.

For now, tidy is all I will use, but I was thinking about playing with HTML Purifier and seeing what I can do to add HTML 5 support to it. The CMS already forbids inline scripting in the content div, though it does allow embedding via object and embed, which I think HTML Purifier provides for. My policy on inline scripting (I rip any of it out with an output filter) is so close to what HTML Purifier already does that Purifier seems like a good match as a tool for getting improper content at least closer to standards compliant.

Who (if anyone) is currently working on html 5 support in tidy? While I do a fair amount of PHP programming on my own, I have *never* worked on a project with other developers. Is there a good primer on Purifier's coding standards (e.g. tabs vs. spaces, bracket locations, etc.) that I should read?

Purifier is an awesome tool; if I am able to help make it more applicable to my project, I would love to.

With respect to the topic question, option #1 is better. HTML 5 has many advantages, and some parts of it (e.g. the details element) can really only be emulated in HTML 4 via JavaScript, while media can only be implemented via JavaScript and/or an object/embed fallback. So option #1 seems like the best way to do it.

Several browsers currently support HTML 5 at least partially. The one common browser that doesn't, IE, does if the user installs Chrome Frame, and there are simple JavaScripts that can go in the document head to teach IE without Chrome Frame to work with CSS and HTML 5 and to emulate some HTML 5 interactive features. So serving HTML 5 now isn't that unreasonable.

February 14, 2010 10:25PM

Cool! (I was actually a little concerned at the time when there was absolutely no response to this topic). I certainly encourage you to give HTML Purifier as it is currently a try; you might be quite pleased with it.

I'd love to have other developers on board the project; these days, I have a number of other pet projects, so HTML5 support has basically been put on the backburner until I decide to put in the serious cycles to make it happen. Programming standards are basically, "make the code look like the surrounding code."

Sure. And option #1 is also easier on my end. However, the really interesting HTML5 features have unknown security properties.

February 18, 2010 02:06PM

I'm glad to see there's some interest in this topic. Until yesterday we were using HTML Purifier on iPhoneLife.com to protect our bloggers from themselves, but yesterday I had to disable it because people want their YouTube videos, and for some reason the iPhone browser doesn't respect the Purified version of the YouTube embed code. I was thinking the long-term solution to this problem will be to adopt HTML 5 with its better media handling, but until HTML Purifier supports HTML 5 it's a moot point for us. Thanks in advance to anyone who works on it!

February 18, 2010 02:29PM

We might want to look at the iPhone technique for allowing YouTube videos as part of our standard cadre.

February 18, 2010 04:25PM

Thanks, Admin, but that nice clean code is not what I get when I turn HTML Purifier's YouTube feature on. Instead of removing the embed tag, it puts <!--if IE--> comments around the embed tag and leaves it in place, which satisfies IE and the HTML validator but does not work on the iPhone. How do I get the nice clean code that you say is "part of the standard cadre"?

February 18, 2010 04:32PM

As in, we should implement the code as described there, instead of our current Internet Explorer hack.

Benjamin "balupton" Lupton
June 12, 2010 11:57PM

Yeah I've just been developing my CMS, Website, and 3 client sites with HTML5 and HTML Purifier, and just discovered HTML Purifier strips out the HTML5 tags such as header and section.

I'm using version 4.1.1, and already using the PH5P lexer. How can I add support for the HTML5 elements? Right now I don't care for canvas, video etc. Just the block type ones.

Thanks, as right now it is way over my head.

Da Scritch
February 24, 2011 04:23AM

I think I will try extending the DOM rules for HTML5 elements, as my partners can't wait to use caption, audio, video, and aside everywhere (well, that depends on the tags I authorise for them).

March 03, 2011 06:12PM

IMO, option 1 is best. While it would be nice to get HTML5 support in, as you say, HTML5 isn't fully spec'd yet. Either way, even with option 2, if you finish before the HTML5 spec is final then you're still going to need to change things. Once the HTML5 spec is complete, they're ready to roll it out fully, and browsers support it, then option 1 would definitely be my choice.

June 16, 2011 07:03AM

I am looking forward to HTML5 support in this. Is there an ETA for HTML5 support? A lot of sites are moving to HTML5 already, and decent browser support is there in IE9, Firefox, Chrome, Safari, Opera, etc.

June 16, 2011 07:13AM

No, there is not currently an ETA for HTML5 support.

August 12, 2014 12:41PM

I am still finding that there are things not completely finished in HTML5. I have been messing with the video tag and find it strange that it needs two types of video files.

Edited 4 time(s). Last edit at 05/25/2018 11:19PM by zeplin.

January 12, 2015 11:27AM


I've been working on some changes to the drupal.org/htmlpurifier module in order to ensure that it can function with HTML5 tags.

In order to create a patch that works, I ended up forking the script found at:

https://github.com/kennberg/php-htmlpurfier-html5 (original)

https://github.com/lukusw/php-htmlpurfier-html5 (my version)

I needed to adapt the script to enable me to pass a preinitialised config object (rather than let the script create one afresh).

The Drupal module maintainer has asked me to enquire whether the script could be added to the main HTML Purifier codebase.

Would this be possible?

Thanks in advance

