Allowing SCRIPT tag from whitelisted SRC?
June 21, 2010 12:49PM

Hey all --

I have a question regarding HTMLPurifier that a thorough read of the (very good, btw) docs has not helped me find a solution to. So I figured I would turn to the experts :D

I have a client whose CMS is using HTMLPurifier to filter user-generated content. They are interested in allowing users to include widgets provided by a third party, who provides users with an "embed code" for the widgets that employs the SCRIPT tag, with the SRC attribute pointing to a JavaScript file hosted by the third party. (In some ways therefore this is similar to the problem of embedding videos from sites like YouTube.)

My thought was that it would be OK to allow this as long as SCRIPT is strictly set to only be included if the SRC is pointing to a valid, whitelisted domain that we know belongs to the third party. But the problem I've run into in implementing this is that I have not been able to find a way to get HTMLPurifier to allow SCRIPT tags _at all_ except by setting %HTML.Trusted to true. (I tried adding SCRIPT to the list of tags in %HTML.AllowedElements and adding the Scripting module via %HTML.AllowedModules, but neither allowed the tag through.)
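The SRC check described above could look roughly like the following. This is a minimal sketch in Python rather than anything from HTML Purifier itself; the host name `widgets.example.com` and the function name `is_allowed_script_src` are made up for illustration, since the thread does not name the third party. It only shows the comparison step: parse the URL and compare the exact host, rather than doing a substring match on the raw string.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real third-party host is not named in this thread.
ALLOWED_SCRIPT_HOSTS = {"widgets.example.com"}


def is_allowed_script_src(src: str) -> bool:
    """Return True only for absolute https URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(src.strip())
        host = parts.hostname  # lowercased; excludes port and any user@ prefix
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        # Rejects javascript:, data:, http: and scheme-relative URLs.
        return False
    return host in ALLOWED_SCRIPT_HOSTS
```

Comparing the parsed host exactly means look-alike values such as `https://widgets.example.com.evil.test/w.js` or `https://widgets.example.com@evil.test/w.js` are rejected. It says nothing about what the allowed host actually serves.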

Setting %HTML.Trusted to true appears to pass the tag through, but this approach makes me nervous because turning Trusted on seems to relax a bunch of other constraints as well, so using it to solve this problem feels a bit like swatting a fly with a shovel.

I can't do anything with whitelisting by SRC until I can get HTMLPurifier to allow SCRIPT tags through, though. So my question is: is there a way to solve this problem without resorting to %HTML.Trusted?

Thanks in advance for any help!

-- Jason Lefkowitz

Re: Allowing SCRIPT tag from whitelisted SRC?
June 21, 2010 07:17PM

You shouldn't be using %HTML.Trusted anyway; it really ought to be named %HTML.Unsafe or something. Even your scheme makes me uncomfortable, because if the external website allows any user-submitted text, it's not too difficult to hoodwink a browser into thinking an HTML page is to be interpreted as a script.

I'd probably recommend generating some sort of stub, and then converting the stubs into JavaScript using a client-side script in the browser.

Re: Allowing SCRIPT tag from whitelisted SRC?
July 19, 2010 09:41AM

I would also like to see a solution to allow script tags. I'm using HTMLPurifier to validate some HTML stubs our framework generates, and some of them have script tags (written by our developers). The problem is that HTMLPurifier removes the script tags. What is the best way to allow script tags?

Re: Allowing SCRIPT tag from whitelisted SRC?
July 19, 2010 01:06PM

In those situations, various people have had much more success running the code that generates the stubs after HTML Purifier.

Re: Allowing SCRIPT tag from whitelisted SRC?
July 19, 2010 01:13PM

The components are PHP classes that generate HTML using Smarty as a template engine, so I can't really test the markup until after the component has run. I have unit tests that test various parts of the components, but I would also like to test the markup they produce.

Re: Allowing SCRIPT tag from whitelisted SRC?
July 19, 2010 01:17PM

The other way is to convert the script tags into tags that HTML Purifier will allow, and then convert them back when you're done.
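A rough sketch of that round trip, written in Python for illustration only. Everything named here is an assumption: `purify` stands in for whatever call runs HTML Purifier, `src_is_allowed` is a caller-supplied check on the SRC value, and the `script-stub` span is just one placeholder the purifier would need to be configured to let through unchanged.

```python
import html
import re
import secrets
from typing import Callable, Dict

# Matches only the simplest embed form: a script tag with a single
# double-quoted src attribute and an empty body. Anything else is left
# in place for the purifier to strip.
SCRIPT_RE = re.compile(
    r'<script\s+src="([^"<>]*)"\s*>\s*</script\s*>', re.IGNORECASE
)
STUB_RE = re.compile(r'<span class="script-stub">([0-9a-f]{32})</span>')


def purify_keeping_scripts(
    dirty: str,
    purify: Callable[[str], str],
    src_is_allowed: Callable[[str], bool],
) -> str:
    """Swap approved script tags for stubs, purify, then swap them back."""
    stubs: Dict[str, str] = {}

    def to_stub(match: "re.Match[str]") -> str:
        src = html.unescape(match.group(1))
        if not src_is_allowed(src):
            return match.group(0)  # untouched, so the purifier removes it
        token = secrets.token_hex(16)  # random, so users cannot forge a stub
        stubs[token] = src
        return f'<span class="script-stub">{token}</span>'

    def to_script(match: "re.Match[str]") -> str:
        src = stubs.get(match.group(1))
        if src is None:
            return match.group(0)  # not a stub we created; leave it alone
        return f'<script src="{html.escape(src, quote=True)}"></script>'

    stubbed = SCRIPT_RE.sub(to_stub, dirty)
    clean = purify(stubbed)
    return STUB_RE.sub(to_script, clean)
```

Only tokens recorded during the first pass are turned back into script tags, so a user who types a look-alike stub into their content gets an ordinary span back. Whether the stub survives purification byte for byte depends on the purifier's configuration, so the restore pattern may need adjusting.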

