Automatic Purification of Link URIs

Posted by mstrofbass 
May 11, 2017 05:54PM

I think most of my trouble here comes from a fundamental misunderstanding of how HTMLPurifier works (probably because I was tossed into a project where it was already implemented and hidden behind a framework).

Essentially, we allow HTML comments and are using HTMLPurifier to purify them on display. The comments allow links, but it appears the URIs are not being encoded to the degree we want (or at all). The particular example we were given was a URL such as http://www.example.com/alert(document.cookie). In our case, we need to encode the parentheses and any other character that may be a vulnerability. It was pointed out that this was not being done by default, so I implemented a solution using straight PHP to pull out the hrefs and perform some encoding on them. Since I don't know just how good that solution is, I started googling around and, based on a couple of threads on here, tried the following:

	    $context = new \HTMLPurifier_Context();

	    $config = \HTMLPurifier_Config::createDefault();
	    $config->set('Core.Encoding', 'utf-8');
	    $config->set('HTML.Doctype', 'XHTML 1.0 Strict');
	    $config->set('HTML.Allowed', '&');
	    $config->set('Cache.SerializerPath', \Yii::$app->getRuntimePath());
	    $config->set('Cache.SerializerPermissions', 0775);
	    $purifier = new \HTMLPurifier_AttrDef_URI();

	    $newHREF = $purifier->validate($href, $config, $context);

This also appeared to do nothing on the URL http://www.example.com/alert('foo')/alert('bar').jpg?alert('baz')&alert(document.cookie)#alert('anchor').

What am I missing?

Re: Automatic Purification of Link URIs
May 12, 2017 11:27PM

Well... I don't think there's anything wrong with that URL?
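
For context, parentheses and apostrophes are "sub-delims" in RFC 3986, so they are legal in a URI as written, which would explain why nothing gets rewritten. If a policy still requires them to be percent-encoded, here is a minimal sketch in plain PHP. The function name encodeExtraUriChars is hypothetical and not part of HTML Purifier, the character list is only an assumption about what that policy wants, and wiring this into the purifier itself is not shown.

```php
<?php
// Hypothetical helper (NOT an HTML Purifier API): percent-encode characters
// that are legal in a URI but that a local policy wants escaped anyway.
// The set below (parentheses and apostrophe) is an assumption; adjust as needed.
function encodeExtraUriChars(string $uri): string
{
    // strtr() with an array replaces each key with its value in a single pass.
    return strtr($uri, [
        '(' => '%28',
        ')' => '%29',
        "'" => '%27',
    ]);
}

$href = "http://www.example.com/alert(document.cookie)";
echo encodeExtraUriChars($href), "\n";
// prints: http://www.example.com/alert%28document.cookie%29
```

This only changes how the URI is spelled. A browser treats %28 and ( in a path the same way when following the link, so the encoding is cosmetic rather than an extra layer of protection.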
