Purifying a URL

Posted by diamondf 
Purifying a URL
November 10, 2013 01:30PM

I've had success using HTMLPurifier to sanitize HTML, but I can't seem to use it to properly strip a URL.

I assume I'm supposed to be using HTMLPurifier_AttrDef_URI(). The problem is that I have no idea what $context is supposed to be set to. I've looked through the documentation, and all it says is:

$context	Mandatory HTMLPurifier_AttrContext object. 

A few dozen Google searches and code reviews later, I'm still no clearer on what that's supposed to be.

My code looks like this:

$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier_AttrDef_URI();
$clean_html = $purifier->validate($unsafeHTML, $config, $context);

That gives me: Fatal error: Call to a member function register() on a non-object in __FILE__ __LINE__

Can anyone explain what I'm supposed to set $context to in order to get that to work?

Re: Purifying a URL
November 10, 2013 02:08PM

That's a typo; it should be HTMLPurifier_Context. I don't remember exactly what context variables need to be set; try it and see which missing ones HTML Purifier complains about.

Re: Purifying a URL
November 10, 2013 02:28PM

Aha! You were correct. HTMLPurifier_Context worked, HTMLPurifier_AttrContext was broken.
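
For anyone landing here later, a minimal sketch of the call with the context object created explicitly. The include path, the purify_url() helper name, and the sample URL are assumptions for illustration, not part of HTML Purifier or of my original code:

```php
<?php
// Assumption: HTML Purifier is loaded through its bundled autoloader.
// Adjust the path (or use Composer's vendor/autoload.php) for your setup.
require_once 'HTMLPurifier.auto.php';

// Hypothetical helper for illustration; not part of HTML Purifier.
function purify_url($unsafe_url)
{
    $config  = HTMLPurifier_Config::createDefault();
    $context = new HTMLPurifier_Context(); // not HTMLPurifier_AttrContext
    $uri_def = new HTMLPurifier_AttrDef_URI();

    // validate() returns the cleaned URI as a string, or false if it is rejected.
    return $uri_def->validate($unsafe_url, $config, $context);
}

$clean_url = purify_url('http://example.com/path?x=1');
if ($clean_url === false) {
    echo "URL rejected\n";
} else {
    echo $clean_url, "\n";
}
```

Since validate() can return false, it seems safest to compare the result with === false rather than rely on a loose truthiness check.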

It may be worth someone looking over:

http://htmlpurifier.org/doxygen/html/classHTMLPurifier__AttrDef__URI.html#a25866f9117d08ef97a81d86662edc1fb

Unless I'm mistaken, that documentation led me to an incorrect conclusion. That said, I'm very new to HTML Purifier.

Thank you, Ambush!

Re: Purifying a URL
November 10, 2013 02:31PM

Hmm, I bet those docs need updating.

Debashish Kumar
Re: Purifying a URL
February 19, 2018 10:10PM

I am not able to use it for purifying URLs, because it converts non-English URLs into things like ntilde;, atilde; and amp;.
