
Confused about how to use HTML Purifier

Posted by zoszsoz 
June 16, 2011 08:16AM

Hi,

I've been at this for two nights now and still can't figure out how to use HTML Purifier. I've got it installed and ran a simple test:

$purifier = new HTMLPurifier();
$clean_html = $purifier->purify($str);

That works fine, but is there an easy-to-follow Getting Started guide with a few examples somewhere? I've read a good deal of the documentation, but I'm not sure where to start or how I'm supposed to implement the use cases below.

So I've got a couple of use cases:

1) I have a plain text field, say for a First Name or Last Name. The user submits the form, and before this is saved to the database I want to trim any whitespace off the front and end of the string and strip all HTML tags from the input. It's a name field, so it shouldn't have any HTML in it. I'm thinking I could just have a whitelist of characters: do they really need characters like ! and ? or < > in their name?

2) When outputting the contents of the previously saved database record (or of the previously submitted form) to the web page, I want to thoroughly clean it for XSS, which would probably require converting all the special characters to HTML entities, using something like htmlentities.

I was thinking something along the lines of:

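// Guess the input encoding (detection can fail, so fall back to UTF-8), then normalize to UTF-8.
$encoding = mb_detect_encoding($input, mb_detect_order(), true) ?: 'UTF-8';
$output = mb_convert_encoding($input, 'UTF-8', $encoding);
// Escape HTML special characters, including both quote styles, before output.
$output = htmlentities($output, ENT_QUOTES, 'UTF-8');
echo $output;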

But I'm guessing your software does a more thorough job of XSS protection and is more secure. So how would I implement those two use cases with your software? Or should I be using your software only if I have a GUI HTML editor like CKEditor and want to filter the HTML?

Many thanks

Re: Confused about how to use HTML Purifier
June 16, 2011 08:36AM
> That works fine, but is there an easy-to-follow Getting Started guide with a few examples somewhere?

See the INSTALL document at the base of your HTML Purifier distribution.

> I have a plain text field, say for a First Name or Last Name. The user submits the form, and before this is saved to the database I want to trim any whitespace off the front and end of the string and strip all HTML tags from the input. It's a name field, so it shouldn't have any HTML in it. I'm thinking I could just have a whitelist of characters: do they really need characters like ! and ? or < > in their name?

Use trim, strip_tags, and then htmlentities, though I don't really approve of a scheme like that. Either way, you don't need HTML Purifier for it.
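
A minimal sketch of that sequence, assuming the name arrives as a raw string. `clean_name` and `escape_for_output` are hypothetical helper names, not part of PHP or HTML Purifier, and splitting the work into a save step and an output step is an assumption about how the two use cases fit together:

```php
<?php
// Hypothetical helper: tidy a plain-text name before it is saved.
function clean_name($raw)
{
    // strip_tags() drops anything that looks like a tag; trim() then
    // removes leading and trailing whitespace.
    return trim(strip_tags($raw));
}

// Hypothetical helper: escape a stored value when printing it into HTML.
function escape_for_output($value)
{
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

$name = clean_name("  <b>Ann</b> O'Brien  ");
echo escape_for_output($name);
```

Escaping at output time rather than before saving keeps the stored value as plain text, so it can be reused in contexts other than HTML.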

> When outputting the contents of the previously saved database record (or of the previously submitted form) to the web page, I want to thoroughly clean it for XSS, which would probably require converting all the special characters to HTML entities, using something like htmlentities.

You don't need HTML Purifier for that either.

Here's the litmus test: if a user submits <b>foo</b>, should it show up as the literal text <b>foo</b> or as foo in bold? If the former, you don't need HTML Purifier.
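
The two outcomes of the litmus test can be sketched as follows. This is only an illustration: the variable names are made up, and the HTML Purifier branch assumes the library has already been loaded as described in its INSTALL document, so it is guarded with class_exists and is skipped otherwise:

```php
<?php
$submitted = '<b>foo</b>';

// Literal display: escaping is enough, no HTML Purifier involved.
$as_text = htmlentities($submitted, ENT_QUOTES, 'UTF-8');
echo $as_text, "\n";

// Rendered display: the markup has to survive but be made safe, which is
// the job HTML Purifier is meant for. Guarded so the sketch still runs
// when the library has not been loaded.
if (class_exists('HTMLPurifier')) {
    $purifier = new HTMLPurifier();
    echo $purifier->purify($submitted), "\n";
}
```

In the first case the browser shows the tags as text; in the second it receives real markup, which is why it has to be filtered rather than escaped.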

