
When to use HTML Purifier

Posted by McCarville 
May 04, 2011 12:57PM

I am brand new to HTML Purifier, and I am wondering at which points in my application I should use it and when a utility like this would be considered overkill. For example, my app has simple user input fields like usernames and passwords, and then more "involved" fields that use CKEditor to let users enter large amounts of HTML content. I would assume that I should use HTML Purifier for large amounts of HTML from a WYSIWYG editor like CKEditor, but should I also use it for every user input field, such as username or password? If HTML Purifier would be considered overkill for those types of fields, does anyone have suggestions on what I should be doing with them to protect against injection and XSS attacks?

I apologize for any naivety in my post, but I am still just learning. I would also like to thank everyone in advance for your advice on this matter.

Regards, Mike Sloan

Re: When to use HTML Purifier
May 04, 2011 02:15PM

For the conceptual framework you need to make this judgment, see here: http://web.mit.edu/ezyang/Public/iap/intro-to-was.html

In short, if it's HTML, consider using HTML Purifier on it. If it's not, don't.
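A minimal sketch of that rule, assuming HTML Purifier is installed and loaded only where rich HTML is rendered. The function names render_plain_text and render_rich_html are hypothetical. Usernames and other plain-text fields are simply escaped when printed; only the CKEditor-style HTML field goes through the purifier.

```php
<?php
// Sketch: decide per field whether the value is HTML or plain text.
// Plain-text fields (usernames etc.) are escaped on output, never purified.
function render_plain_text(string $value): string
{
    // Encode &, ", ', < and > so the browser shows them literally.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Only fields that are *meant* to contain HTML (the CKEditor body) are purified.
// Assumes HTML Purifier was loaded, e.g. require_once 'HTMLPurifier.auto.php',
// and $purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault()).
function render_rich_html(string $html, HTMLPurifier $purifier): string
{
    return $purifier->purify($html);
}

echo render_plain_text('<script>alert(1)</script>'), "\n";
```

The escaping function needs no library at all, which is why purifying a username would be overkill.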

Re: When to use HTML Purifier
May 05, 2011 10:24AM

Thank you very much for all the information; the PPT was very interesting. Are you a professor at MIT?

I know the answer you gave was short and to the point, but it definitely led me to an "aha moment" regarding when to use HTML Purifier; sometimes the most obvious answers escape the mind.

Regarding the sanitization of user input, I have been reading about PDO, and it seems like that is what you were using in your PPT. In the example below, assuming the value for $name comes from $_POST['txtName'], should something be done to the POST data first, or is it as simple as $name = $_POST['txtName']?

$sql = 'SELECT * FROM users WHERE name = ?';
$sth = $dbh->prepare($sql);
$sth->execute(array($name));

Once again thank you very much for taking the time to answer my previous question, and I would also like to thank you in advance for any additional insight you can provide.

Re: When to use HTML Purifier
May 05, 2011 01:06PM

Nah, I'm just an undergrad.

Just $_POST['txtName'] should be sufficient to get it into the "pure" form.
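To make that concrete, here is a minimal sketch assuming the pdo_sqlite extension is available. The in-memory users table and the simulated $_POST value are hypothetical. It binds the raw $_POST['txtName'] value through the placeholder from the question and shows that it comes back exactly as typed.

```php
<?php
// Hypothetical form input containing characters that would break naive SQL/HTML.
$_POST['txtName'] = "O'Brien <admin>";

$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE users (name TEXT)');

// The raw value goes straight into a placeholder; no manual escaping.
$name = $_POST['txtName'];
$dbh->prepare('INSERT INTO users (name) VALUES (?)')->execute(array($name));

$sql = 'SELECT * FROM users WHERE name = ?';
$sth = $dbh->prepare($sql);
$sth->execute(array($name));
$row = $sth->fetch(PDO::FETCH_ASSOC);

// The stored value is exactly what the user typed (the "pure" form);
// HTML escaping happens only when it is echoed into a page.
echo htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8'), "\n";
```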

Re: When to use HTML Purifier
May 07, 2011 08:17AM

You can apply filters to the text fields.

When dealing with $_POST and $_GET, use filter_input(INPUT_POST, $txt, FILTER_SANITIZE_STRING).

There are different filters for different data types, such as INT, FLOAT, URL, EMAIL, etc.

See http://www.php.net/manual/en/ref.filter.php for more info.
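As a hedged illustration of the type-specific filters, here are the validating variants for INT, FLOAT, EMAIL and URL. These are FILTER_VALIDATE_* filters rather than the FILTER_SANITIZE_STRING filter used above, and the sample inputs are made up.

```php
<?php
// FILTER_VALIDATE_* filters return the typed value or false;
// they check the input instead of rewriting it like FILTER_SANITIZE_*.
$age   = filter_var('42', FILTER_VALIDATE_INT);                   // int 42
$bad   = filter_var('42abc', FILTER_VALIDATE_INT);                // false
$range = filter_var('150', FILTER_VALIDATE_INT,
    array('options' => array('min_range' => 0, 'max_range' => 130))); // false: out of range
$price = filter_var('3.14', FILTER_VALIDATE_FLOAT);               // float 3.14
$email = filter_var('mike@example.com', FILTER_VALIDATE_EMAIL);   // 'mike@example.com'
$site  = filter_var('not a url', FILTER_VALIDATE_URL);            // false

var_dump($age, $bad, $range, $price, $email, $site);
```

Validation suits fields with a known shape (ages, e-mail addresses); free-text fields like names have no such shape to check.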

Simply escaping the input is not enough; make sure your character sets are correct too. This makes a difference if your application can be used with different character sets, such as UTF-8, CHK, etc.

For example, when using htmlspecialchars, it is recommended to set the character set explicitly:

htmlspecialchars($var, ENT_QUOTES, 'utf-8')

Setting the character set is another measure for securing your apps properly.
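One concrete reason the charset argument matters, shown as a small sketch with made-up inputs: htmlspecialchars() checks the string against the declared encoding, and without ENT_SUBSTITUTE or ENT_IGNORE it returns an empty string for invalid UTF-8 instead of passing malformed bytes through.

```php
<?php
$ok          = htmlspecialchars("O'Reilly <b>", ENT_QUOTES, 'UTF-8');
// "\xC3\x28" is not a valid UTF-8 sequence.
$invalid     = htmlspecialchars("\xC3\x28 <b>", ENT_QUOTES, 'UTF-8');
// ENT_SUBSTITUTE replaces invalid sequences with U+FFFD and escapes the rest.
$substituted = htmlspecialchars("\xC3\x28 <b>", ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $ok, "\n";                 // O&#039;Reilly &lt;b&gt;
var_dump($invalid === '');      // bool(true): rejected as invalid UTF-8
echo $substituted, "\n";
```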

I see you're planning on using prepared statements, which is great.

Try to avoid addslashes(); it is not safe for escaping SQL, as certain character sets can open it up to injection.

Also try to avoid using $_REQUEST, because it can be manipulated with a craftily designed cookie: depending on the request_order/variables_order setting, $_COOKIE values can override $_GET and $_POST values in $_REQUEST, so it's possible to change the data.
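A simulation of that override, not real request handling: PHP registers request variables into $_REQUEST left to right, with later sources overwriting earlier keys. The array_merge() call below models an assumed "GPC" order, and the sample values are hypothetical.

```php
<?php
$get    = array('txtName' => 'from-get');
$post   = array('txtName' => 'mike');
$cookie = array('txtName' => 'attacker-value');

// Later arrays overwrite earlier ones for the same string key.
$request = array_merge($get, $post, $cookie);   // what $_REQUEST would hold
echo $request['txtName'], "\n";                  // attacker-value

// Reading the intended source explicitly avoids the ambiguity.
$name = $post['txtName'] ?? '';
echo $name, "\n";                                // mike
```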

Re: When to use HTML Purifier
May 16, 2011 01:31PM

Thanks for the input. Regarding the example I gave above, this is what I have come up with. Any input?

$name = filter_input(INPUT_POST, $txtName, FILTER_SANITIZE_STRING);
$sql = 'SELECT * FROM users WHERE name = :name';
$sth = $dbh->prepare($sql);
$sth->execute(array(':name' => $name));

Would this be considered properly escaped and sanitized? When would I need to use htmlspecialchars?

Re: When to use HTML Purifier
May 16, 2011 03:57PM

It looks like FILTER_SANITIZE_STRING does the equivalent of htmlentities, though I'm not 100% clear on its semantics.

Re: When to use HTML Purifier
May 16, 2011 06:28PM

Regarding the example above: it looks OK, but the usage of filter_input is wrong.

filter_input(INPUT_POST, $txtName, FILTER_SANITIZE_STRING)

You should only use filter_input() when reading directly from $_POST, $_GET, etc., and you pass it the name of the field as a string, not a variable.

Example:

$_POST['txtName'] is read with filter_input(INPUT_POST, 'txtName', FILTER_SANITIZE_STRING)

$_GET['txtName'] is read with filter_input(INPUT_GET, 'txtName', FILTER_SANITIZE_STRING)
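A small sketch of the filter_input() pattern, assuming it runs from the command line. With no POST body, this shows two details: a missing field yields null, and assigning to $_POST inside the script does not change what filter_input() sees, because it reads the original request data. The assignment below is hypothetical and deliberately ignored.

```php
<?php
$_POST['txtName'] = 'Mike';   // hypothetical; filter_input() does not see this

// Second argument is the key name as a string literal.
$name = filter_input(INPUT_POST, 'txtName', FILTER_DEFAULT);

if ($name === null) {
    echo "txtName was not submitted\n";
} elseif ($name === false) {
    // Only possible with filters that can reject a value (e.g. FILTER_VALIDATE_INT).
    echo "txtName failed the filter\n";
} else {
    echo "Got: $name\n";
}
```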

If you are dealing with a string variable, then you should use filter_var:

$txtName = $_POST['txtName'];

filter_var($txtName, FILTER_SANITIZE_STRING)
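For the filter_var() case, a hedged sketch: FILTER_SANITIZE_STRING, as used in this thread, was deprecated in PHP 8.1, so this example swaps in FILTER_SANITIZE_SPECIAL_CHARS. The sample value stands in for $_POST['txtName'].

```php
<?php
$txtName = '<b>Mike</b> & "friends"';   // stand-in for $_POST['txtName']

// filter_var() takes the value itself, not a key into the request data.
$safe = filter_var($txtName, FILTER_SANITIZE_SPECIAL_CHARS);

// <, >, & and quotes come back as numeric HTML entities.
echo $safe, "\n";
```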

Re: When to use HTML Purifier
May 17, 2011 10:35AM

I have to say I am a little confused... assuming that the page we are working with has received a $_POST['txtName'], would $name = filter_input(INPUT_POST, $txtName, FILTER_SANITIZE_STRING); be wrong? What I am aiming for is to filter the POST data and assign it to the variable $name, so I can use $name in the execute array...

BTW as stated above, I very much appreciate you taking the time to engage in this discussion.

Re: When to use HTML Purifier
May 17, 2011 12:13PM

I think you're still a little confused, conceptually speaking. When data is passed around inside your application, in memory, it should be done in "as pure" a form as possible. If my name has a space in it, the strings my application handles should not have %20s in them.

So the question here is not an operational one but a semantic one, and you haven't given enough information to qualify it properly.
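A minimal sketch of the "pure form" idea: the value stays exactly as typed in memory and is encoded only at the boundary where it leaves the program, differently for each context. The profile URL is hypothetical.

```php
<?php
$name = 'Mike Sloan';   // in memory: a real space, no %20

// URL context: encode for the query string.
$url  = '/profile.php?name=' . rawurlencode($name);
// HTML context: escape for the page.
$html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</a>';

echo $url, "\n";    // /profile.php?name=Mike%20Sloan
echo $html, "\n";
// An SQL query would receive $name unchanged through a PDO placeholder.
```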

Re: When to use HTML Purifier
May 19, 2011 03:13PM

Regarding the question above about filter_input: it's wrong because you have used $txtName, which is a string variable. What is essentially happening is this:

INPUT_POST tells filter_input that it is filtering $_POST data, in this case $_POST['txtName']. Strip off the $_POST and the key becomes just 'txtName':

$name = filter_input(INPUT_POST, 'txtName', FILTER_SANITIZE_STRING);

If you were using a variable, for example if you had done $txtName = $_POST['txtName'];, then you would need to use filter_var($txtName), because $txtName is a string variable, not a $_POST or $_GET entry.

For better insight, see http://www.w3schools.com/PHP/php_ref_filter.asp

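@Ambush, FILTER_SANITIZE_STRING will strip tags, encode quotes, and *optionally* strip or encode other characters; it is not an htmlentities equivalent. There is a specific filter that mirrors htmlspecialchars, FILTER_SANITIZE_SPECIAL_CHARS, which HTML-encodes '"<>& and characters with ASCII values below 32. It also has flags you can use to strip the characters below 32 or to encode those above 127, whichever you choose.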
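A short sketch of those flags on FILTER_SANITIZE_SPECIAL_CHARS, using a made-up input that contains UTF-8 "café", a BEL control character (ASCII 7) and a tag. The exact entity spelling is left to var_dump rather than asserted here.

```php
<?php
$input = "caf\xC3\xA9\x07 <b>";

// Default: '"<>& and characters below ASCII 32 are HTML-encoded.
$default    = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);
// FILTER_FLAG_STRIP_LOW: characters below 32 are removed instead.
$stripLow   = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW);
// FILTER_FLAG_ENCODE_HIGH: bytes above 127 are encoded, one byte at a time.
$encodeHigh = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_ENCODE_HIGH);

var_dump($default, $stripLow, $encodeHigh);
```

Note that ENCODE_HIGH works on bytes, so a multi-byte UTF-8 character like "é" becomes two entities; for UTF-8 text, htmlspecialchars() with an explicit charset is usually the better fit.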
