Pound signs removed
December 19, 2011 07:46AM

$desc = html_entity_decode($_POST['desc']);
require_once './HTMLPurifier.standalone.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'utf-8');
$config->set('Core.EscapeNonASCIICharacters', true);
$config->set('HTML.Allowed', 'p,span,em,ul,ol,li');
$config->set('AutoFormat.RemoveEmpty', 'true');
$purifier = new HTMLPurifier($config);
$desc = $purifier->purify($desc);

With this code, pound signs are removed. Where am I going wrong?

Re: Pound signs removed
December 19, 2011 11:03AM

html_entity_decode is probably doing the wrong thing. Can you remove that line?
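One possible explanation, not confirmed anywhere in this thread: before PHP 5.4, html_entity_decode() decoded to ISO-8859-1 unless told otherwise, so an entity such as &pound; would become the single byte 0xA3, which is not valid UTF-8. With Core.Encoding set to utf-8, HTML Purifier expects valid UTF-8, so such a byte may well be dropped. The sketch below assumes the editor sends the pound sign as &pound;; the sample string is made up for illustration.

```php
<?php
// Assumption: the editor submits the pound sign as the entity &pound;.
$input = 'Price: &pound;5';

// Decoding to ISO-8859-1 (the default before PHP 5.4) yields the byte 0xA3.
$latin1 = html_entity_decode($input, ENT_QUOTES, 'ISO-8859-1');

// Decoding to UTF-8 yields the two-byte sequence 0xC2 0xA3.
$utf8 = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// preg_match() with the /u modifier only succeeds on valid UTF-8 subjects.
$latin1IsValidUtf8 = preg_match('//u', $latin1) === 1;
$utf8IsValidUtf8   = preg_match('//u', $utf8) === 1;

var_dump($latin1IsValidUtf8, $utf8IsValidUtf8);
```

If that is what is happening here, it would also fit a plain htmlspecialchars_decode() behaving differently, since that function leaves &pound; untouched.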

Re: Pound signs removed
December 20, 2011 04:12AM

I have tried htmlspecialchars_decode instead and this seems to work. This step is required because the editor I'm using outputs encoded text. Many thanks.

Re: Pound signs removed
December 20, 2011 12:52PM

OK. Can you post the contents of $desc before and after purification?

Re: Pound signs removed
February 17, 2012 05:14AM

In my experience, html_entity_decode() and htmlspecialchars_decode() are signs that the code is doing something it should not. I hope you don't mind me explaining why in your topic, DavidIanWaters. I can't guarantee it applies to your case, but hear me out:

When you use a JavaScript editor to edit HTML, and you want to load pre-existing HTML into said editor, you should be doing it like this:

<textarea id="editor"><?php echo htmlspecialchars($htmlToEdit, ...); ?></textarea>

Reason: Even ignoring that you obviously don't want anyone breaking out of your editor textarea by supplying </textarea>, what you want between your <textarea> tags is plaintext. Imagine the editor isn't being loaded. You want to see the HTML source, right? So you want to treat your data as plain text, and since you're outputting it into HTML, you need to escape it like you would any other plain text.
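As a minimal sketch of that escaping step: the stored HTML below is a made-up value, and the ENT_QUOTES flag and 'UTF-8' charset are assumptions that should match your own page.

```php
<?php
// Hypothetical stored HTML, including an attempt to break out of the textarea.
$htmlToEdit = '<p>5 &gt; 4</p></textarea><script>alert(1)</script>';

// Escape it like any other plain text that is written into an HTML page.
// ENT_QUOTES and 'UTF-8' are assumptions; use what matches your document.
$escaped = htmlspecialchars($htmlToEdit, ENT_QUOTES, 'UTF-8');

echo '<textarea id="editor">' . $escaped . '</textarea>';
```

The escaped value no longer contains a literal </textarea>, and the browser hands the editor the original HTML source as text.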

The editor will take this plaintext and interpret it as HTML once more (which is where things get confusing for a lot of developers). For this, it doesn't need to decode the text any more than you would need to turn a &gt; into > by hand. It sees what you would.

Now, when a browser triggers a form send, it will send the plaintext. This is more obvious if you consider a normal input field:

<input name="foo" type="text" value="5 &gt; 4" />

After the form is sent, this will arrive server-side as $_REQUEST: array('foo' => '5 > 4') without you needing to decode it first. The browser sending the form has already decoded it for you.

The exact same behaviour is true for:

<textarea name="foo">5 &gt; 4</textarea>

This, too, will arrive in your script as $_REQUEST: array('foo' => '5 > 4') without you needing to touch it with a decode.

So... if you are decoding after you've gotten anything from the browser, please carefully analyse what you are doing.

If you sanitise your HTML after you erroneously decode it, of course you're still safe from XSS and other awful things when you output it again :) but chances are that you're breaking the document structure in some way.
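To illustrate how an extra decode can break the structure, here is a small sketch. The submitted string is a made-up example of what an editor might send when the user types a literal <b> as text.

```php
<?php
// Hypothetical submission: the user wrote the literal text "<b>" in a
// paragraph, so the editor sent it as an escaped entity inside real markup.
$submitted = '<p>Use the &lt;b&gt; tag for bold text.</p>';

// The extra decode turns the user's literal text into real markup.
$decoded = htmlspecialchars_decode($submitted);

// $decoded now contains an unclosed <b> element that the user never intended.
echo $decoded;
```

A purifier run afterwards will still produce safe output, but it now has to repair a tag the user only meant to show as text.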

Please reconsider that call:

  1. Do you really need it? What is it doing and why is it doing it?
  2. Which part of your application breaks if you take it out entirely? I can almost guarantee you that the part forcing you to use that line (providing such a part exists) is doing something wrong, and you would be better off fixing it there.

(Edit: Fixed formatting after an HTML escaping issue ravaged the forum.)

Edited 1 time(s). Last edit at 07/30/2012 01:55PM by pinkgothic.
