
HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')

Posted by bbruman 
May 14, 2017 04:16PM

I posted this on StackOverflow but then found these forums.

I'm really confused about why this isn't working.

I've tried versions 4.9.2 and 4.8.0; both keep the empty table intact.

Here's my HTML code block:

<table class="product-description-table">
<tbody>
<tr>
<td class="item" colspan="3">Test Title</td>
</tr>
<p class="MsoNormal c2"><strong>Test Paragraph 3</strong></p>
<p class="MsoNormal c2"><strong>Test Paragraph 2</strong></p>
<p class="MsoNormal c2"><strong>Test Paragraph 3</strong></p>
<p class="c5"></p>
<p class="MsoNormal c2"><strong>&nbsp;</strong></p>
<strong class="c6"><strong><em><br></em></strong></strong>
<p class="c2"></p>
<p class="c4"></p>
</td>
<td class="product-content-border"></td>
</tr>
<tr>
<td class="gallery" colspan="3">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>

Here's my PHP script:

https://pastebin.com/L3p6gyd6

And here's my link to the Live Demo

As you can see in the Live Demo, HTMLPurifier is successfully removing the empty table. (Alternatively, you can paste my HTML block into the default Live Demo page to test.)

My question is: why is my PHP script still leaving the table intact?

Here is the output of my PHP script

<table class="product-description-table"><tbody><tr><td class="item" colspan="3">Test Title</td>
</tr></tbody></table><p class="MsoNormal c2"><strong>Test Paragraph 3</strong></p>
<p class="MsoNormal c2"><strong>Test Paragraph 2</strong></p>
<p class="MsoNormal c2"><strong>Test Paragraph 3</strong></p>


<strong class="c6"><strong><em><br /></em></strong></strong>






<table><tbody><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr><tr><td></td>
<td></td>
</tr></tbody></table>

I'm looking to use HTMLPurifier to remove empty tables like this from these improper html blocks, and it's frustrating that I can't get it to work (even though, to my knowledge, I am using the same settings in the Live Demo for my PHP script).

Is this a bug?

Or does anyone know what in my PHP script is keeping it from removing the empty table as it should?

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 14, 2017 04:35PM

Okay, so after making this thread I decided to do a little bit more testing with different versions of HTMLPurifier.

I went down the list...

require_once '/htmlpurifier-4.9.2/library/HTMLPurifier.auto.php';

then...

require_once '/htmlpurifier-4.8.0/library/HTMLPurifier.auto.php';

then...

require_once '/htmlpurifier-4.7.0/library/HTMLPurifier.auto.php';

then...

require_once '/htmlpurifier-4.6.0/library/HTMLPurifier.auto.php';

then FINALLY.....

require_once '/htmlpurifier-4.5.0/library/HTMLPurifier.auto.php';

removed the table in question and works properly.

So, for whatever reason, version 4.5.0 removes the table properly in my test case. Any version higher keeps the empty table in place.

Like I said, not sure why this is.. seems like a bug! I guess I'll just use version 4.5.0 till this maybe gets worked out!

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 14, 2017 04:42PM

By any chance can you post the phpinfo() of the server you are running the code on?

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 14, 2017 08:38PM

Yeah sure.

I'm not sure of the best way to share this. I saved it as a PDF and uploaded it here; let me know if that will suffice:

http://docdro.id/tc1NoZd

You think it may be a server issue? This is all being done on my localhost WAMP server. I can try on a live server to see if I get the same output for newer versions if that helps..

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 14, 2017 08:55PM

Actually, the fix to your problem is pretty simple: add

$config->set('AutoFormat.RemoveEmpty.Predicate', array());

Look at http://htmlpurifier.org/live/configdoc/plain.html#AutoFormat.RemoveEmpty.Predicate for more details

(It's hard to see but you've applied this config setting in the demo file.)

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 15, 2017 10:37AM

Thank you.

Trying again with htmlpurifier-4.9.2

Still can't get it to work.

Trying it as you wrote it triggers a PHP warning:

Warning: Value for AutoFormat.RemoveEmpty.Predicate is of invalid type, should be hash in C:\wamp64\www\cs\htmlpurifier-4.9.2\library\HTMLPurifier\Config.php

I've also tried putting the table elements that are supposed to be removed in there... this doesn't trigger a warning and goes through just fine, but still the table remains

$config->set('AutoFormat.RemoveEmpty.Predicate', [
    'table' => [],
    'tbody' => [],
    'tr' => [],
    'td' => [],
]);

What do ya think?

Thanks again

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 15, 2017 12:15PM

Got this worked out!

Had to add:

$config->set('AutoFormat.RemoveEmpty.Predicate', [
    'table' => [],
]);

to my $config settings and now it's removing the table (most recent version 4.9.2)

Thanks for the support! If there's anything else I should be aware of let me know.
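
For anyone landing here later, a minimal sketch of the whole setup is below. The include path and the sample input are assumptions for illustration, not taken from the pastebin script. As I read the configdoc page linked earlier in the thread, the directive's default value is what keeps empty td and th cells in place, so replacing it with a hash that does not list them lets the cells, and then the rows and table around them, be treated as empty. Passing a non-empty hash also avoids the "should be hash" warning that array() triggers in 4.9.2.

```php
<?php
// Assumed path: point this at your own HTML Purifier checkout.
require_once 'htmlpurifier-4.9.2/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Turn on removal of elements that have no content.
$config->set('AutoFormat.RemoveEmpty', true);

// Replace the default predicate with a non-empty hash (the setting reported
// as working in this thread); td and th are no longer listed in it.
$config->set('AutoFormat.RemoveEmpty.Predicate', ['table' => []]);

$purifier = new HTMLPurifier($config);

// Illustrative input only: a table whose cells are all empty, then a paragraph.
$dirty = '<table><tbody><tr><td></td><td></td></tr></tbody></table><p>Kept</p>';
echo $purifier->purify($dirty);
```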

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 15, 2017 08:01PM

Hmm, I guess we have a bug where we don't think empty arrays are hashes. Should be an easy fix.

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 16, 2017 10:07AM

Glad I could help find a bug ;)

And thank you for all the effort into HTMLPurifier, it's a great piece of work and saves me a lot of time!

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 16, 2017 08:53PM

I'm looking through the documentation trying to get a better understanding of how HTMLPurifier works...

It removed the empty table in that one instance, but I'm doing some more testing and have something like this:

<table><tbody><tr><td><strong>Title<br /></strong> text text text text <br /><br />
 text text text text <br /><br />
 text text text text <br /><br />
 text text text text <br /><br /></td>
</tr></tbody></table><p><br /></p>

<table><tr><td><br /><br /></td>
</tr></table>

In this instance, it's not removing the table in the Live Demo.

And is also not removing the empty table in my PHP script using

$config->set('AutoFormat.RemoveEmpty.Predicate', [
    'table' => [],
]);

as we discussed.

Am I missing something? My example code seems like a dead simple example of an empty table, yet neither that nor

$config->set('AutoFormat.RemoveEmpty', true);

is removing it.

Here's a link to my PHP script:

https://pastebin.com/nkJLexFS

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 16, 2017 08:58PM

In your new example, it's not removed because a br tag is never considered empty (if it were, we would always remove them.) That means the table has content, so it's not removed. There might be a way to modify RemoveEmpty to treat this case differently, but this definitely is getting into the realm of "you need to write some code."

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 16, 2017 09:08PM

That makes sense; I hadn't considered that. I'll try to find a workaround. Thanks for the quick reply!

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 16, 2017 09:12PM

If you want to try fixing this yourself, look at this code:

        // Walk the upcoming tokens, skipping every one that counts as "ignorable".
        for ($i = count($this->inputZipper->back) - 1; $i >= 0; $i--, $deleted++) {
            $next = $this->inputZipper->back[$i];
            if ($next instanceof HTMLPurifier_Token_Text) {
                // Pure whitespace never counts as content.
                if ($next->is_whitespace) {
                    continue;
                }
                // Optionally treat non-breaking spaces (UTF-8 bytes C2 A0) as whitespace too.
                if ($this->removeNbsp && !isset($this->removeNbspExceptions[$token->name])) {
                    $plain = str_replace("\xC2\xA0", "", $next->data);
                    $isWsOrNbsp = $plain === '' || ctype_space($plain);
                    if ($isWsOrNbsp) {
                        continue;
                    }
                }
            }
            // Anything else (including a br tag) is real content: stop scanning.
            break;
        }

This tests a token and decides whether it is "ignorable". If you add a case that matches br tags, you can probably make HTML Purifier delete those too.
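
If patching the library is not an option, another route is to empty such cells before purifying, so that RemoveEmpty sees a table with nothing in it. The sketch below is only an illustration: emptyBrOnlyCells is a hypothetical helper, not part of HTML Purifier, and it uses a regular expression rather than a real HTML parser, so it assumes reasonably regular markup like the example posted above. It blanks any td or th whose content is nothing but whitespace, &nbsp; entities, and br tags, and leaves every other cell alone.

```php
<?php
/**
 * Hypothetical helper (not part of HTML Purifier): empties td/th cells that
 * contain only whitespace, &nbsp; entities, and <br> tags.
 */
function emptyBrOnlyCells(string $html): string
{
    // <td ...> or <th ...>, then any run of whitespace / &nbsp; / <br>,
    // then the matching closing tag. Attributes on the cell are kept.
    $pattern = '~<(td|th)\b([^>]*)>(?:\s|&nbsp;|<br\s*/?>)*</\1>~i';

    // preg_replace returns null on a regex error; fall back to the input then.
    return preg_replace($pattern, '<$1$2></$1>', $html) ?? $html;
}

$input = "<table><tr><td><br /><br /></td>\n</tr></table>";
echo emptyBrOnlyCells($input);
```

The result can then be passed to $purifier->purify() with the RemoveEmpty settings discussed earlier in the thread. Cells that contain real text alongside br tags do not match the pattern and pass through unchanged.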

Re: HTMLPurifier Not Removing Empty Table (but does on 'Live Demo')
May 17, 2017 01:27PM

Thank you :)

I will test this out later tonight.
