Memory error

Posted by atDev 
Memory error
March 02, 2011 01:15PM

We get a memory exhausted error using the standalone version.

It reports the memory was used up on this code block:

foreach ($elements as $i => $x) {
    $elements[$i] = true;
    if (empty($i)) unset($elements[$i]); // remove blank
}

On this line:

$elements[$i] = true;

Any ideas on things to check?
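
For readers following along, the quoted loop can be run in isolation. The sketch below wraps it in a hypothetical function, normalize_elements(), and feeds it a made-up array keyed by element name; neither the function name nor the sample data comes from HTML Purifier. It only shows what the loop does: every value becomes true and blank keys are dropped.

```php
<?php
// Hypothetical wrapper around the quoted loop; not part of HTML Purifier.
function normalize_elements(array $elements): array
{
    foreach ($elements as $i => $x) {
        $elements[$i] = true;                 // replace the value with true
        if (empty($i)) unset($elements[$i]);  // remove blank
    }
    return $elements;
}

// Made-up input: element names as keys, arbitrary values.
$lookup = normalize_elements(['p' => 0, '' => 1, 'div' => 2]);
print_r(array_keys($lookup));
```

Note that empty() also treats the key 0 as blank, so an entry keyed 0 would be dropped as well.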

Re: Memory error
March 02, 2011 05:00PM

What is your memory limit?

Re: Memory error
March 02, 2011 06:59PM

Memory limit is currently at 32MB.

It seems to happen when I enable SafeObject and FlashCompat.

Re: Memory error
March 03, 2011 12:23PM

That's unusual. Can you give a sample test script which displays this exhaustion?

Re: Memory error
March 03, 2011 12:26PM

Hello,

Well we implemented it "application wide" for security purposes.

I read your documentation on speeding up HTML Purifier and I got a general sense that you do not recommend running "purify" on every output variable? I understand this may cause a slow down, but would it also cause this memory error?

I made sure that we only create one HTML purifier object and use it for all the purification.

Re: Memory error
March 03, 2011 12:28PM

Though HTML Purifier is kind of slow and uses lots of memory, its steady-state memory usage should not be more than 1M (I can, for example, run the test suite no problem with only 1M of memory). So if you are indeed seeing 32M being used up, something is off, or it's not HTML Purifier's fault (but we just happened to try to grab the last bit of memory :-)

Re: Memory error
March 03, 2011 12:37PM

I did a simple test in the application.

With htmlpurifier+flashcompat+safeobject memory usage of a sample page in the application was 26.9M.

With htmlpurifier without those settings enabled, memory usage was 8.7M.

Is there any type of debug code I can enable or anything I can check?

Keep in mind this is with multiple calls to the purifier method. Here is our config:

$this->config = HTMLPurifier_Config::createDefault();
$this->config->set('Core', 'Encoding', CHARSET);
$this->config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
$this->config->set('Cache', 'DefinitionImpl', null);
$this->config->set('HTML', 'TidyLevel', 'none');
//$this->config->set('Output','FlashCompat',true);
//$this->config->set('HTML.SafeObject', true);
$this->config->set('HTML', 'AllowedElements', null);
$this->config->set('HTML', 'AllowedAttributes', null);

Re: Memory error
March 03, 2011 12:39PM

A memory profile or a reproduceable test case would probably be best.

Re: Memory error
March 03, 2011 03:19PM
> Well we implemented it "application wide" for security purposes.

exactly what do you mean by this & how are you using it?

an example of the HTML code that you are trying to purify will help to narrow it down.

I interpret your post as though you are purifying the whole page sourcecode itself before it's outputted to the browser.

Re: Memory error
March 03, 2011 03:28PM

Hello,

We purify data on output, rather than when being put into the database.

But we only purify data which we know or expect to maybe have HTML in it. So no, not the entire page or EVERY variable, only certain ones. Only suspected HTML data.

Re: Memory error
March 03, 2011 03:49PM

ok :)

an example of the html that causes the memory jump would be required to see if we can replicate the issue.

personally from using the standalone & full package myself, i have found the standalone version to be less of a resource hog on the server.

our CMS uses it for HTML content, & i haven't noticed this increase when we use safeobject & flashcompat on our system. we did have to introduce a minimum memory spec on our newer versions, for which 16mb was enough, but we kept getting errors in some instances when using 8mb. with more tweaking and code restructuring in our latest cms, which has reduced a lot of SQL queries etc, we have now got the minimum required down to about 10mb.

but i've never experienced anyone having an issue like that when they have above 16mb limits.

incidentally though,

$this->config->set('HTML', 'AllowedElements', null);
$this->config->set('HTML', 'AllowedAttributes', null);

are you actually allowing any attributes/elements or are they null all the time? i've never tried using null in allowedElements directive like that.

Re: Memory error
March 03, 2011 03:54PM

In some areas we allow certain tags, in others we don't. It depends on the applications settings.

We are using the standalone version as well.

We are working on producing a memory profile and will post it here.

Re: Memory error
March 03, 2011 04:32PM

Our profiler is showing these two lines:

$elements[$i] = true;
if (empty($i)) unset($elements[$i]); // remove blank

Inside the constructor of this class: class HTMLPurifier_ChildDef_Required extends HTMLPurifier_ChildDef

Are getting hit 28041 times. This seems like an awful lot?

As far as timings go these are the second largest time consumer in our application other than the inclusion of the HTMLPurifier.standalone.php file.

The third largest is:

if ($required = (strpos($def_i, '*') !== false)) {

In:

public function expandIdentifiers(&$attr, $attr_types) {

In class HTMLPurifier_AttrCollections.

Does this help at all?

Re: Memory error
March 03, 2011 05:51PM

It's been a while since I've last profiled HTML Purifier, so it could very well be an inefficiency. However, constructing HTML definitions is pretty resource intensive work and we cache the results, so 28041 doesn't actually seem that large.

However, it seems to me that you are instantiating a lot of HTMLDefinitions. How many different configurations are you using, and is your caching working?

Re: Memory error
March 03, 2011 05:58PM

not sure. i'm leaning towards something other than purifier itself, maybe the method in which purifier is implemented in your script.

Re: Memory error
March 03, 2011 06:24PM

> It's been a while since I've last profiled HTML Purifier, so it could very well be an inefficiency. However, constructing HTML definitions is pretty resource intensive work and we cache the results, so 28041 doesn't actually seem that large.
>
> However, it seems to me that you are instantiating a lot of HTMLDefinitions. How many different configurations are you using, and is your caching working?

We technically have 2 configurations. One is no tags/attributes allowed (null). The other is a fairly small subset of HTML tags/attributes. The set we profiled with is smaller than the set you allow here on the forums.

For caching when exactly should we be clearing the cache? Any time our allowed tags/attributes change? It doesn't cache the filtered HTML does it?

I fixed some caching issues and this dramatically dropped those two lines of code.

Now the line of code eating the most time is:

return unserialize(file_get_contents($file));

Which looks to be from the serializer because I fixed caching. I assume not much can be done about that.
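
For reference, here is a minimal sketch of a configuration with the definition cache switched on, using the dot-notation directive names from HTML Purifier 4.x. As far as the documentation goes, setting Cache.DefinitionImpl to null (as in the config posted earlier in this thread) turns the definition cache off, so definitions get rebuilt on every request. The include path and the cache directory below are assumptions for illustration, not values taken from this thread.

```php
<?php
// Path to the standalone file is an assumption; adjust to your install.
require_once 'HTMLPurifier.standalone.php';

$config = HTMLPurifier_Config::createDefault();
// 'Serializer' is the file-based definition cache; null disables caching.
$config->set('Cache.DefinitionImpl', 'Serializer');
// Hypothetical directory; it must be writable by the web server.
$config->set('Cache.SerializerPath', '/path/to/writable/cache');

$purifier = new HTMLPurifier($config);
```

With the cache working, the unserialize(file_get_contents($file)) line is expected to show up in a profile, since that is how a cached definition is read back.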

However this line is next in line for speed:

list($ns, $key) = explode('.', $name, 2);

In HTMLPurifier.standalone.php

I notice in the comments you say:

/**
     * Retrieves all directives, organized by namespace
     * @warning This is a pretty inefficient function, avoid if you can
     */

How can this be avoided? Checking the code I don't see how it can be avoided?

Re: Memory error
March 03, 2011 06:26PM

On another note, all of the above is with SafeObject/FlashCompat turned OFF. I enabled it just now and no significant changes in timings. Only memory (about same as before). Any idea on a good memory profiler? xdebug?

Re: Memory error
March 03, 2011 06:28PM

For most people, taking that performance hit for convenience is worth it. However, what theoretically may be possible (not implemented) is to figure out what hash code your configuration turns into, and use that instead of manually determining it based on the configuration.

Re: Memory error
March 03, 2011 06:34PM

By default, your cache is stored in library/HTMLPurifier/DefinitionCache/Serializer/HTML. Let me know how many files are there; it will give me a sense for how many different HTML definitions you are using.

Not really sure about memory profiler; I recall using xdebug with some success in the past.

Re: Memory error
March 04, 2011 12:27PM

Hi,

The cache has two folders: HTML and URI.

URI has one file. HTML has 4 files.

The timing issue above was just for your reference, mainly because your comments on that function said "This is a pretty inefficient function, avoid if you can". That made me think there was something I could do to avoid it, but it appears to be essential for the operation of htmlpurifier.

As far as the main issue goes (memory) it seems that every single call I make to the purify method increases the memory a little bit. It is as if something is getting stored in class variable which never gets reset or cleared for the next call to purify. Any ideas? I guess I could always destroy the object after every call to purify and create a new one but this seems inefficient as well.
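
One way to check whether memory really grows with every call is to sample memory_get_usage() after each one. The sketch below is only an illustration: sample_memory() is a hypothetical helper, and the $leaky closure is a made-up stand-in that deliberately retains data, so the script runs without HTML Purifier installed.

```php
<?php
/**
 * Call $fn $iterations times and record memory_get_usage() after each call.
 * Returns the list of byte counts so growth per call can be inspected.
 */
function sample_memory(callable $fn, int $iterations): array
{
    $samples = [];
    for ($i = 0; $i < $iterations; $i++) {
        $fn($i);
        $samples[] = memory_get_usage();
    }
    return $samples;
}

// Made-up stand-in for a call that keeps data around between invocations.
$retained = [];
$leaky = function (int $i) use (&$retained): void {
    $retained[] = str_repeat('x', 10000);
};

$samples = sample_memory($leaky, 50);
printf("growth over 50 calls: %d bytes\n", end($samples) - $samples[0]);
```

In the application, the closure would instead wrap the real call, for example function () use ($purifier, $html) { $purifier->purify($html); }. A steadily rising series points at something being retained between calls; a flat series after the first call or two suggests the growth comes from elsewhere.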

Re: Memory error
March 04, 2011 01:29PM

Hi,

Please ignore for now while we do some testing.

Re: Memory error
March 18, 2011 01:12AM

Hi,

Going back to this... we no longer have issues with the timings, just the huge memory consumption.

It is narrowed down to this line:

$this->config->set('HTML','SafeObject', true);

If we comment out these lines memory consumption is around 9.1mb.

If we turn on:

$this->config->set('HTML','SafeObject', true);

We hit the 32mb limit we have in place for testing.

Any ideas on what to check?

Re: Memory error
March 18, 2011 02:10AM

I ended up saving each configuration object in a class variable which is an array.

I did this after reading: http://htmlpurifier.org/phorum/read.php?3,4718

Where you mentioned: It is also likely that if you are using multiple configurations, they are all being loaded into memory and not being freed in interest of keeping the purifier “hot”; you could try reconducting the test with another random HTML directive to see if this is the case.

We still save each configuration in the class variable so it still eats up memory but for some reason once we did this the memory stopped getting gobbled up. Really don't have an explanation for it as our code before did not really do much different.
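
The arrangement described above, one stored object per configuration, can be sketched roughly as follows. ConfigRegistry and the configuration name 'no_html' are hypothetical, and the factory callable stands in for whatever builds the real configuration object (in the application, something that calls HTMLPurifier_Config::createDefault() and sets directives). This is not the poster's actual code.

```php
<?php
/**
 * Keeps one object per named configuration so each is built only once.
 * Hypothetical class for illustration; not part of HTML Purifier.
 */
final class ConfigRegistry
{
    /** @var array<string, object> */
    private array $configs = [];

    public function get(string $name, callable $factory): object
    {
        if (!isset($this->configs[$name])) {
            $this->configs[$name] = $factory(); // build on first request only
        }
        return $this->configs[$name];
    }
}

$registry = new ConfigRegistry();
$builds = 0;
$factory = function () use (&$builds): object {
    $builds++;               // count how often a config is actually built
    return new stdClass();   // placeholder for a real config object
};

$a = $registry->get('no_html', $factory);
$b = $registry->get('no_html', $factory);
var_dump($a === $b, $builds);
```

The stored objects still occupy memory for the life of the request, but each configuration is constructed once instead of once per call.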

To confirm, is this the best way to get the config object?

$config = HTMLPurifier_Config::createDefault();

Re: Memory error
March 18, 2011 07:57AM

i usually use:

$purifier = new HTMLPurifier($config);
$html = $purifier->purify($html);

where $config is an array of $config objects.

Re: Memory error
March 18, 2011 12:02PM

> i usually use:
>
> $purifier = new HTMLPurifier($config);
> $html = $purifier->purify($html);
>
> where $config is an array of $config objects.

I wasn't aware you could pass $config in and have it be an array of more than one config objects.

I will try this.

Re: Memory error
March 18, 2011 05:25PM

I'm still not really sure; it shouldn't matter how you supply the config because once it gets used to generate caches, the configuration is finalized: you can't make changes to it anymore.
