Config Caching for Speed

Posted by xathien 
March 22, 2013 01:31PM

I'll preface this by saying that we're probably using HTMLPurifier in a very nonstandard way. :) One example of a place we use this is in our Reporting system. We obviously want to allow our own scripts and custom HTML and such on the page, but any user-editable text on the page (of which there is much) we want to purify, right? We keep one HTMLPurifier instance around with one HTMLPurifier_Config, and then purify hundreds of strings, one-by-one, as we encounter them while parsing the report's definition. Simple, right? If there's a more preferred way of doing this, I'd certainly like to hear about it. :)

In the meantime, and as the hard-nosed optimizer on our Engineering team, I've done some profiling of how HTMLPurifier runs in our context. Since this page says I should report my findings, well...

So far I've optimized two things in HTMLPurifier_Config. Even though we use the same Config instance across all our strings (in my profiling examples, we're purifying 431 strings), each purify() call ends up calling HTMLPurifier_Config->getAll() twice. The source code specifically says "This is a pretty inefficient function, avoid if you can". The 863 calls to getAll() (which also result in roughly 100k calls to PHP's explode()) profiled at about 1.6 seconds of cumulative execution time. Since the Config is finalized once and never changed afterward, I put a quick cache on the front of the function that holds its $ret value. This reduced getAll()'s overhead from 1.6s to about 6 milliseconds.

My other optimization is in the get() method. Every get() runs several simple error checks and then makes a recursive call to parent configs looking for the key in question. In my example, the 431 string purifications end up calling HTMLPurifier_Config->get() about 37k times, resulting in around 73k recursive calls and calls to ->has() on PropertyList. My profiles show this taking around 3.2s in total. Putting a small cache on the front of the get() method reduces the number of recursive calls to about 5k and the time footprint to about 329ms.
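
As a rough illustration of the idea (not the actual HTMLPurifier code), here is a minimal sketch of a memoized lookup in front of a chain of parent property lists. ToyPropertyList and ToyConfig are hypothetical names made up for this example, the key and value are toy data, and the sketch assumes the config is never changed after the first read:

```php
<?php
// Hypothetical stand-in for a property list that falls back to a parent.
class ToyPropertyList {
    public $lookups = 0;   // counts how often this list is consulted
    private $data;
    private $parent;

    public function __construct(array $data, $parent = null) {
        $this->data = $data;
        $this->parent = $parent;
    }

    public function get($key) {
        $this->lookups++;
        if (array_key_exists($key, $this->data)) return $this->data[$key];
        if ($this->parent !== null) return $this->parent->get($key);
        return null;
    }
}

// Hypothetical config wrapper with a small cache on the front of get().
class ToyConfig {
    private $plist;
    private $getCache = array();

    public function __construct(ToyPropertyList $plist) {
        $this->plist = $plist;
    }

    public function get($key) {
        // array_key_exists (rather than isset) so cached null values are hits too
        if (array_key_exists($key, $this->getCache)) return $this->getCache[$key];
        return $this->getCache[$key] = $this->plist->get($key);
    }
}

$defaults = new ToyPropertyList(array('Element.Name' => 'hydrogen'));
$child    = new ToyPropertyList(array(), $defaults);
$config   = new ToyConfig($child);

$value = null;
for ($i = 0; $i < 1000; $i++) {
    $value = $config->get('Element.Name');
}

// Only the first call walks the parent chain; the other 999 hit the cache.
echo $child->lookups, "\n";
```

The trade-off is the one described above: a little extra memory per config, and the cache is only safe as long as nothing writes to the config after the first read.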

If anyone requests, I'll post the changes to my code. They're a bit hacked in and perhaps not up to HTMLPurifier standards, so maybe my overly-verbose explanation is sufficient. :)

Enjoy!

Re: Config Caching for Speed
March 22, 2013 03:02PM

Cool; you should submit some patches!

Re: Config Caching for Speed
March 22, 2013 04:28PM

I added one more cache location, but there might be some overlap with an existing cache. I wasn't sure exactly how your cache worked, so I basically ignored it in favor of my fast cache. :) They're very aggressive caches, so they might require an option to enable in the future. How do you prefer patch submissions? I'll just tack it in some pre tags here in the meantime... I'm very interested to see if it actually passes your regressions.

diff HTMLPurifier/Config.php
--- HTMLPurifier/Config.php	Fri Mar 22 14:03:40 2013 -0600
+++ HTMLPurifier/Config.php	Fri Mar 22 12:26:23 2013 -0600
@@ -87,6 +87,22 @@
     private $lock;
 
     /**
+     * A cache from the getAll function - Slightly more memory, but saves tons of CPU time when reusing config for multiple purifications.
+     */
+    private $allCache;
+
+    /**
+     * A cache for the get function - Slightly more memory, but saves tons of CPU time for fetching the same property multiple times. Especially useful for reusing config.
+     */
+    private $getCache;
+
+    /**
+     * A cache for the getDefinition function - Faster than the cache they set up
+     */
+    private $defCache;
+
+
+    /**
      * @param $definition HTMLPurifier_ConfigSchema that defines what directives
      *                    are allowed.
      */
@@ -146,6 +162,7 @@
      * @param $key String key
      */
     public function get($key, $a = null) {
+        if (isset($this->getCache[$key])) return $this->getCache[$key];
         if ($a !== null) {
             $this->triggerError("Using deprecated API: use \$config->get('$key.$a') instead", E_USER_WARNING);
             $key = "$key.$a";
@@ -170,7 +187,8 @@
                 return;
             }
         }
+        $ret = $this->getCache[$key] = $this->plist->get($key);
+        return $ret;
-        return $this->plist->get($key);
     }
 
     /**
@@ -220,12 +238,14 @@
      * @warning This is a pretty inefficient function, avoid if you can
      */
     public function getAll() {
+        if (!empty($this->allCache)) return $this->allCache;
         if (!$this->finalized) $this->autoFinalize();
         $ret = array();
         foreach ($this->plist->squash() as $name => $value) {
             list($ns, $key) = explode('.', $name, 2);
             $ret[$ns][$key] = $value;
         }
+        $this->allCache = $ret;
         return $ret;
     }
 
@@ -377,6 +397,10 @@
         if ($optimized && !$raw) {
             throw new HTMLPurifier_Exception("Cannot set optimized = true when raw = false");
         }
+        // Keep a stupid-simple index for exactly what we're looking for; single-array index speed!
+        $optionsIndex = $type.($raw?1:0).($optimized?1:0);
+        if (isset($this->defCache[$optionsIndex]))
+            return $this->defCache[$optionsIndex];
         if (!$this->finalized) $this->autoFinalize();
         // temporarily suspend locks, so we can handle recursive definition calls
         $lock = $this->lock;
@@ -392,10 +416,12 @@
                 $def = $this->definitions[$type];
                 // check if the definition is setup
                 if ($def->setup) {
+                    $this->defCache[$optionsIndex] = $def;
                     return $def;
                 } else {
                     $def->setup($this);
                     if ($def->optimized) $cache->add($def, $this);
+                    $this->defCache[$optionsIndex] = $def;
                     return $def;
                 }
             }
@@ -404,6 +430,7 @@
             if ($def) {
                 // definition in cache, save to memory and return it
                 $this->definitions[$type] = $def;
+                $this->defCache[$optionsIndex] = $def;
                 return $def;
             }
             // initialize it
@@ -414,6 +441,7 @@
             $this->lock = null;
             // save in cache
             $cache->add($def, $this);
+            $this->defCache[$optionsIndex] = $def;
             // return it
             return $def;
         } else {
@@ -445,6 +473,7 @@
             }
             // check if definition was in memory
             if ($def) {
+                $this->defCache[$optionsIndex] = $def;
                 if ($def->setup) {
                     // invariant: $optimized === true (checked above)
                     return null;
@@ -466,6 +495,7 @@
                     // save the full definition for later, but don't
                     // return it yet
                     $this->definitions[$type] = $def;
+                    $this->defCache[$optionsIndex] = $def;
                     return null;
                 }
             }
@@ -482,6 +512,7 @@
             // initialize it
             $def = $this->initDefinition($type);
             $def->optimized = $optimized;
+            $this->defCache[$optionsIndex] = $def;
             return $def;
         }
         throw new HTMLPurifier_Exception("The impossible happened!");

Re: Config Caching for Speed
March 22, 2013 07:36PM

It fails a bunch of tests:

All HTML Purifier tests on PHP 5.4.6-1ubuntu1.2
1) Identical expectation [String: H] fails with [String: Pu] at character 0 with [H] and [Pu] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 60]
	in testNormal
	in HTMLPurifier_ConfigTest
2) Identical expectation [String: hydrogen] fails with [String: plutonium] at character 0 with [hydrogen] and [plutonium] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 61]
	in testNormal
	in HTMLPurifier_ConfigTest
3) Identical expectation [Integer: 1] fails with [Integer: 94] because [Integer: 1] differs from [Integer: 94] by 93 at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 62]
	in testNormal
	in HTMLPurifier_ConfigTest
4) Identical expectation [Float: 1.00794] fails with [Float: 244] because [Float: 1.00794] differs from [Float: 244] by 242.99206 at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 63]
	in testNormal
	in HTMLPurifier_ConfigTest
5) Identical expectation [Boolean: false] fails with [Boolean: true] as [Boolean: false] does not match [Boolean: true] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 64]
	in testNormal
	in HTMLPurifier_ConfigTest
6) Identical expectation [Array: 3 items] fails with [Array: 2 items] as key list [1, 2, 3] does not match key list [238, 239] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 65]
	in testNormal
	in HTMLPurifier_ConfigTest
7) Identical expectation [Array: 3 items] fails with [Array: 3 items] with member [0] at character 1 with [nonmetallic] and [nuclear] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 66]
	in testNormal
	in HTMLPurifier_ConfigTest
8) Identical expectation [Array: 3 items] fails with [Array: 2 items] as key list [1, 2, 3] does not match key list [238, 239] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 67]
	in testNormal
	in HTMLPurifier_ConfigTest
9) Identical expectation [Object: of stdClass] fails with [Boolean: false] with type mismatch as [Object: of stdClass] does not match [Boolean: false] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 68]
	in testNormal
	in HTMLPurifier_ConfigTest
10) Identical expectation [String: Vandoren] fails with [String: Conn-Selmer] at character 0 with [Vandoren] and [Conn-Selmer] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 103]
	in testEnumerated
	in HTMLPurifier_ConfigTest
11) Identical expectation [String: brass] fails with [String: percussion] at character 0 with [brass] and [percussion] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 114]
	in testEnumerated
	in HTMLPurifier_ConfigTest
12) Identical expectation [String: brass] fails with [String: electronic] at character 0 with [brass] and [electronic] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 117]
	in testEnumerated
	in HTMLPurifier_ConfigTest
13) Identical expectation [String: brass] fails with [String: electronic] at character 0 with [brass] and [electronic] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 120]
	in testEnumerated
	in HTMLPurifier_ConfigTest
14) Identical expectation [String: B-] fails with [NULL] with type mismatch as [String: B-] does not match [NULL] at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 137]
	in testNull
	in HTMLPurifier_ConfigTest
15) Identical expectation [Integer: 3] fails with [Integer: 999] because [Integer: 3] differs from [Integer: 999] by 996 at [/srv/code/htmlpurifier/tests/HTMLPurifier/ConfigTest.php line 161]
	in testAliases
	in HTMLPurifier_ConfigTest
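
The failures seem to share a pattern: in each pair the two values look like the "before" and "after" of a config change (H vs. Pu, 3 vs. 999). That would be consistent with the new caches never being cleared when the config is modified, though this is an inference from the output above and not something confirmed in the thread. A minimal sketch of that behaviour and one possible remedy (clearing the cache in set()), using a hypothetical ToyCachedConfig class rather than the real HTMLPurifier_Config:

```php
<?php
// Hypothetical toy class for illustration; not HTMLPurifier_Config.
class ToyCachedConfig {
    private $values;
    private $getCache = array();
    private $invalidateOnSet;

    public function __construct(array $values, $invalidateOnSet) {
        $this->values = $values;
        $this->invalidateOnSet = $invalidateOnSet;
    }

    public function get($key) {
        if (array_key_exists($key, $this->getCache)) return $this->getCache[$key];
        $value = array_key_exists($key, $this->values) ? $this->values[$key] : null;
        return $this->getCache[$key] = $value;
    }

    public function set($key, $value) {
        $this->values[$key] = $value;
        // Without this reset, get() keeps returning the pre-set() value.
        if ($this->invalidateOnSet) $this->getCache = array();
    }
}

// Cache that is never cleared: the read after set() is stale.
$stale = new ToyCachedConfig(array('Element.Abbr' => 'H'), false);
$stale->get('Element.Abbr');
$stale->set('Element.Abbr', 'Pu');
$staleValue = $stale->get('Element.Abbr');

// Cache cleared on set(): the read after set() sees the new value.
$fresh = new ToyCachedConfig(array('Element.Abbr' => 'H'), true);
$fresh->get('Element.Abbr');
$fresh->set('Element.Abbr', 'Pu');
$freshValue = $fresh->get('Element.Abbr');

echo $staleValue, ' ', $freshValue, "\n"; // prints "H Pu"
```

In the real class, any other method that changes configuration values would presumably need the same treatment, and the getAll() and getDefinition() caches would need clearing as well.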