Code improvement for speed

Posted by patnaik 
July 08, 2007 03:56PM

Though I have not looked into the strategy HTML Purifier uses, a quick look at some of its code suggests several places where the PHP could be optimized. Taken together, such optimizations may yield significant improvements. Here are some of them:

1. HTMLPurifier.php has at least 9 require_once() calls. Can require_once() be changed to require()? Can some of the 9 files be merged?

2. Loops like 'for ($i = 0, $size = count($this->filters); $i < $size; $i++)' can be sped up by moving the count() outside the loop.

3. Instead of concatenating strings, could one use the much faster output buffering (ob_start() and ob_get_contents())?

4. Instead of calling small functions, can their code be inlined? E.g.,

'$html = preg_replace('!<body[^>]*>(.+?)</body>!is', '$1', $html);'

can replace:

function extractBody($html) {
  $matches = array();
  $result = preg_match('!<body[^>]*>(.+?)</body>!is', $html, $matches);
  if ($result) {
    return $matches[1];
  } else {
    return $html;
  }
}

5. Can regular expressions be avoided? E.g., compared even with the better preg_replace() above, this apparently complex code sped up the evaluation at least 100-fold:

$result = (($c = strpos($d=substr($result, ($a = strpos($result, '<body')) + 5, ($b = strpos($result, '</body>'))-$a-5), '>')) and $a !== false and $b !== false) ? substr($d, $c+1): $result;
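
For readability, the same idea can be unpacked into separate steps. The following is only a sketch: the function name extractBodyFast is hypothetical and not part of HTML Purifier, and it uses stripos() so that matching stays case-insensitive like the regex's "i" flag. It also compares against false strictly; the one-liner above tests $c for truthiness, so a bare <body> tag (where $c is 0) makes it return the input unchanged.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): returns the contents of the
// first <body ...>...</body> pair, or the input unchanged if no pair is found.
function extractBodyFast($html) {
    // Start of the opening tag.
    $open = stripos($html, '<body');
    if ($open === false) {
        return $html;
    }
    // End of the opening tag: the first '>' after '<body'.
    $openEnd = strpos($html, '>', $open + 5);
    if ($openEnd === false) {
        return $html;
    }
    // First closing tag after the opening tag.
    $close = stripos($html, '</body>', $openEnd + 1);
    if ($close === false) {
        return $html;
    }
    return substr($html, $openEnd + 1, $close - $openEnd - 1);
}
```

Unlike the regex, which needs at least one character between the tags, this sketch returns an empty string for an empty body.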
Re: Code improvement for speed
July 09, 2007 08:31AM

These are valiant suggestions, but a lot of them are not applicable.

1. Opcode caching should manage this, although I should be looking into smooshing all the files into one giant include.

2. Look carefully: the count() call is in the initialization statement, not the repeated statement.

3. Mostly inapplicable, since the HTML representation isn't converted into actual text until the very end.

4. The regexp is only used once and thus is very "cheap". More dangerous regexps are the ones that are called repeatedly, such as the one in URI.

5. Once again, in that case, the regexp call is cheap compared to the rest of the code, and is only called when a tag is detected.
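
Point 2 can be checked with a small experiment. This is a minimal sketch, assuming PHP 5.3 or later for the closure; $countedSize is a hypothetical wrapper around count() that records how often it is called, and $filters is a stand-in array rather than the real filter list.

```php
<?php
// Hypothetical wrapper: counts how many times the size is computed.
$calls = 0;
$countedSize = function (array $items) use (&$calls) {
    $calls++;
    return count($items);
};

$filters = array('a', 'b', 'c'); // stand-in data, not the real filters

// Pattern from the thread: size computed in the initialization clause,
// which runs once before the first iteration.
for ($i = 0, $size = $countedSize($filters); $i < $size; $i++) {
    // loop body
}
$callsInInit = $calls; // 1

// The pattern that would actually repeat the call: size computed in the
// condition clause, which runs before every iteration.
$calls = 0;
for ($i = 0; $i < $countedSize($filters); $i++) {
    // loop body
}
$callsInCondition = $calls; // 4: three passing checks plus the final failing one
```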

Re: Code improvement for speed
July 09, 2007 09:42AM

Yes, that was a wrong example to give re: loop optimization.

The point I am making, and which you anyway must have kept in mind, is that all possible optimizations should be looked into.

Also, optimizations ideally should center around core PHP. E.g., a significant subset of PHP implementations may not be using opcode caches.

Re: Code improvement for speed
July 09, 2007 04:55PM
> Yes, that was a wrong example to give re: loop optimization.

While on the subject, however, I am curious to find out how costly it is to be continually calling isset in a loop.

> The point I am making, and which you anyway must have kept in mind, is that all possible optimizations should be looked into.

Yes, but the big problems should be dealt with first. I haven't profiled HTML Purifier in a while, and it's high time I did so again. Also, we've already sacrificed a bit of code readability for the sake of optimization, so we need to be careful about what we change.

> Also, optimizations ideally should center around core PHP. E.g., a significant subset of PHP implementations may not be using opcode caches.

The only way of fixing that is smooshing all the includes into one file, or having all the includes be done from a single file to prevent duplicates.
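
On the isset question, one rough way to get a number is a micro-benchmark along these lines. It is a sketch only: the function names, the array contents, and the iteration count are all made up for illustration, and timings from a loop this small say little about a real profile of the library. Note that isset is a language construct rather than a function call.

```php
<?php
// Hypothetical benchmark: isset() evaluated on every iteration.
function timeIssetInLoop($iterations) {
    $data = array('key' => 1);
    $hits = 0;
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        if (isset($data['key'])) {
            $hits++;
        }
    }
    return array('seconds' => microtime(true) - $start, 'hits' => $hits);
}

// Baseline: the check is done once and the result reused inside the loop.
function timeIssetHoisted($iterations) {
    $data = array('key' => 1);
    $hits = 0;
    $start = microtime(true);
    $present = isset($data['key']);
    for ($i = 0; $i < $iterations; $i++) {
        if ($present) {
            $hits++;
        }
    }
    return array('seconds' => microtime(true) - $start, 'hits' => $hits);
}

$inLoop  = timeIssetInLoop(100000);
$hoisted = timeIssetHoisted(100000);
printf("in loop: %.6fs, hoisted: %.6fs\n", $inLoop['seconds'], $hoisted['seconds']);
```

Hoisting is only valid when the array cannot change inside the loop, so the two variants are not interchangeable in general.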