Code improvement for speed July 08, 2007 03:56PM |
Registered: 6 years ago Posts: 43 |
Though I have not looked into the strategy HTML Purifier uses, a quick look at some of its code suggests that there are many points where the PHP can be optimized. Done together, such optimizations may yield significant improvements. Here are some of them:
1. HTMLPurifier.php has at least 9 require_once() calls. Can require_once() be changed to require()? Can some of the 9 files be merged?
2. Loops like 'for ($i = 0, $size = count($this->filters); $i < $size; $i++)' can be sped up by moving the count() outside.
3. Instead of concatenating strings, can one use the much faster ob_start() and ob_get_contents()?
4. Instead of calls to small-sized functions, can the code be used directly? E.g.,
'$html = preg_replace('!<body[^>]*>(.+?)</body>!is', '$1', $html);'
can replace:
function extractBody($html) {
    $matches = array();
    $result = preg_match('!<body[^>]*>(.+?)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}
5. Can regular expressions be avoided? E.g., even compared with the better preg_replace() stated above, using this apparently complex code sped up the evaluation at least 100-fold:
$result = (($c = strpos($d = substr($result, ($a = strpos($result, '<body')) + 5, ($b = strpos($result, '</body>')) - $a - 5), '>')) !== false and $a !== false and $b !== false) ? substr($d, $c + 1) : $result;
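For readability, the same strpos()/substr() idea can be sketched as a small function. This is only an illustration: the name extractBodyFast() is hypothetical and not part of HTML Purifier. It uses stripos() to mirror the case-insensitive flag of the regexp, and, unlike the regexp version, it returns an empty string for an empty body instead of the original input.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: returns the contents of
// the first <body> element, or the input unchanged if no complete
// <body ...>...</body> pair is found.
function extractBodyFast($html) {
    // Position of the opening tag (case-insensitive, like the /i flag)
    $open = stripos($html, '<body');
    if ($open === false) {
        return $html;
    }
    // End of the opening tag, e.g. the '>' in <body class="x">
    $tagEnd = strpos($html, '>', $open);
    if ($tagEnd === false) {
        return $html;
    }
    // First closing tag after the opening tag
    $close = stripos($html, '</body>', $tagEnd);
    if ($close === false) {
        return $html;
    }
    return substr($html, $tagEnd + 1, $close - $tagEnd - 1);
}
```

Every strpos()/stripos() result is compared with false using ===, because a match at position 0 is a valid result that a plain truthiness check would treat as "not found".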
|
Re: Code improvement for speed July 09, 2007 08:31AM |
Admin Registered: 6 years ago Posts: 2,640 |
These are valiant suggestions, but a lot of them are not applicable.
1. Opcode caching should manage this, although I should be looking into smooshing all the files into one giant include.
2. Look carefully: the count() call is in the initialization statement, not the repeated statement.
3. Mostly inapplicable, since the HTML representation isn't converted into actual text until the very end.
4. The regexp is only used once and thus is very "cheap". More dangerous regexps are the ones that are called repeatedly, such as the one in URI.
5. Once again, in that case, the regexp call is cheap compared to the rest of the code, and is only called when a tag is detected.
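To illustrate point 2, here is a minimal sketch showing how often the size is computed depending on where the call sits in the for statement. The countedSize() function and the $filters array are hypothetical stand-ins used only to count the calls.

```php
<?php
// Hypothetical wrapper around count() that records how often it is called.
function countedSize(array $items, &$calls) {
    $calls++;
    return count($items);
}

$filters = array('a', 'b', 'c', 'd');

// Size computed in the initialization clause: evaluated once.
$calls = 0;
for ($i = 0, $size = countedSize($filters, $calls); $i < $size; $i++) {
    // loop body omitted
}
$callsInInit = $calls;

// Size computed in the condition clause: evaluated before every iteration,
// including the final check that ends the loop.
$calls = 0;
for ($i = 0; $i < countedSize($filters, $calls); $i++) {
    // loop body omitted
}
$callsInCondition = $calls;
```

With four elements, the first form calls the function once and the second calls it five times, so the loop quoted in the original post is already in the cheaper form.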
|
Re: Code improvement for speed July 09, 2007 09:42AM |
Registered: 6 years ago Posts: 43 |
Yes, that was a wrong example to give re: loop optimization.
The point I am making, and which you anyway must have kept in mind, is that all possible optimizations should be looked into.
Also, optimizations ideally should center around core PHP. E.g., a significant subset of PHP implementations may not be using opcode caches.
|
Re: Code improvement for speed July 09, 2007 04:55PM |
Admin Registered: 6 years ago Posts: 2,640 |
"Yes, that was a wrong example to give re: loop optimization."
While on the subject, however, I am curious to find out how costly it is to be continually calling isset in a loop.
"The point I am making, and which you anyway must have kept in mind, is that all possible optimizations should be looked into."
Yes, but the big problems should be dealt with first. I haven't profiled HTML Purifier in a while, and it's high time I do so again. Also, we've already sacrificed a bit in code readability for the sake of optimization, so we need to be careful about what we change.
"Also, optimizations ideally should center around core PHP. E.g., a significant subset of PHP implementations may not be using opcode caches."
The only way of fixing that is smooshing all the includes into one file, or having all the includes be done from a single file to prevent duplicates.
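A rough sketch of what "smooshing" could look like as a build step is below. Everything in it is an assumption for illustration: buildStandalone() is a hypothetical helper, not an existing HTML Purifier script, and it expects the caller to list the files in dependency order. It only strips simple one-statement require_once lines, so anything more elaborate would need real parsing.

```php
<?php
// Hypothetical build helper, not part of HTML Purifier: concatenates PHP
// source files into one file so that a single include loads everything.
function buildStandalone(array $files, $target) {
    $out = "<?php\n";
    foreach ($files as $file) {
        $code = trim(file_get_contents($file));
        // Drop the opening tag and an optional closing tag
        $code = preg_replace('/^<\?php\s*/', '', $code);
        $code = preg_replace('/\?>$/', '', $code);
        // Drop simple require_once statements; the required code is inlined
        $code = preg_replace('/^\s*require_once\s+[^;]+;\s*$/m', '', $code);
        $out .= "\n// ---- " . basename($file) . " ----\n" . $code . "\n";
    }
    return file_put_contents($target, $out) !== false;
}
```

The generated file would then be loaded with a single require, which avoids the repeated require_once lookups even when no opcode cache is installed.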