[FixedBug] Fatal error in benchmark tests

Posted by xorax 
February 12, 2007 11:43PM

Hi,

Sorry if my English is bad; I'm French :p

For my first post: congratulations on this package! Many PHP developers have been looking for something like this, and I'm really happy it exists :D

I would like to run a benchmark. I browsed the code in the benchmarks directory, but I get many errors when I run Lexer.php:

Warning: Missing argument 2 for HTMLPurifier_Lexer_DirectLex::tokenizeHTML(), called in .../htmlpurifier-1.4.1\benchmarks\Lexer.php on line 100 and defined in .../htmlpurifier-1.4.1\library\HTMLPurifier\Lexer\DirectLex.php on line 27

Warning: Missing argument 3 for HTMLPurifier_Lexer_DirectLex::tokenizeHTML(), called in .../htmlpurifier-1.4.1\benchmarks\Lexer.php on line 100 and defined in .../htmlpurifier-1.4.1\library\HTMLPurifier\Lexer\DirectLex.php on line 27

Notice: Undefined variable: config in .../htmlpurifier-1.4.1\library\HTMLPurifier\Lexer\DirectLex.php on line 29

Fatal error: Call to a member function get() on a non-object in .../htmlpurifier-1.4.1\library\HTMLPurifier\Lexer.php on line 203

I tried to resolve this by adding $config = HTMLPurifier_Config::create($config); to the various functions that get called, but after that even more errors appeared... ?

Thanks!

Edited 1 time(s). Last edit at 04/02/2007 06:30AM by Ambush Commander.

Re: benchmark test ?
February 13, 2007 03:34PM

Oops, it looks like I haven't been keeping that code up to date. I'll fix it in the next release.

It's not a very interesting benchmark, really: it just compares the HTML parsing capabilities of DOM, PEAR's XML_HTMLSax3, and HTML Purifier's DirectLex. Last I checked, DOM was the fastest, followed by DirectLex and then HTMLSax3.
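
For anyone who wants to see the general shape of such a comparison, here is a rough, self-contained sketch. It is not the code in the benchmarks directory: time_runs() and the two candidate callables are hypothetical stand-ins (a preg_split tokenizer and strip_tags) so that the script runs without DOM, XML_HTMLSax3 or HTML Purifier installed. It assumes PHP 5.3 or later for the closures.

```php
<?php
// Hypothetical harness, not the real benchmarks/Lexer.php:
// time several callables over the same document, $runs times each.
function time_runs($callable, $document, $runs)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($callable, $document);
    }
    return microtime(true) - $start; // elapsed wall-clock seconds
}

// Sample input; a real comparison would load an HTML file instead.
$document = str_repeat('<p>Some <b>sample</b> text</p>', 100);

// Stand-in "parsers" (assumptions for illustration only).
$candidates = array(
    'preg_split' => function ($html) {
        return preg_split('/(<[^>]*>)/', $html, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    },
    'strip_tags' => function ($html) {
        return strip_tags($html);
    },
);

$results = array();
foreach ($candidates as $name => $callable) {
    $results[$name] = time_runs($callable, $document, 50);
}

asort($results); // fastest first
foreach ($results as $name => $seconds) {
    printf("%-12s %.6f s\n", $name, $seconds);
}
```

The absolute numbers depend on the machine and PHP version, so only the relative ordering on the same input is meaningful.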

HTML Purifier, Standards Compliant HTML Filtering

Re: benchmark test ?
February 13, 2007 03:52PM

This patch should fix it:

Index: Lexer.php
===================================================================
--- Lexer.php (revision 710)
+++ Lexer.php (working copy)
@@ -7,6 +7,7 @@
 
 require_once 'HTMLPurifier/ConfigSchema.php';
 require_once 'HTMLPurifier/Config.php';
+require_once 'HTMLPurifier/Context.php';
 
 $LEXERS = array();
 $RUNS = isset($GLOBALS['HTMLPurifierTest']['Runs'])
@@ -93,11 +94,14 @@
 function do_benchmark($name, $document) {
     global $LEXERS, $RUNS;
 
+    $config = HTMLPurifier_Config::createDefault();
+    $context = new HTMLPurifier_Context();
+
     $timer = new RowTimer($name);
     $timer->start();
 
     foreach($LEXERS as $key => $lexer) {
-        for ($i=0; $i<$RUNS; $i++) $tokens = $lexer->tokenizeHTML($document);
+        for ($i=0; $i<$RUNS; $i++) $tokens = $lexer->tokenizeHTML($document, $config, $context);
         $timer->setMarker($key);
     }
 
Index: ProfileDirectLex.php
===================================================================
--- ProfileDirectLex.php (revision 710)
+++ ProfileDirectLex.php (working copy)
@@ -5,12 +5,15 @@
 require_once 'HTMLPurifier/ConfigSchema.php';
 require_once 'HTMLPurifier/Config.php';
 require_once 'HTMLPurifier/Lexer/DirectLex.php';
+require_once 'HTMLPurifier/Context.php';
 
 $input = file_get_contents('samples/Lexer/4.html');
 $lexer = new HTMLPurifier_Lexer_DirectLex();
+$config = HTMLPurifier_Config::createDefault();
+$context = new HTMLPurifier_Context();
 
 for ($i = 0; $i < 10; $i++) {
-    $tokens = $lexer->tokenizeHTML($input);
+    $tokens = $lexer->tokenizeHTML($input, $config, $context);
 }
 
 ?>
\ No newline at end of file
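
To illustrate why the old one-argument call failed, here is a minimal, self-contained sketch. StubConfig, StubContext, StubLexer and the 'Core.Example' key are hypothetical stand-ins, not the real HTML Purifier classes or directives; only the three-argument call shape is taken from the patch. The point is that the method uses $config internally, so the caller has to pass both a config and a context object.

```php
<?php
// Hypothetical stand-ins for illustration; NOT the real HTML Purifier classes.
class StubConfig
{
    private $values = array('Core.Example' => true); // made-up directive

    public function get($key)
    {
        return isset($this->values[$key]) ? $this->values[$key] : null;
    }
}

class StubContext
{
}

class StubLexer
{
    // Same call shape as in the patch: document, config, context.
    public function tokenizeHTML($html, $config, $context)
    {
        // If $config were missing, this get() call is where a
        // "member function get() on a non-object" style failure would occur.
        $config->get('Core.Example');

        // Naive split into tags and text; a stand-in for real lexing.
        return preg_split('/(<[^>]*>)/', $html, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    }
}

$lexer   = new StubLexer();
$config  = new StubConfig();
$context = new StubContext();

// Pass all three arguments, as the patched benchmark does.
$tokens = $lexer->tokenizeHTML('<b>hi</b>', $config, $context);
print_r($tokens);
```

On PHP 7.1 and later, omitting the second and third arguments raises an ArgumentCountError rather than the warnings quoted in the first post, but the fix is the same: build the config and context once, outside the loop, and pass them on every call.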

Re: benchmark test ?
February 13, 2007 04:20PM

Oh OK, I see.

If you have the time, a real benchmark of the latest release using HTML Purifier's DirectLex, run on a document with invalid HTML, would be much appreciated. Maybe I will try to develop that :) I'll post my code here if I do.

Thanks!
