Matthijs Kooijman
Uninitialized string offset: 0 on specific input
March 17, 2011 10:55AM

Hi folks,

While using HTMLPurifier (4.2.0), I got some E_NOTICEs, which seem to be caused by having a <font size=""2""> tag in my input (i.e., an empty size attribute). I don't know who created this silly HTML, but I have a database full of junk (and other than this, HTMLPurifier is holding its ground!). I haven't tried minimizing the test case yet, but a quick glance at the code suggests that the length of the attribute is not checked before its first character is accessed.
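
To illustrate the suspected cause, here is a minimal sketch of the pattern. It is not the actual HTMLPurifier source: both function names are hypothetical, and the only assumption is that the transform reads the first character of the size value without first checking that the value is non-empty.

```php
<?php
// Hypothetical stand-ins for illustration only; not taken from HTMLPurifier.

// Unchecked access: with $size === '' this reads offset 0 of an empty
// string, which triggers "Uninitialized string offset" (a notice on
// older PHP versions, a warning on PHP 8).
function firstCharUnchecked($size)
{
    return $size[0];
}

// Checked access: only index into the string when it is non-empty.
function firstCharChecked($size)
{
    return $size === '' ? null : $size[0];
}

var_dump(firstCharChecked(''));   // NULL
var_dump(firstCharChecked('+1')); // string(1) "+"
```

A guard of this kind before the character access would presumably be enough to silence the notice for an empty attribute.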

Anyway, here's a full backtrace.

Notice: Uninitialized string offset:  0 (Non-fatal)

Error occured at /libs/htmlpurifier-standalone/HTMLPurifier.standalone.php:15370

BackTrace:
#0    HTMLPurifier_TagTransform_Font::transform(object(HTMLPurifier_Token_Start), object(HTMLPurifier_Config), object(HTMLPurifier_Context))
                                                              called by /libs/htmlpurifier-standalone/HTMLPurifier.standalone.php:15161
#1    HTMLPurifier_Strategy_RemoveForeignElements::execute(array(0 => object(HTMLPurifier_Token_Text), 1 => object(HTMLPurifier_Token_Start), 2 => object(HTMLPurifier_Token_Text), 3 => object(HTMLPurifier_Token_Start), 4 => object(HTMLPurifier_Token_Text), 5 => object(HTMLPurifier_Token_Empty), 6 => object(HTMLPurifier_Token_Text), 7 => object(HTMLPurifier_Token_Empty), 8 => object(HTMLPurifier_Token_Empty), 9 => object(HTMLPurifier_Token_Text), 10 => object(HTMLPurifier_Token_Empty), 11 => object(HTMLPurifier_Token_Empty), 12 => object(HTMLPurifier_Token_Text), 13 => object(HTMLPurifier_Token_Empty), 14 => object(HTMLPurifier_Token_Start), 15 => object(HTMLPurifier_Token_Text), 16 => object(HTMLPurifier_Token_Empty), 17 => object(HTMLPurifier_Token_End), 18 => object(HTMLPurifier_Token_Empty), 19 => object(HTMLPurifier_Token_End), 20 => object(HTMLPurifier_Token_End), 21 => object(HTMLPurifier_Token_Start), 22 => object(HTMLPurifier_Token_Start), 23 => object(HTMLPurifier_Token_Empty), 24 => object(HTMLPurifier_Token_End), 25 => object(HTMLPurifier_Token_End)), object(HTMLPurifier_Config), object(HTMLPurifier_Context))
                                                              called by /libs/htmlpurifier-standalone/HTMLPurifier.standalone.php:14260
#2    HTMLPurifier_Strategy_Composite::execute(array(0 => object(HTMLPurifier_Token_Text), 1 => object(HTMLPurifier_Token_Start), 2 => object(HTMLPurifier_Token_Text), 3 => object(HTMLPurifier_Token_Start), 4 => object(HTMLPurifier_Token_Text), 5 => object(HTMLPurifier_Token_Empty), 6 => object(HTMLPurifier_Token_Text), 7 => object(HTMLPurifier_Token_Empty), 8 => object(HTMLPurifier_Token_Empty), 9 => object(HTMLPurifier_Token_Text), 10 => object(HTMLPurifier_Token_Empty), 11 => object(HTMLPurifier_Token_Empty), 12 => object(HTMLPurifier_Token_Text), 13 => object(HTMLPurifier_Token_Empty), 14 => object(HTMLPurifier_Token_Start), 15 => object(HTMLPurifier_Token_Text), 16 => object(HTMLPurifier_Token_Empty), 17 => object(HTMLPurifier_Token_End), 18 => object(HTMLPurifier_Token_Empty), 19 => object(HTMLPurifier_Token_End), 20 => object(HTMLPurifier_Token_End), 21 => object(HTMLPurifier_Token_Start), 22 => object(HTMLPurifier_Token_Start), 23 => object(HTMLPurifier_Token_Empty), 24 => object(HTMLPurifier_Token_End), 25 => object(HTMLPurifier_Token_End)), object(HTMLPurifier_Config), object(HTMLPurifier_Context))
                                                              called by /libs/htmlpurifier-standalone/HTMLPurifier.standalone.php:201
#3    HTMLPurifier::purify(" <span lang=""N""><p>Het Rabo EK Hockey wordt dit jaar gehouden in het Wagener Stadion in Amsterdam. Zowel de dames als de heren strijden om de Europese titel. Het is de eerste keer dat een dubbel EK in Nederland gehouden wordt. <br>De heren verdedigen hun titel van twee jaar geleden. De dames gaan weer voor goud na hun tweede plek in 2007. Beide teams kunnen zich op het EK kwalificeren voor het WK van 2010. <br><br>Tickets zijn te verkrijgen via www.topticketline.nl (0900 300 1000, 40cpm). <br><br>Let op; Leden van de Rabobank krijgen 25% korting. Meer info hierover is te vinden op<br><a class="new-window" href="http://www.rabobank.nl/particulieren/servicemenu/leden/sport_en_ontspanning_aanbiedingen/kaarten_ek_hockey/default">http://www.rabobank.nl/particulieren/servicemenu/leden/sport_en_ontspanning_aanbiedingen/kaarten_ek_hockey/default<br></a><br></p></span><font face=""Arial"" size=""2""><font face=""Arial"" size=""2""><span lang=""EN""></span></font></font>")
                                                              called by /tools/import_news.php:100

Is this the right place to report a bug like this?

Re: Uninitialized string offset: 0 on specific input
March 17, 2011 01:28PM

Looks like a bug (a harmless one, but a bug nonetheless). Thanks for reporting!

Matthijs Kooijman
Re: Uninitialized string offset: 0 on specific input
March 21, 2011 01:56PM

Awesome, thanks!
