|
Stefan W
Safe XSS Settings for Extremely Large Documents? November 14, 2011 03:10PM |
I've been having some trouble with HTML Purifier when attempting to process large documents (~300 KB of HTML).
Essentially I get an XML file containing a large amount of text. I parse this into a SimpleXML object, then push the relevant values through HTML Purifier into a different object to be saved. This works fine with small documents, but large documents take about 10 minutes each to process, which is simply too long in my case. I've pushed some documents through via Apache with a 60 second timeout, and it generally fails in Strategy/MakeWellFormed, so I guess it's doing tag balancing or some other time-consuming work. I'm only really in this for the XSS filtering, so I'm just trying to find out what the fastest safe XSS configuration is.
My current config is:
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Strict');
$config->set('HTML.SafeObject', true);
$config->set('Output.FlashCompat', true);
$config->set('HTML.Nofollow', true);
$config->set('HTML.TidyLevel', 'none');
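For anyone trying to reproduce this, the workflow described above (SimpleXML in, selected values through the purifier, timings out) could be sketched roughly as follows. This is only a sketch under assumptions: the function name purifyFields, the field name 'body', and the record layout are hypothetical and not taken from the original code. The purifier is passed in as a callable so the timing loop can be tried without the library loaded.

```php
<?php
/**
 * Pass the named child elements of every record through $purify and
 * record how long each call took.
 *
 * $purify is any callable taking a string and returning a string.
 * With HTML Purifier that would be [$purifier, 'purify'].
 * The record/field layout is a hypothetical example.
 */
function purifyFields(SimpleXMLElement $xml, array $fields, callable $purify): array
{
    $results = [];
    foreach ($xml->children() as $record) {
        $row = [];
        foreach ($fields as $field) {
            // Casting a SimpleXML node to string yields its text content.
            $raw = (string) $record->{$field};

            $start = microtime(true);
            $clean = $purify($raw);
            $elapsed = microtime(true) - $start;

            $row[$field] = [
                'clean'   => $clean,
                'bytes'   => strlen($raw),
                'seconds' => $elapsed,
            ];
        }
        $results[] = $row;
    }
    return $results;
}
```

With the config above this would be called as purifyFields($xml, ['body'], [$purifier, 'purify']) where $purifier = new HTMLPurifier($config). Logging bytes next to seconds per field should make it easier to see which values are the slow ones.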
(I don't know if I can attach things on here but I'd be happy to attach a sample that will fail as described.)
Thank you.
|
Stefan W
Re: Safe XSS Settings for Extremely Large Documents? November 14, 2011 03:23PM |
I can't edit my first post because I didn't create an account, but here is a bit more info:
If I just comment out these two lines in Strategy/Core.php, it passes all my tests (which only check for script tag removal) and completes in 10 seconds rather than minutes.
$this->strategies[] = new HTMLPurifier_Strategy_MakeWellFormed();
$this->strategies[] = new HTMLPurifier_Strategy_FixNesting();
To what extent is this a really bad idea (i.e. do the removed classes do critical work with respect to XSS filtering)?
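One caveat on the tests mentioned above: a check that only looks for script tags will not notice other common vectors such as event-handler attributes or javascript: URLs. A slightly broader check could look something like the sketch below. The function name findSuspiciousMarkup and the tag and attribute lists are hypothetical and illustrative, not exhaustive, and this says nothing about whether the two strategies are safe to remove. It uses PHP's DOM extension rather than anything from HTML Purifier.

```php
<?php
/**
 * Return a list of findings for markup that a script-tag-only test misses.
 * Illustrative only: the tag and attribute lists are assumptions and this
 * is not a complete XSS check.
 */
function findSuspiciousMarkup(string $html): array
{
    $findings = [];

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $badTags = ['script', 'iframe', 'object', 'embed'];
    $urlAttributes = ['href', 'src', 'action'];

    foreach ($doc->getElementsByTagName('*') as $element) {
        $tag = strtolower($element->nodeName);
        if (in_array($tag, $badTags, true)) {
            $findings[] = "tag: $tag";
        }
        foreach ($element->attributes as $attribute) {
            $name = strtolower($attribute->name);
            // Drop whitespace so "java script:" style tricks still match.
            $value = strtolower(preg_replace('/\s+/', '', $attribute->value));

            if (strpos($name, 'on') === 0) {
                $findings[] = "event handler: $tag@$name";
            }
            if (in_array($name, $urlAttributes, true)
                && strpos($value, 'javascript:') === 0) {
                $findings[] = "javascript URL: $tag@$name";
            }
        }
    }
    return $findings;
}
```

Running it over the purifier's output, with and without the two strategies, and expecting an empty array in both cases would be a somewhat stronger test than the script-tag check alone. Note that object and embed would be flagged even when HTML.SafeObject deliberately allows them, so the tag list would need adjusting for that config.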
|
Re: Safe XSS Settings for Extremely Large Documents? November 15, 2011 11:58AM |
Admin Registered: 6 years ago Posts: 2,640 |
|
Dmitry Kozlovich
Re: Safe XSS Settings for Extremely Large Documents? March 16, 2012 01:49PM |
I have the same problem (using HTML Purifier 4.4.0, the latest version). The file contains terrible MS Word content with o:p tags and others, and the file size is 196 KB. PHP max_execution_time is set to 600 seconds. HTML Purifier fails because it runs out of time. I also tested an ordinary HTML file with about 300 KB of content; in that case processing took about 20-30 seconds. I then deleted the meaningful content from the MS Word-styled file, leaving only the tags, but the purifier still fails.
Fragment of the file content (from the body section):
<p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'></span><o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p><p> <o:p></o:p></p>
|
Re: Safe XSS Settings for Extremely Large Documents? March 20, 2012 11:19PM |
Admin Registered: 6 years ago Posts: 2,640 |