Allowing html and head tags, December 02, 2011 05:47AM
Registered: 2 years ago Posts: 4
Hi everyone. I mainly have to filter out «script» tags, but accept «html», «head» and «body».
I tried to customize the configuration, but those tags are still being filtered out.
$ParamsFiltres = HTMLPurifier_Config::createDefault();

$tags_ok = 'html,head,title,link,body,style,font,'.
    'span,'.
    'br,h1,h2,h3,h4,h5,h6,div,p,blockquote,address,hr,ul,ol,li,'.
    'table,caption,col,colgroup,thead,tbody,tfoot,tr,th,td,fieldset,legend,code,pre,tt,dl,dt,dd,'.
    //'article,aside,'.
    'strong,em,u,del,img,cite,abbr,acronym,big,small';
$tags_no = 'meta,script,frameset,frames,noframe,sdfield,a,'.
    'object,embed,param,iframe,form,input,select,optgroup,option,textarea,button,';

$ParamsFiltres->set('Core.Encoding', 'utf-8');
$ParamsFiltres->set('Core.ConvertDocumentToFragment', false); // I need the entire HTML document, not a fragment
$ParamsFiltres->set('Core.HiddenElements', array(
    'script' => true,
));
$ParamsFiltres->set('HTML.Trusted', true);
$ParamsFiltres->set('HTML.Allowed', 'html,head,body');
$ParamsFiltres->set('HTML.Parent', 'html');
$ParamsFiltres->set('HTML.AllowedElements', $tags_ok);
$ParamsFiltres->set('HTML.ForbiddenElements', $tags_no);
$ParamsFiltres->set('Filter.ExtractStyleBlocks.Escaping', true);
$ParamsFiltres->set('HTML.DefinitionID', 'backfromfrontrenderer.html renderer');
$ParamsFiltres->set('HTML.DefinitionRev', 1);

$def = $ParamsFiltres->getHTMLDefinition(true);
{
    $def->addElement('html', 'Block', 'Flow', 'Common');
    $def->addElement('head', 'Block', 'Flow', 'Common');
    $def->addElement('title', 'Inline', 'Empty', 'Common');
    $def->addElement('style', 'Block', 'Flow', 'Common');
    $def->addElement('link', 'Block', 'Empty', 'Common');
    $def->addElement('body', 'Block', 'Flow', 'Common');
}

$purifier = new HTMLPurifier($ParamsFiltres);
Re: Allowing html and head tags, December 02, 2011 11:42AM
Admin Registered: 6 years ago Posts: 2,630
Try setting %Core.LexerImpl to DirectLex.
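Applied to the configuration from the first post, the suggestion amounts to one more directive. A minimal sketch, assuming HTML Purifier is loaded through HTMLPurifier.auto.php (adjust the path to your install). When %Core.LexerImpl is left unset, HTML Purifier typically picks the DOM-based lexer if PHP's DOM extension is available; setting it forces the pure-PHP DirectLex instead. To be safe, set it together with the other directives, before getHTMLDefinition() is called:

```php
<?php
// Assumption: adjust this path to wherever HTML Purifier is installed.
require_once 'HTMLPurifier.auto.php';

$ParamsFiltres = HTMLPurifier_Config::createDefault();
// Keep the whole document rather than only the contents of <body>.
$ParamsFiltres->set('Core.ConvertDocumentToFragment', false);
// Force HTML Purifier's pure-PHP lexer (DirectLex).
$ParamsFiltres->set('Core.LexerImpl', 'DirectLex');
// ... remaining directives and getHTMLDefinition(true) as in the first post ...

$purifier = new HTMLPurifier($ParamsFiltres);
```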
Re: Allowing html and head tags, December 16, 2011 03:14AM
Registered: 2 years ago Posts: 4
Sorry for the very late response. It works, but the doctype gets escaped: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd">
Re: Allowing html and head tags, February 17, 2012 05:35AM
Registered: 3 years ago Posts: 61
Two things:
Quote: Sorry for the very late response. It works, but the doctype gets escaped: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd">
You need to deal with that manually. As you've noticed, HTML Purifier isn't set up to handle entire HTML documents; it expects HTML fragments (the HTML inside the body), and getting full header support is currently effectively impossible. If you want to preserve the DOCTYPE declaration, you'll need to grab the DOCTYPE with a regex, then add it back in once HTML Purifier is done.
Be very careful with that: if your regex ends up too greedy, you may end up allowing XSS again. I'd recommend analysing what you've grabbed and constructing a safe DOCTYPE from the information found, never reusing the input data itself; for example, by mapping strpos() !== false matches to fixed strings:
// [...]
$cleanDoctype = '';
// $dirtyDoctype is what the regex grabbed
if (strpos(strtolower($dirtyDoctype), 'xhtml 1.0') !== false) {
$cleanDoctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"'
. ' "http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd">';
}
// [...]
echo $cleanDoctype . $cleanHtml;
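The grab, whitelist and re-attach steps described above can be sketched as a small set of self-contained helpers. The function names splitDoctype(), safeDoctype() and safeDoctypes() are hypothetical, and the whitelist holds only two example entries; the point is that the regex is anchored to the start of the input and stops at the first '>', and that only fixed strings from the whitelist are ever emitted:

```php
<?php
// Hypothetical helpers sketching the "grab, whitelist, re-attach" approach.

// Whitelist: lowercase marker to look for => fixed declaration we emit ourselves.
function safeDoctypes(): array
{
    return array(
        'xhtml 1.0 transitional' =>
            '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"'
            . ' "http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd">',
        'html 4.01 transitional' =>
            '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"'
            . ' "http://www.w3.org/TR/html4/loose.dtd">',
    );
}

// Split off a leading DOCTYPE. The match is anchored to the start of the input
// and stops at the first '>', so it cannot swallow any of the markup after it.
function splitDoctype(string $html): array
{
    if (preg_match('/^\s*(<!DOCTYPE[^>]*>)/i', $html, $m)) {
        return array($m[1], substr($html, strlen($m[0])));
    }
    return array('', $html);
}

// Map the grabbed declaration to a fixed string; the input itself is never echoed.
function safeDoctype(string $dirtyDoctype): string
{
    $haystack = strtolower($dirtyDoctype);
    foreach (safeDoctypes() as $marker => $declaration) {
        if (strpos($haystack, $marker) !== false) {
            return $declaration;
        }
    }
    return ''; // unknown DOCTYPE: drop it rather than guess
}
```

With the $purifier from the first post, the pieces would be combined roughly as list($dirtyDoctype, $rest) = splitDoctype($input); followed by echo safeDoctype($dirtyDoctype) . $purifier->purify($rest);. Only the remainder goes through HTML Purifier, so the DOCTYPE no longer gets escaped.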
However, consider what you're doing: you're configuring HTML Purifier to use its default "Doctype" (without checking the Purifier source, I assume this is HTML 4.01 Transitional). This defines which document structure HTML Purifier will allow. If that default "Doctype" is HTML, while the DOCTYPE you announce after purification (the user-supplied one you extracted with the regex and re-attached) is XHTML, you can cause browser errors. You might even be opening yourself up to an obscure XSS vector that way.
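One way to rule out that mismatch is to tell HTML Purifier explicitly which document type to filter against, via %HTML.Doctype, and pick the value that matches the DOCTYPE you re-attach. A minimal configuration sketch, assuming HTML Purifier is loaded through HTMLPurifier.auto.php and that the whitelisted DOCTYPE is XHTML 1.0 Transitional:

```php
<?php
// Assumption: adjust this path to wherever HTML Purifier is installed.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Filter against the same document type that the re-attached DOCTYPE announces.
// Accepted values include 'HTML 4.01 Transitional', 'HTML 4.01 Strict',
// 'XHTML 1.0 Transitional', 'XHTML 1.0 Strict' and 'XHTML 1.1'.
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');

$purifier = new HTMLPurifier($config);
$cleanHtml = $purifier->purify('<p>Example input</p>'); // placeholder input
```

If several DOCTYPEs are whitelisted, the %HTML.Doctype value would be chosen from the same whitelist entry as the re-attached declaration, so the two cannot drift apart.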
(Edit: Fixed formatting after an HTML escaping bug ravaged the forum.)
Edited 1 time(s). Last edit at 07/30/2012 12:54PM by pinkgothic.