<object> element wrongly filtered out

Posted by patnaik 
August 06, 2007 04:50PM

HTMLPurifier 2.1.1 mis-filters the following XHTML 1.0 Strict-valid HTML; only the <a> element passes through.

<object type="video/x-ms-wmv" data="http://domain.com/video.wmv" width="320" height="256"> <param name="src" value="http://domain.com/video.wmv" /> <param name="autostart" value="false" /> <param name="controller" value="true" /> <param name="pluginurl" value="http://www.microsoft.com/Windows/MediaPlayer/" /> <a href="http://www.microsoft.com/Windows/MediaPlayer/">Windows Media player required</a> </object>

Even without the <param> and/or <a> elements, the element is filtered out.

Re: <object> element wrongly filtered out
August 07, 2007 01:10AM

<object> is currently not supported in trusted mode, sorry. And there's no way in hell I'm allowing <object> in untrusted mode. ;-) You'll have to make a filter for it.

Re: <object> element wrongly filtered out
August 07, 2007 05:17AM

I wish HTMLPurifier supported <object> and other neglected elements. Whether and when <object> can be trusted should be left to the HTMLPurifier end-user.

Re: <object> element wrongly filtered out
August 07, 2007 12:19PM

I tried coding an 'objects' module (code below), but it did not work. What is wrong?

I would like to add that the HTMLPurifier documentation does not provide clear information on how one should write a custom module - where to put the files, naming conventions, etc. Also, notes on implementing a custom module should be provided. E.g., would one have to edit HTMLModuleManager.php?

Further, documentation of terms like 'NMTOKENS', 'ID', 'Enum#...', etc., and functions like 'addElement(...' and their arguments -- items that customisers are likely to need to use -- should be provided or be better.

New file objects.php in HTMLModule/:

<?php

require_once 'HTMLPurifier/HTMLModule.php';

/**
 * XHTML 1.0 Objects Module.
 */
class HTMLPurifier_HTMLModule_Objects extends HTMLPurifier_HTMLModule
{
    var $name = 'Objects';

    function HTMLPurifier_HTMLModule_Objects() {
        $this->addElement('object', true, 'Block', 'Flow', 'Common',
            array(
                'archive' => 'URI',
                'classid' => 'URI',
                'codebase' => 'URI',
                'codetype' => 'Text',
                'data' => 'URI',
                'declare' => 'Enum#declare',
                'height' => 'Length',
                'name' => 'NMTOKENS',
                'standby' => 'Text',
                'tabindex' => 'Number',
                'type' => 'Text',
                'usemap' => 'URI',
                'width' => 'Length'
            )
        );
        $this->addElement('param', true, false, 'Empty', false,
            array(
                'id' => 'ID',
                'name*' => 'Text',
                'type' => 'Text',
                'value' => 'Text',
                'valuetype' => 'Enum#data,ref,object'
            )
        );
    }
}

Edited HTMLModuleManager.php

Added a require_once() for the new objects.php, and edited function HTMLPurifier_HTMLModuleManager() to add 'Objects' to the '$common' array.

Re: <object> element wrongly filtered out
August 07, 2007 01:00PM

You'll have to use the API documentation (which, by the way, is quite good) in order to move into this area.

> I would like to add that the HTMLPurifier documentation does not provide clear information on how one should write a custom module - where to put the files, naming conventions, etc.

If you're attempting to create a patch for HTML Purifier, just follow the previous examples in the HTMLModule/ directory. If this is just for personal use, try to keep it outside of the directory.

> Also, notes on implementing a custom module should be provided. E.g., would one have to edit HTMLModuleManager.php?

Use this to load the module without editing HTMLModuleManager:

$def =& $this->getHTMLDefinition(true);
$def->manager->addModule(new HTMLPurifier_HTMLModule_Objects());

> Further, documentation of terms like 'NMTOKENS', 'ID', 'Enum#...', etc., and functions like 'addElement(...' and their arguments -- items that customisers are likely to need to use -- should be provided or be better.

The relevant API documentation is in AttrDef/ and HTMLModule.php respectively. To find out what attribute string corresponds to what object, consult AttrTypes.php. The ending parameters are processed by the function make() in AttrDef classes.
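To make the attribute strings a bit more concrete, here is a rough standalone sketch of how a string such as 'Enum#data,ref,object' breaks down into a type name and a parameter string. This is not the library's code: splitAttrType() and makeEnum() are made-up names for illustration, and the real lookup and make() logic live in AttrTypes.php and the AttrDef classes.

```php
<?php
// Standalone illustration only; NOT HTML Purifier's implementation.
// splitAttrType() and makeEnum() are hypothetical helper names.

// Split 'Type#params' into the type name and the trailing parameter string.
function splitAttrType($spec) {
    $parts  = explode('#', $spec, 2);
    $type   = $parts[0];
    $params = isset($parts[1]) ? $parts[1] : '';
    return array($type, $params);
}

// For an Enum-like type, treat the parameter string as a comma-separated
// list of allowed values.
function makeEnum($params) {
    return $params === '' ? array() : explode(',', $params);
}

list($type, $params) = splitAttrType('Enum#data,ref,object');
$allowed = makeEnum($params);

echo $type, "\n";                  // Enum
echo implode('|', $allowed), "\n"; // data|ref|object
```

A plain string such as 'URI' has no '#', so in this sketch it simply yields the type name and an empty parameter string.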

I don't have time to debug your module myself right now (I will take a whack at it later), but I noticed a few things:

  • For safety reasons, the second parameter of the addElement() calls should be false. This parameter indicates whether or not the tag is safe to allow untrusted users to use.
  • Setting the content model for object as Flow will preclude param from being used. You'll need to use 'Optional: Flow | param' (if my memory serves me correctly).
  • Make sure you've followed the development instructions in Customize; they are still relevant to module management. Namely, your changes won't show up until you've cleared the cache.
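On the content model point, a rough way to picture a string like 'Optional: Flow | param' is as a child-definition type followed by a pipe-separated list of what may appear inside the element. The sketch below is a standalone illustration under that assumption, not the library's parser; parseContentModel() is a hypothetical name, and it only handles the explicit 'Type: a | b' form.

```php
<?php
// Standalone illustration only; NOT HTML Purifier's parser.
// parseContentModel() is a hypothetical helper that handles only the
// explicit "Type: item | item" form.
function parseContentModel($spec) {
    list($type, $model) = explode(':', $spec, 2);
    // Each pipe-separated entry names an element or a content set.
    $allowed = array_map('trim', explode('|', $model));
    return array(strtolower(trim($type)), $allowed);
}

list($type, $allowed) = parseContentModel('Optional: Flow | param');

// 'param' has to be listed explicitly to be accepted as a child of object.
var_dump(in_array('param', $allowed)); // bool(true)
```

Read this way, a content model that names only Flow never mentions param, which is why it gets dropped.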

Oh, by the way, auto-paragraphing is on, so you don't need to add <br> tags willy nilly to your forum posts. :-)

Re: <object> element wrongly filtered out
August 07, 2007 02:29PM

Thanks for the response.

Re: <object> element wrongly filtered out
August 07, 2007 11:33PM

Did it work?

Re: <object> element wrongly filtered out
August 08, 2007 01:12AM

This patch should do the trick. It will be included in the next version:

Index: library/HTMLPurifier/HTMLModuleManager.php
===================================================================
--- library/HTMLPurifier/HTMLModuleManager.php	(revision 1372)
+++ library/HTMLPurifier/HTMLModuleManager.php	(working copy)
@@ -29,6 +29,7 @@
 require_once 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
 require_once 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
 require_once 'HTMLPurifier/HTMLModule/Ruby.php';
+require_once 'HTMLPurifier/HTMLModule/Object.php';
 
 // tidy modules
 require_once 'HTMLPurifier/HTMLModule/Tidy.php';
@@ -172,7 +173,7 @@
         $common = array(
             'CommonAttributes', 'Text', 'Hypertext', 'List',
             'Presentation', 'Edit', 'Bdo', 'Tables', 'Image',
-            'StyleAttribute', 'Scripting'
+            'StyleAttribute', 'Scripting', 'Object'
         );
         $transitional = array('Legacy', 'Target');
         $xml = array('XMLCommonAttributes');
Index: library/HTMLPurifier/HTMLModule/Object.php
===================================================================
--- library/HTMLPurifier/HTMLModule/Object.php	(revision 0)
+++ library/HTMLPurifier/HTMLModule/Object.php	(revision 0)
@@ -0,0 +1,47 @@
+<?php
+
+require_once 'HTMLPurifier/HTMLModule.php';
+
+/**
+ * XHTML 1.1 Object Module, defines elements for generic object inclusion
+ * @warning Users will commonly use <embed> to cater to legacy browsers: this
+ *      module does not allow this sort of behavior
+ */
+class HTMLPurifier_HTMLModule_Object extends HTMLPurifier_HTMLModule
+{
+    
+    var $name = 'Object';
+    
+    function HTMLPurifier_HTMLModule_Object() {
+        
+        $this->addElement('object', false, 'Inline', 'Optional: #PCDATA | Flow | param', 'Common', 
+            array(
+                'archive' => 'URI',
+                'classid' => 'URI',
+                'codebase' => 'URI',
+                'codetype' => 'Text',
+                'data' => 'URI',
+                'declare' => 'Bool#declare',
+                'height' => 'Length',
+                'name' => 'CDATA',
+                'standby' => 'Text',
+                'tabindex' => 'Number',
+                'type' => 'ContentType',
+                'width' => 'Length'
+            )
+        );
+
+        $this->addElement('param', false, false, 'Empty', false,
+            array(
+                'id' => 'ID',
+                'name*' => 'Text',
+                'type' => 'Text',
+                'value' => 'Text',
+                'valuetype' => 'Enum#data,ref,object'
+           )
+        );
+    
+    }
+    
+}
+
Index: library/HTMLPurifier/AttrTypes.php
===================================================================
--- library/HTMLPurifier/AttrTypes.php	(revision 1372)
+++ library/HTMLPurifier/AttrTypes.php	(working copy)
@@ -44,6 +44,9 @@
         $this->info['LanguageCode'] = new HTMLPurifier_AttrDef_Lang();
         $this->info['Color']    = new HTMLPurifier_AttrDef_HTML_Color();
         
+        // unimplemented aliases
+        $this->info['ContentType'] = new HTMLPurifier_AttrDef_Text();
+        
         // number is really a positive integer (one or more digits)
         // FIXME: ^^ not always, see start and value of list items
         $this->info['Number']   = new HTMLPurifier_AttrDef_Integer(false, false, true);

One of my big concerns with this patch is the fact that it doesn't treat embed tags very nicely, which makes it effectively impossible to generate HTML that is directly cross-browser compatible.

Re: <object> element wrongly filtered out
August 08, 2007 04:37AM

Excellent! Thank you very much.
