Extending few classes

Posted by Tomasz Muras 
Tomasz Muras
Extending few classes
August 18, 2010 04:48PM

Hello everyone,

Moodle uses a modified version of HTML Purifier; you can see a diff between the Moodle and vanilla versions below. This is a bit problematic for me because it forces me to use the version bundled with Moodle in Debian (where I'm a maintainer), and I'd prefer to use your version, which is already packaged for Debian. The adjustments Moodle made are necessary; to re-implement them I would need to extend a few classes:

* HTMLPurifier_AttrDef_Lang
* HTMLPurifier_HTMLModule_Text
* HTMLPurifier_HTMLModule_XMLCommonAttributes

Could you suggest a way to make HTML Purifier use the extended classes (how could I inject them)? Of course, I would like to do this without modifying or patching the HTML Purifier code.

cheers, Tomek

diff -ru vanilla/HTMLPurifier/AttrDef/Lang.php moodle/HTMLPurifier/AttrDef/Lang.php
--- vanilla/HTMLPurifier/AttrDef/Lang.php	2010-06-01 04:22:39.000000000 +0100
+++ moodle/HTMLPurifier/AttrDef/Lang.php	2010-05-22 01:04:23.000000000 +0100
@@ -9,6 +9,10 @@
 
     public function validate($string, $config, $context) {
 
+// moodle change - we use special lang strings unfortunatelly
+        return preg_replace('/[^0-9a-zA-Z_-]/', '', $string);
+// moodle change end
+
         $string = trim($string);
         if (!$string) return false;
 
diff -ru vanilla/HTMLPurifier/HTMLModule/Text.php moodle/HTMLPurifier/HTMLModule/Text.php
--- vanilla/HTMLPurifier/HTMLModule/Text.php	2010-06-01 04:22:39.000000000 +0100
+++ moodle/HTMLPurifier/HTMLModule/Text.php	2010-05-22 01:04:23.000000000 +0100
@@ -45,6 +45,13 @@
         $this->addElement('span', 'Inline', 'Inline', 'Common');
         $this->addElement('br',   'Inline', 'Empty',  'Core');
 
+        // Moodle specific elements - start
+        $this->addElement('nolink',  'Inline', 'Flow');
+        $this->addElement('tex',     'Inline', 'Flow');
+        $this->addElement('algebra', 'Inline', 'Flow');
+        $this->addElement('lang',    'Inline', 'Flow', 'I18N');
+        // Moodle specific elements - end
+        
         // Block Phrasal --------------------------------------------------
         $this->addElement('address',     'Block', 'Inline', 'Common');
         $this->addElement('blockquote',  'Block', 'Optional: Heading | Block | List', 'Common', array('cite' => 'URI') );
diff -ru vanilla/HTMLPurifier/HTMLModule/XMLCommonAttributes.php moodle/HTMLPurifier/HTMLModule/XMLCommonAttributes.php
--- vanilla/HTMLPurifier/HTMLModule/XMLCommonAttributes.php	2010-06-01 04:22:39.000000000 +0100
+++ moodle/HTMLPurifier/HTMLModule/XMLCommonAttributes.php	2010-05-22 01:04:23.000000000 +0100
@@ -5,9 +5,11 @@
     public $name = 'XMLCommonAttributes';
 
     public $attr_collections = array(
+/* moodle comment - xml:lang breaks our multilang
         'Lang' => array(
             'xml:lang' => 'LanguageCode',
         )
+*/
     );
 }
 
diff -ru vanilla/HTMLPurifier/Lexer.php moodle/HTMLPurifier/Lexer.php
--- vanilla/HTMLPurifier/Lexer.php	2010-06-01 04:22:39.000000000 +0100
+++ moodle/HTMLPurifier/Lexer.php	2010-07-06 01:04:03.000000000 +0100
@@ -252,8 +252,10 @@
     public function normalize($html, $config, $context) {
 
         // normalize newlines to \n
-        $html = str_replace("\r\n", "\n", $html);
-        $html = str_replace("\r", "\n", $html);
+        if ($config->get('Output.Newline')!=="\n") {
+            $html = str_replace("\r\n", "\n", $html);
+            $html = str_replace("\r", "\n", $html);
+        }
 
         if ($config->get('HTML.Trusted')) {
             // escape convoluted CDATA
Re: Extending few classes
August 18, 2010 10:43PM

A few of those patches can be handled purely through configuration changes, à la http://htmlpurifier.org/docs/enduser-customize.html, but some of them are trickier.

HTMLPurifier/AttrDef/Lang.php: it should theoretically be possible to take the raw configuration, find the Lang module, and manually put in your new implementation of Lang.
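
A rough sketch of that idea, under several assumptions: the subclass name Moodle_AttrDef_Lang and the definition ID are made up for illustration, the include path depends on your install, and swapping the implementation through $def->manager->attrTypes under the 'LanguageCode' type name is my reading of the internals rather than a documented recipe. maybeGetRawHTMLDefinition() is only available in newer releases.

```php
<?php
require_once 'HTMLPurifier.auto.php'; // assumed include path

// Hypothetical subclass carrying the Moodle change from the diff.
class Moodle_AttrDef_Lang extends HTMLPurifier_AttrDef_Lang
{
    public function validate($string, $config, $context)
    {
        // Keep only the characters used by Moodle's special lang strings.
        return preg_replace('/[^0-9a-zA-Z_-]/', '', $string);
    }
}

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'moodle-lang-sketch'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // no caching while experimenting

if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Assumption: 'LanguageCode' is the attribute type behind lang/xml:lang.
    $def->manager->attrTypes->set('LanguageCode', new Moodle_AttrDef_Lang());
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<span lang="en_utf8">hello</span>');
```

Nothing in the HTML Purifier tree is patched here; the override lives entirely in the calling code.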

HTMLPurifier/HTMLModule/Text.php: this is fully supported by the customization interface.
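
For instance, the four elements from the Moodle patch could probably be registered along these lines. This is a sketch, assuming a release that provides maybeGetRawHTMLDefinition(); the definition ID and include path are placeholders.

```php
<?php
require_once 'HTMLPurifier.auto.php'; // assumed include path

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'moodle-elements-sketch'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // no caching while experimenting

if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Same elements as the Moodle patch; the empty array means
    // "no attribute collections" for that element.
    $def->addElement('nolink',  'Inline', 'Flow', array());
    $def->addElement('tex',     'Inline', 'Flow', array());
    $def->addElement('algebra', 'Inline', 'Flow', array());
    $def->addElement('lang',    'Inline', 'Flow', 'I18N');
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<nolink>Moodle</nolink> <tex>x^2</tex>');
```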

HTMLPurifier/HTMLModule/XMLCommonAttributes.php: you can just disallow xml:lang at the configuration level.
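
One way to do that is presumably the HTML.ForbiddenAttributes directive. The sketch below assumes that is the intended mechanism and has not been checked against Moodle's multilang filter; the include path is a placeholder.

```php
<?php
require_once 'HTMLPurifier.auto.php'; // assumed include path

$config = HTMLPurifier_Config::createDefault();
// The '*@attr' form forbids the attribute on every element.
$config->set('HTML.ForbiddenAttributes', '*@xml:lang');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<span lang="en" xml:lang="en">hello</span>');
```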

HTMLPurifier/Lexer.php: this is confusing. Why do you need to preserve carriage returns if you've told HTML Purifier that the input uses UNIX-style line endings?

Tomasz Muras
Re: Extending few classes
August 19, 2010 04:23PM

The code in Lexer.php was put there to disable line endings normalization altogether, see http://tracker.moodle.org/browse/MDL-22654 . Do you think it would make sense to make line endings normalization configurable? It would definitely be helpful in this case.
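
To make the proposal concrete, here is a minimal sketch of what "configurable" would mean. normalizeNewlines() and its $enabled flag are hypothetical stand-ins, not HTML Purifier's actual Lexer code or an existing directive.

```php
<?php
// Hypothetical helper: the two str_replace calls mirror the vanilla
// Lexer::normalize lines shown in the diff, guarded by a flag.
function normalizeNewlines($html, $enabled = true)
{
    if (!$enabled) {
        return $html; // leave \r\n and \r untouched
    }
    $html = str_replace("\r\n", "\n", $html);
    return str_replace("\r", "\n", $html);
}

var_dump(normalizeNewlines("a\r\nb\rc") === "a\nb\nc");    // bool(true)
var_dump(normalizeNewlines("a\r\nb", false) === "a\r\nb"); // bool(true)
```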

cheers, Tomek

Re: Extending few classes
August 24, 2010 12:49AM

I think there are some cases in which line normalization is crucial to the well-behavedness of an algorithm (autoparagraphing comes to mind). However, if Moodle doesn't care about such cases, it probably wouldn't be a security risk.
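
To illustrate the kind of breakage I mean, take a deliberately naive paragraph splitter. This is not the real auto-paragraphing code, just a hypothetical function that looks for double newlines:

```php
<?php
// Naive illustration only; splitParagraphs() is a made-up helper.
function splitParagraphs($text)
{
    return explode("\n\n", $text);
}

$crlf = "first\r\n\r\nsecond";           // Windows-style line endings
$lf   = str_replace("\r\n", "\n", $crlf); // normalized copy

echo count(splitParagraphs($crlf)), "\n"; // 1: no "\n\n" sequence is found
echo count(splitParagraphs($lf)), "\n";   // 2: paragraph break detected
```

Without normalization, the carriage return sits between the two newlines and the paragraph break goes unnoticed.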

Tomasz Muras
Re: Extending few classes
August 24, 2010 04:09AM

So what do you think about extending HTML Purifier to make newline normalization configurable? I could prepare a patch.

Tomek

Re: Extending few classes
August 24, 2010 01:12PM

Sure. See http://htmlpurifier.org/docs/dev-config-schema.html for how to add a configuration directive; please check up on some existing directives to get a feel for style.

Tomasz Muras
Re: Extending few classes
September 09, 2010 04:43PM

Hello!

The patch to add a new configuration option that disables newline normalization (in library/HTMLPurifier/Lexer.php) is trivial, but I've run into problems when adding a test:

        $this->config->set('HTML.NewlineNormalization', false);
        $input = "plain text\r\n";
        $expect = array(
                new HTMLPurifier_Token_Text("plain text\r\n")
        );
        $this->assertTokenization($input, $expect);

This works for DirectLex and DOMLex but fails for PH5P, because the HTML5 class strips carriage returns in its constructor:

    $data = str_replace("\r\n", "\n", $data);
    $data = str_replace("\r", null, $data);

I think these lines should be removed from the HTML5 class, since this is already done in Lexer::normalize anyway. Would you agree?
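
A standalone sketch of the interaction, with no HTML Purifier code involved: ph5pConstructorStrip() copies the two constructor lines above (using '' rather than null as the replacement), and lexerNormalize() is a hypothetical stand-in for Lexer::normalize with the new option.

```php
<?php
// Copy of the two HTML5 constructor lines quoted above.
function ph5pConstructorStrip($data)
{
    $data = str_replace("\r\n", "\n", $data);
    return str_replace("\r", '', $data);
}

// Hypothetical stand-in for Lexer::normalize with the new option.
function lexerNormalize($html, $normalize)
{
    if ($normalize) {
        $html = str_replace(array("\r\n", "\r"), "\n", $html);
    }
    return $html;
}

$input      = "plain text\r\n";
$afterLexer = lexerNormalize($input, false);      // option off: unchanged
$afterPh5p  = ph5pConstructorStrip($afterLexer);  // \r\n collapsed anyway

var_dump($afterLexer === $input);        // bool(true)
var_dump($afterPh5p === "plain text\n"); // bool(true)
```

So even with normalization switched off, the text token PH5P produces can no longer contain "\r\n", which is why the test fails for that lexer only.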

cheers, Tomek Muras

Re: Extending few classes
September 11, 2010 02:56AM

Yeah, I think that should be ok.
