
Little extension to Munge (CurrentToken::name)

Posted by Jochem Blok 
June 17, 2008 10:30AM

Hello,

I modified the HTML Purifier class to pass along more information about the context of the URL that is being munged.

File:

/htmlpurifier/library/HTMLPurifier/AttrDef/URI.php

Line 70, original:

$result = str_replace('%s', rawurlencode($result), $munge);

Modified:

$result = str_replace(
  array('%s', '%n'),
  array(
    rawurlencode($result),
    rawurlencode($context->get('CurrentToken')->name)
  ),
  $munge
);

When I purify an HTML snippet such as:

<a href="http://htmlpurifier.org">HTML Purifier</a>

This lets you get the scope of the link, in this case a. With this information, the page behind the munged URL can decide what to do: redirect to the URL, show a dummy URL, and so on.

In my case I use it to redirect all links (a) to the requested URL, while images and other resources are replaced by a dummy image.

The Munge URL is extended with an extra parameter, %n, which contains the name of the CurrentToken.

An example MungeUrl is: http://www.example.com/?url=%s&name=%n

With the given example, the URL will be (note that rawurlencode escapes the original URL): http://www.example.com/?url=http%3A%2F%2Fhtmlpurifier.org&name=a
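The substitution can be tried outside HTML Purifier with a small standalone sketch. The helper name munge_url is hypothetical and not part of the library, and it uses strtr() instead of the str_replace() call shown above so that both placeholders are filled in a single pass:

```php
<?php
// Hypothetical helper, not part of HTML Purifier: fills a munge template.
function munge_url($template, $url, $tokenName)
{
    // strtr() replaces all placeholders in one pass, so text inserted
    // for one placeholder is never rescanned for the other.
    return strtr($template, array(
        '%s' => rawurlencode($url),
        '%n' => rawurlencode($tokenName),
    ));
}

$template = 'http://www.example.com/?url=%s&name=%n';

// A link: the token name is "a".
echo munge_url($template, 'http://htmlpurifier.org', 'a'), "\n";
// prints http://www.example.com/?url=http%3A%2F%2Fhtmlpurifier.org&name=a
```

The receiving script can then branch on the name parameter, for example redirecting when it is a and serving a placeholder image when it is img.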

Good idea?

Re: Little extension to Munge (CurrentToken::name)
June 17, 2008 11:32PM

I've incorporated your suggestion, with numerous improvements, into trunk. Check out the docs here:

http://htmlpurifier.org/dev/configdoc/generate.php#URI.Munge

They will be incorporated into 3.1.1.

Jochem Blok
Re: Little extension to Munge (CurrentToken::name)
June 18, 2008 03:58AM

Seems like you did a great job! I just downloaded the midnight snapshot, but it doesn't seem to contain those changes. I will give the SVN download a try.

Jochem Blok
Re: Little extension to Munge (CurrentToken::name)
June 18, 2008 05:51AM

The class HTMLPurifier_URIDefinition doesn't seem to get loaded by default, so the Munge function doesn't work. I also get a warning when I set %Cache.DefinitionImpl:

Warning: Cannot set undefined directive Cache.DefinitionImpl to value in 
   /var/software/htmlpurifier/library/HTMLPurifier/Config.php on line 196

code:

$oConfig = HTMLPurifier_Config::createDefault();
$oConfig->set('URI', 'Munge', 'http://example.com/?q=%s');
$oConfig->set('Cache', 'DefinitionImpl ', null); // remove this later!
$purifier = new HTMLPurifier($oConfig);
$s = '<img src="http://www.google.nl/intl/nl_nl/images/logo.gif">';
echo $purifier->purify($s);

My guess: Probably the serialized file contains the old HTMLPurifier_URIDefinition instance, so the filters are not loaded.

Re: Little extension to Munge (CurrentToken::name)
June 18, 2008 08:24PM

That's very strange. Did you reinstall over a previous install? As far as I can tell, everything's working on my end. I'm going to do a clean install on a laptop and take a look.

Jochem Blok
Re: Little extension to Munge (CurrentToken::name)
June 19, 2008 03:16AM

I used TortoiseSVN to check out the trunk. I removed all the old files and uploaded the trunk library folder to the web server. Finally, I chmodded the cache folder.

When I place an echo, or something similar, in the constructor of HTMLPurifier_URIDefinition (the place where the filters are initialized), I don't see the output.

Strange. I tried a workaround to ensure that the filters are loaded:

class HTMLPurifier_URIDefinition extends HTMLPurifier_Definition
{
 ........
    function __wakeup()
    {
        $this->registerFilter(new HTMLPurifier_URIFilter_DisableExternal());
        $this->registerFilter(new HTMLPurifier_URIFilter_DisableExternalResources());
        $this->registerFilter(new HTMLPurifier_URIFilter_HostBlacklist());
        $this->registerFilter(new HTMLPurifier_URIFilter_MakeAbsolute());
        $this->registerFilter(new HTMLPurifier_URIFilter_Munge());
    }
}

The output of the code is still not munged.

Jochem Blok
Re: Little extension to Munge (CurrentToken::name)
June 19, 2008 03:54AM

A correction to my post above. The Munge filters aren't loaded. When I place a print_r in the __wakeup function, the output is:

HTMLPurifier_URIDefinition Object
(
    [type] => URI
    [filters:protected] => Array
        (
            [HostBlacklist] => HTMLPurifier_URIFilter_HostBlacklist Object
                (
                    [name] => HostBlacklist
                    [blacklist:protected] => Array
                        (
                        )

                    [post] => 
                )

        )

    [postFilters:protected] => Array
        (
            [Munge] => HTMLPurifier_URIFilter_Munge Object
                (
                    [name] => Munge
                    [post] => 1
                    [target:private] => http://example.com/?q=%s
                    [parser:private] => HTMLPurifier_URIParser Object
                        (
                            [percentEncoder:protected] => HTMLPurifier_PercentEncoder Object
                                (
                                    [preserve:protected] => Array
                                        (
                                            [48] => 1
                                            [49] => 1
                                            [50] => 1
                                            [51] => 1
                                            [52] => 1
                                            [53] => 1
                                            [54] => 1
                                            [55] => 1
                                            [56] => 1
                                            [57] => 1
                                            [65] => 1
                                            [66] => 1
                                            [67] => 1
                                            [68] => 1
                                            [69] => 1
                                            [70] => 1
                                            [71] => 1
                                            [72] => 1
                                            [73] => 1
                                            [74] => 1
                                            [75] => 1
                                            [76] => 1
                                            [77] => 1
                                            [78] => 1
                                            [79] => 1
                                            [80] => 1
                                            [81] => 1
                                            [82] => 1
                                            [83] => 1
                                            [84] => 1
                                            [85] => 1
                                            [86] => 1
                                            [87] => 1
                                            [88] => 1
                                            [89] => 1
                                            [90] => 1
                                            [97] => 1
                                            [98] => 1
                                            [99] => 1
                                            [100] => 1
                                            [101] => 1
                                            [102] => 1
                                            [103] => 1
                                            [104] => 1
                                            [105] => 1
                                            [106] => 1
                                            [107] => 1
                                            [108] => 1
                                            [109] => 1
                                            [110] => 1
                                            [111] => 1
                                            [112] => 1
                                            [113] => 1
                                            [114] => 1
                                            [115] => 1
                                            [116] => 1
                                            [117] => 1
                                            [118] => 1
                                            [119] => 1
                                            [120] => 1
                                            [121] => 1
                                            [122] => 1
                                            [45] => 1
                                            [46] => 1
                                            [95] => 1
                                            [126] => 1
                                        )

                                )

                        )

                    [doEmbed:private] => 
                    [secretKey:private] => 
                    [replace:protected] => Array
                        (
                        )

                )

        )

    [registeredFilters:protected] => Array
        (
        )

    [base] => 
    [host] => 
    [defaultScheme] => http
    [setup] => 1
)

The doSetup method does not seem to get called, so the $filters attribute is an empty array.

Re: Little extension to Munge (CurrentToken::name)
June 19, 2008 02:11PM

Well, that's to be expected, because Munge is a post-filter and thus is put in the postFilters array. I will be testing HTML Purifier on my laptop shortly.

Re: Little extension to Munge (CurrentToken::name)
June 19, 2008 02:31PM

Hahaha, actually, the behavior you are seeing is expected. By default (and this is a change in behavior from previous versions), HTML Purifier doesn't munge embedded URLs. Set %URI.MungeResources to be true and your code will work.

As for the warning, I can't reproduce that. That may be due to the change in the config schema internal format we did in order to save space; are you still seeing the error?
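For reference, enabling the directive mentioned above might look like the following sketch. It reuses the three-argument set() form from the code earlier in this thread; the include path for HTMLPurifier.auto.php is an assumption and depends on where the library is installed:

```php
<?php
// Assumption: the library folder is on the include path; adjust as needed.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('URI', 'Munge', 'http://example.com/?q=%s');
// Also munge URLs of embedded resources such as <img src="...">.
$config->set('URI', 'MungeResources', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<img src="http://www.google.nl/intl/nl_nl/images/logo.gif">');
```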

Jochem Blok
Re: Little extension to Munge (CurrentToken::name)
June 20, 2008 03:37AM

Just downloaded the latest version! I will integrate the new features in a few days. Thanks for the great support. By the way, I found my mistake, which was the reason for the warning: look at the trailing space in 'DefinitionImpl ' in my code above, haha.
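A minimal standalone illustration of why the stray space matters. The $known array and the directive_key helper are hypothetical and only stand in for a lookup of directive names; they are not HTML Purifier's actual schema code:

```php
<?php
// Hypothetical list of known directives, for illustration only.
$known = array('Cache.DefinitionImpl' => true, 'URI.Munge' => true);

// Hypothetical helper: builds a "Namespace.Directive" lookup key.
function directive_key($namespace, $directive)
{
    return $namespace . '.' . $directive;
}

$typo  = directive_key('Cache', 'DefinitionImpl ');       // trailing space
$fixed = directive_key('Cache', trim('DefinitionImpl ')); // space removed

var_dump(isset($known[$typo]));  // bool(false): the name does not match
var_dump(isset($known[$fixed])); // bool(true)
```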

Re: Little extension to Munge (CurrentToken::name)
June 20, 2008 09:28AM

Everything seems to work very smoothly! Great work!
