
HTMLPURIFIER_PREFIX not "framework-friendly"

Posted by Dalibor Karlović 
April 26, 2011 06:51AM

Hi,

at my workplace we're using HTMLPurifier wrapped inside our home-made framework, which offers (among other things) its own autoloading, dependency injection container, etc.

What we've done is write a wrapper filter for our form component, which lets us easily attach an HTMLPurifier instance to a form field, configure it (with "presets"), etc.

My point is that we're setting up HTMLPurifier somewhat differently than expected: we don't load HTMLPurifier.bootstrap.php (nor do we wish to), and AFAIK that is the only place HTMLPURIFIER_PREFIX is defined. This means we have to define it by hand in our own bootstrap, which doesn't seem right. Once we do define it, everything works fine.

Is there a way to define this constant somewhere else (for example, in HTMLPurifier.php) and have it fall back to dirname(__FILE__) if it's not already defined?

Thanks.

Re: HTMLPURIFIER_PREFIX not "framework-friendly"
April 26, 2011 09:49AM

That's the point. If you do your own bootstrapping, the default HTMLPURIFIER_PREFIX may not do what you want. Better leave it to the framework in that case.

Dalibor Karlović
Re: HTMLPURIFIER_PREFIX not "framework-friendly"
April 26, 2011 11:04AM

I totally agree, but if you don't define it, HTMLPurifier will FAIL with some totally unrelated error.

A check like this in HTMLPurifier.php

if (!defined('HTMLPURIFIER_PREFIX')) {
    define('HTMLPURIFIER_PREFIX', dirname(__FILE__));
}

should do the trick.
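Because such a guard only defines the constant when nothing else has, a framework that wants a different location can still set it first, so "leaving it to the framework" keeps working. A minimal sketch, assuming a framework bootstrap that runs before any HTML Purifier file is loaded (the install path is hypothetical):

```php
<?php
// Framework bootstrap (hypothetical path): runs first and picks the location.
define('HTMLPURIFIER_PREFIX', '/opt/vendor/htmlpurifier/library');

// The guard as it would later run inside HTMLPurifier.php: it sees the
// constant is already defined and leaves the framework's value alone.
if (!defined('HTMLPURIFIER_PREFIX')) {
    define('HTMLPURIFIER_PREFIX', dirname(__FILE__));
}

echo HTMLPURIFIER_PREFIX, "\n"; // prints /opt/vendor/htmlpurifier/library
```

Without the first define(), the guard falls back to the directory of the file that contains it.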

Re: HTMLPURIFIER_PREFIX not "framework-friendly"
April 26, 2011 11:11AM

The defined() function is known to be slow in PHP, particularly when the constant is not defined; here is one set of benchmarks from the PHP manual comments:

true                                       0.65ms
$true                                      0.69ms (1)
$config['true']                            0.87ms
TRUE_CONST                                 1.28ms (2)
true                                       0.65ms
defined('TRUE_CONST')                      2.06ms (3)
defined('UNDEF_CONST')                    12.34ms (4)
isset($config['def_key'])                  0.91ms (5)
isset($config['undef_key'])                0.79ms
isset($empty_hash[$good_key])              0.78ms
isset($small_hash[$good_key])              0.86ms
isset($big_hash[$good_key])                0.89ms
isset($small_hash[$bad_key])               0.78ms
isset($big_hash[$bad_key])                 0.80ms
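Figures like these vary a lot between PHP versions, OPcache settings and hardware. A rough way to check the relative cost on your own setup is a micro-benchmark along these lines; this is a hypothetical sketch, not the benchmark quoted above, and the iteration count is arbitrary:

```php
<?php
// Times $n evaluations of each expression in a plain loop.
// Only the relative ordering of the rows is meaningful.
define('TRUE_CONST', true);

$n = 100000;
$config = array('def_key' => true);
$results = array();

$start = microtime(true);
for ($i = 0; $i < $n; $i++) { $r = defined('TRUE_CONST'); }
$results["defined('TRUE_CONST')"] = (microtime(true) - $start) * 1000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) { $r = defined('UNDEF_CONST'); }
$results["defined('UNDEF_CONST')"] = (microtime(true) - $start) * 1000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) { $r = isset($config['def_key']); }
$results["isset(\$config['def_key'])"] = (microtime(true) - $start) * 1000;

// Print each expression with its elapsed time in milliseconds.
foreach ($results as $label => $ms) {
    printf("%-28s %8.2fms\n", $label, $ms);
}
```

Every row also pays for the loop and the assignment, so compare the rows with each other rather than reading them as the absolute cost of defined().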

HTML Purifier has various provisions for omitting safety checks, and not using our user-friendly autoloader PHP file is one of them. If you are running close to the metal, I expect you to do the right thing. Otherwise, use the user-friendly interface :-)

Dalibor Karlović
Re: HTMLPURIFIER_PREFIX not "framework-friendly"
April 26, 2011 12:54PM

I don't think benchmarks should be a reason not to include this kind of feature, let alone ones measured in "ms". :) Those (worst-case) 12ms are a blip in HTMLPurifier's resource usage profile and ain't gonna save it either way (not to sound like I'm knocking it; it's a great piece of code that makes a hard task easy, and all of you are doing a great job).

Anyway, you can see how we use HTMLPurifier in a (semi-)unrelated issue on Github: https://github.com/beberlei/yadif/issues/15

The idea is not to force the developer to do anything: convention over configuration and that sort of thing. The only real change would be to move the constant definition from HTMLPurifier_Bootstrap.php to HTMLPurifier.php (or at least add it there as well).

What I want to do is something like this:

<?php
// ...
// my classmap-based autoloader is already setup here
// $container contains an instance of DIC, Yadif_Container, https://github.com/beberlei/yadif/
// the container will lazy-load and configure the entire thing if not already there
$purifier = $container->getComponent('purifier');
$clean    = $purifier->purify($dirty);
// ...

If I don't need HTMLPurifier in that request, there would be no mention of it in the entire application.

This change would also mean fewer requirements from HTMLPurifier, which is great for integrating it with all frameworks (Symfony, ZF, etc.), not just small homebrew ones. :) For example, right now you MUST use the bootstrap (or replicate its behaviour, like I do), but with the change you could set up autoloading any way you like (as your framework prefers); you COULD still use HTMLPurifier_Bootstrap, for example, but wouldn't need to.

So, that's my case, hope you didn't stop reading. :)

Re: HTMLPURIFIER_PREFIX not "framework-friendly"
December 25, 2011 09:58AM

I thought about the issue in more detail.

Unfortunately, the solution you propose doesn't actually make it possible to rely on a classic loading convention that would let you make things work without extra configuration. If you look carefully at Bootstrap.php, you'll notice that we have some special cases, which you need Bootstrap.php to handle properly:

    public static function getPath($class) {
        if (strncmp('HTMLPurifier', $class, 12) !== 0) return false;
        // Custom implementations
        if (strncmp('HTMLPurifier_Language_', $class, 22) === 0) {
            $code = str_replace('_', '-', substr($class, 22));
            $file = 'HTMLPurifier/Language/classes/' . $code . '.php';
        } else {
            $file = str_replace('_', '/', $class) . '.php';
        }
        if (!file_exists(HTMLPURIFIER_PREFIX . '/' . $file)) return false;
        return $file;
    }
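The special case here is the language classes: their files sit under Language/classes/ and use dashes in the language code, so a plain underscore-to-slash loader would look in the wrong place. As a sketch, a third-party autoloader could mirror getPath()'s naming rules itself; the helper name and install path below are hypothetical, and such a copy can fall out of sync if Bootstrap.php's conventions change.

```php
<?php
// Hypothetical helper (not part of HTML Purifier) that mirrors the
// name-to-path rules of getPath() above, minus the file_exists() check.
function htmlpurifier_class_to_path($class)
{
    // Only classes whose names start with "HTMLPurifier" are handled.
    if (strncmp('HTMLPurifier', $class, 12) !== 0) {
        return false;
    }
    // Special case: language classes live under Language/classes/ and use
    // dashes instead of underscores in the language code.
    if (strncmp('HTMLPurifier_Language_', $class, 22) === 0) {
        $code = str_replace('_', '-', substr($class, 22));
        return 'HTMLPurifier/Language/classes/' . $code . '.php';
    }
    // Default convention: underscores become directory separators.
    return str_replace('_', '/', $class) . '.php';
}

$base = '/path/to/htmlpurifier/library'; // hypothetical install location
spl_autoload_register(function ($class) use ($base) {
    $path = htmlpurifier_class_to_path($class);
    if ($path !== false && is_file($base . '/' . $path)) {
        require $base . '/' . $path;
    }
});

echo htmlpurifier_class_to_path('HTMLPurifier_Config'), "\n";
// HTMLPurifier/Config.php
echo htmlpurifier_class_to_path('HTMLPurifier_Language_en_x_test'), "\n";
// HTMLPurifier/Language/classes/en-x-test.php
```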

You could argue that it would be better design if we didn't have these special cases. I could probably get on board with that, if you asked nicely enough (I don't have very much time to spare on HTML Purifier development these days, unfortunately). But one reason we have our own autoloader is to allow for other custom conventions to be added later, if necessary. So I'm not really keen on people replicating the behavior of the Bootstrap and then having things break when I change its behavior later.

Furthermore, your solution is fragile, because it requires you to use only the HTMLPurifier class as the entry point to the library. There are plenty of classes in HTML Purifier which can and should be used standalone, and they will in fact break under your scheme: if you use them directly rather than going through HTMLPurifier, HTMLPurifier.php never gets loaded and the constant never gets defined.

There is one final reason for not putting such a check in HTMLPurifier.php; if you don't have any ambient code executed in your include files, it's easier for systems like HipHop to optimize, since they can truly treat PHP files as class definitions, and not as "arbitrary scripts which may run some code."

Dalibor Karlović
Re: HTMLPURIFIER_PREFIX not "framework-friendly"
March 27, 2012 06:02AM

I see your point and agree with you on most counts. What if we turn the problem on its head: what does HTMLPURIFIER_PREFIX actually do for the library? As far as I can tell, it's used in these files (as of 4.3.0):

DefinitionCache/Serializer.php       1x
LanguageFactory.php                  1x
ConfigSchema/InterchangeBuilder.php  1x
Printer/ConfigForm.php               2x
EntityLookup.php                     1x
Bootstrap.php                        4x
ConfigSchema.php                     1x

So, if we assume that allowing for 3rd party loaders is OK, this only leaves us with the HTMLPURIFIER_PREFIX issue. Can it be ripped out? For example, adding

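class HTMLPurifier_Config
{
    // ...
    protected static $base;

    public static function setBase($base) {
        self::$base = $base;
    }

    public static function getBase() {
        if (null === self::$base) {
            // Config.php lives in library/HTMLPurifier/, but the prefix is
            // library/ itself (see getPath()), hence the double dirname().
            self::$base = dirname(dirname(__FILE__));
        }
        return self::$base;
    }
}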

We're now able to replace every usage of the constant with HTMLPurifier_Config::getBase() AND we keep the configurability if it's needed. The only problem left is to fix the custom loading cases (which I haven't encountered yet).
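To make the proposal concrete, here is a standalone sketch of the same lazy-default pattern, using a hypothetical PurifierBase class instead of patching HTMLPurifier_Config, plus how a call site that concatenates HTMLPURIFIER_PREFIX would change. The rtrim() normalisation and the data-file name are assumptions for illustration only.

```php
<?php
// Hypothetical stand-in for the proposed setBase()/getBase() pair,
// assumed to live one directory below the library root.
class PurifierBase
{
    protected static $base;

    public static function setBase($base)
    {
        // Assumption: strip a trailing slash so call sites can append '/...'.
        self::$base = rtrim($base, '/');
    }

    public static function getBase()
    {
        if (null === self::$base) {
            // Lazy default: parent of this file's directory (the library root).
            self::$base = dirname(dirname(__FILE__));
        }
        return self::$base;
    }
}

// Before: HTMLPURIFIER_PREFIX . '/HTMLPurifier/data/example.ser'
// After (file name is hypothetical):
PurifierBase::setBase('/srv/app/vendor/htmlpurifier/library/');
echo PurifierBase::getBase() . '/HTMLPurifier/data/example.ser', "\n";
// /srv/app/vendor/htmlpurifier/library/HTMLPurifier/data/example.ser
```

A framework would call setBase() once during its own bootstrap; code that never calls it gets the file-relative default, so no constant has to exist up front.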
