display URL source

Posted by notromda 
October 22, 2008 10:46PM

I'd like to change a tags to display both the href URL and the link text somehow, whether by showing it all at once or by calling a jQuery plugin to show a tooltip. Is there an easy way to set this up?

Re: display URL source
October 22, 2008 10:49PM

So, if you want to use JavaScript for the task, you don't need HTML Purifier at all; after loading the page, use jQuery to grab all A tags in the page and add the appropriate behavior (you can filter on class to only do this to user links or something).

If you want HTML Purifier to do this, it would certainly be possible, but it would have to be coded. It actually wouldn't be that complicated. Would you be interested in helping? I can get you set up and tell you what you need to do.

Re: display URL source
October 22, 2008 11:51PM

Sorry, let me explain more. I just added HTML Purifier to Maia Mailguard to sanitize email when displaying it to the end user. Since we're displaying spam, I consider it one of the most hostile of all inputs HTML Purifier may see. :) I configured it to block all URLs, but otherwise it is a default install so far. The results have been just wonderful; it's impossible to slip in tracking images or trick the user into going someplace bad, at least not through our interface. But we had just one complaint: it completely removes the actual URL from links, so it may be hard to tell a scam message from a legit one, since the scam may be a copy of a legit message with only one URL changed.

So I'd like to keep all the safety that HTML Purifier provides, but have some way to still *see* what URL the link otherwise pointed to.

It could be just by changing

<a href="foo">bar</a>

to

bar (foo)

Or I could even envision rewriting it to allow a jQuery script to put it in a tooltip:

<a class="HelpTipAnchor">bar</a>
<span class="HelpTip">foo</span>

(A jQuery call later looks for the classes and does the tooltip magic)

So all I need to do, I guess, is take the a tag and rewrite it and its attributes a little. I suspect it can be done, but I haven't been able to figure out the docs for HTML Purifier yet. ;)

Re: display URL source
October 22, 2008 11:54PM

Ah, I understand! Yes, a custom Injector would do the trick nicely. So, are you interested in getting your hands wet with a little of HTML Purifier's internals? I promise they won't bite ;-)

Re: display URL source
October 23, 2008 12:04AM

Sure, just point me in the right direction... :)

Re: display URL source
October 23, 2008 12:06AM

Ok. So the first step is to set up the development environment (this is doubly important, since some of the features we'll be using haven't been released yet.) Check out this document for instructions, and check back here when you've got a working checkout. Trust me; it will be very nice to have when working on the feature.

Re: display URL source
October 23, 2008 12:21AM

Ran the unit tests and got 2 errors but otherwise looks good.

It looks like I'm going to somehow extend an Injector class, but I'm not clear as to what parameters I'm looking for. :)

Re: display URL source
October 23, 2008 12:22AM

Sweet! Post the two unit test errors and I'll take a look at those. I am working on a post describing what you need to do, it will be posted once I finish.

Re: display URL source
October 23, 2008 12:36AM

Ok, so here's the basic idea.

You will need to allow a tags within your AllowedElements set; they will be removed once they hit the Injector execution phase. If we get rid of them earlier, there's no way of telling what the link was by the time the Injector runs.

Injectors are "stream-based" processors. Suppose we have input HTML:

<a href="http://example.com">Foo</a> Bar

The injector's call graph will look like:

  1. handleElement: a
  2. handleText: Foo
  3. handleEnd: a
  4. handleText: Bar

The corresponding changes we make are:

  1. Do nothing (you'll see why later)
  2. Do nothing
  3. Aha, it's an a close tag. Using the $token->start reference (which refers to the a tag we passed up), grab the URL, and then modify the stream (we do this by setting $token to the replacement; it's a reference) with the text token we want: " ($url)" (note, no need to escape anything). Then, using $this->backward(), rewind to the original a tag.
  4. Ok, so I lied: after rewinding we're on handleElement: a, not handleText: Bar. Delete the a tag by setting $token = false
  5. etc. our job here is done

Use of $this->backward() is a little involved: check out AutoParagraph for examples of usage. If it needs more explaining, I can do so here.

If we don't care about getting rid of the a tags completely, we can simplify this process a little:

  1. Do nothing (or optionally add the class tag you want)
  2. Do nothing
  3. Aha, it's an a close tag. Using the $token->start reference (which refers to the a tag we passed up), grab and delete the URL (as simple as unsetting $token->start->attr['href']), and then modify the stream (we do this by setting $token to an array; it's a reference) with the array of tokens we want: the closing end tag (you need to put it back in $token) and " ($url)" (note, no need to escape anything).
  4. We're done

You'll probably want to set up unit tests; use the other injectors as examples. I bet you can find where the test files are ;-)

Re: display URL source
October 23, 2008 12:43AM

By the way, I've moved the topic to Internals, and split out the test case failures. I'll deal with that on a separate thread.

Re: display URL source
October 23, 2008 01:37AM

I do like leaving the a tag there, so it underlines like a link... but we certainly want to make sure the href, onclick, etc. are removed, i.e., make sure it's otherwise safe. Do I need to include some other cleaning actions that would otherwise have been done for me?

Re: display URL source
October 23, 2008 01:40AM

Nope; removing href from the attr array is sufficient.

Re: display URL source
October 23, 2008 02:34AM

Wow, I actually have it working for my first test case. I followed the paragraph formatter example to use my class, but is that sufficient or do I need to add a configuration item?

$this->config->set('AutoFormat', 'Custom', array(new HTMLPurifier_Injector_DisplayLinkUrls()));

Re: display URL source
October 23, 2008 03:22AM

The forum won't let me post the patch, it thinks it is spam... Where should I send a patch for review?

Re: display URL source
October 23, 2008 11:18AM

I had not previously set AllowedElements, but when I do, (to allow a tags) it holds back a lot of others. Do I need to specify all of them?

Never mind; I was trying to be clever and put the new class in my existing install, but the handleEnd hook doesn't exist in that version. Putting the devel version on the server works better.

Re: display URL source
October 23, 2008 12:26PM

I have to say, I'm impressed with the design I see in HTMLPurifier, this has been pretty easy to jump into and understand, once I got pointed to the right spot.

Looking at this feature, I'm trying to figure out how to make it extendable for several different output types.

In order to do a tooltip within our framework, I need to set a class and id on both the anchor tag and the newly injected span with the URL. The classes are set, but the id would need to be unique. I can make a class that does that, but it doesn't seem like something that belongs in the source of HTMLPurifier. I could put a specific version in the Maia source too, of course, but I wonder if a more generic option would be of interest:

In the constructor for the injector, pass along either text parameters to add to the tags, or even a reference to a function that will return the text to put in the attributes. Or in other terms, instantiate the Injector with callbacks to specify the modified attributes. If there's a better pattern, let me know; I'm still working on the GoF book. :)

Another option might be to have more configuration items, but that seems like clutter.

Re: display URL source
October 23, 2008 01:51PM

Hi notromda,

What you've done sounds really awesome! My apologies for the spam filter; I've reconfigured it and you should be able to post the patch here now.

> I followed the paragraph formatter example to use my class, but is that sufficient or do I need to add a configuration item?
>
> $this->config->set('AutoFormat', 'Custom', array(new HTMLPurifier_Injector_DisplayLinkUrls()));

You should add a configuration directive for it, since I intend to add this to the core. ;-)

> I had not previously set AllowedElements, but when I do, (to allow a tags) it holds back a lot of others. Do I need to specify all of them?

Ah, that's interesting. If you have not specified AllowedElements, a tags will be allowed automatically, so nothing needs to be set. I forgot you're using the URI configuration directive to exclude links. Disregard that point.

> I have to say, I'm impressed with the design I see in HTMLPurifier, this has been pretty easy to jump into and understand, once I got pointed to the right spot.

Glad to hear it! At some point I'll write documentation and a tutorial on making Injectors. I think it's one of the neatest and most under-utilized features in HTML Purifier.

> In order to do a tooltip within our framework, I need to set a class and id on both the anchor tag and the newly injected span with the URL. The classes are set, but the id would need to be unique. I can make a class that does that, but it doesn't seem like something that belongs in the source of HTMLPurifier. I could put a specific version in the Maia source too, of course, but I wonder if a more generic option would be of interest:

So, a few interesting points here: HTML Purifier has already pre-empted you on the ID issue, you can read about it here. Unfortunately, you can't really use our built-in functionality for it, since that happens on the step after injectors!

However, I think we can follow the same principle: if we namespace the IDs appropriately, and keep track of the IDs we've already assigned, we should be able to keep things unique, and also not conflict with existing application IDs.
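A minimal sketch of that principle in plain PHP, not HTML Purifier code: the class name LinkIdAllocator, its methods, and the purifier-link- prefix are all made up for illustration. It only shows the bookkeeping (a namespace prefix plus a record of IDs already handed out).

```php
<?php
// Hypothetical helper, not part of HTML Purifier: hands out namespaced IDs
// and remembers which ones have already been assigned.
final class LinkIdAllocator
{
    /** Prefix that keeps injected IDs apart from the application's own IDs. */
    private $prefix;
    /** Map of ID => true for every ID assigned or reserved so far. */
    private $assigned = array();
    /** Last numeric suffix that was tried. */
    private $counter = 0;

    public function __construct($prefix)
    {
        $this->prefix = $prefix;
    }

    /** Return a fresh ID that has not been assigned or reserved before. */
    public function next()
    {
        do {
            $this->counter++;
            $id = $this->prefix . $this->counter;
        } while (isset($this->assigned[$id]));
        $this->assigned[$id] = true;
        return $id;
    }

    /** Mark an ID as taken; returns false if it was already in use. */
    public function reserve($id)
    {
        if (isset($this->assigned[$id])) {
            return false;
        }
        $this->assigned[$id] = true;
        return true;
    }
}

$ids = new LinkIdAllocator('purifier-link-');
$ids->reserve('purifier-link-2');   // pretend this ID already exists
$first  = $ids->next();             // first free suffix is 1
$second = $ids->next();             // suffix 2 is taken, so this skips to 3
```

An injector could hold one such allocator for the whole document and ask it for an ID each time it rewrites a link.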

I think callback hooks would be great for the extensibility we're going for, although I also think configuration directive support for the basic use-cases would be a good idea. Oh, I never told you how to define configuration directives.

I'm not completely happy with the single namespace constraint on directives; when you have things like injectors with their own directives, it would make more sense to define AutoFormat.InjectorName.Directive. Maybe we'll change that in 3.2.

Re: display URL source
October 23, 2008 02:04PM

Akismet doesn't like me at all now. Patch is here: http://maiamailguard.pastebin.com/f14af2272

Re: display URL source
October 23, 2008 02:12PM

Because I like having patches around for posterity, here is the copypasta:

From 0cba68d9ebd4c12a3e5555332a3516d56519464a Mon Sep 17 00:00:00 2001
From: David Morton <mortonda@dgrmm.net>
Date: Thu, 23 Oct 2008 02:09:48 -0500
Subject: [PATCH] Custom Injector to display URL address along with link text.

When viewing potentially hostile html, it may be helpful to see what
a given link was pointing to.  This new injector takes the href
attribute and adds the text after the link, and deletes the href
attribute.

Other forms of display could easily be contrived, but this seems to be
a good basic way to present the information.

Signed-off-by: David Morton <mortonda@dgrmm.net>
---
 library/HTMLPurifier/Injector/DisplayLinkUrls.php  |   24 +++++++++++++++
 .../HTMLPurifier/Injector/DisplayLinkUrlsTest.php  |   32 ++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 library/HTMLPurifier/Injector/DisplayLinkUrls.php
 create mode 100644 tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php

diff --git a/library/HTMLPurifier/Injector/DisplayLinkUrls.php b/library/HTMLPurifier/Injector/DisplayLinkUrls.php
new file mode 100644
index 0000000..c314213
--- /dev/null
+++ b/library/HTMLPurifier/Injector/DisplayLinkUrls.php
@@ -0,0 +1,24 @@
+<?php
+
+/**
+ * Injector that displays the URL of an anchor instead of linking to it, in addition to showing the text of the link.
+ */
+class HTMLPurifier_Injector_DisplayLinkUrls extends HTMLPurifier_Injector
+{
+    
+    public $name = 'DisplayLinkUrls';
+    public $needed = array('a');
+    
+    public function handleElement(&$token) {
+    }
+    
+    public function handleEnd(&$token) {
+        if (isset($token->start->attr['href'])){
+            $url = $token->start->attr['href'];
+            unset($token->start->attr['href']);
+            $token = array($token, new HTMLPurifier_Token_Text(" ($url)"));
+        } else {
+            // nothing to display
+        }
+    }
+}
\ No newline at end of file
diff --git a/tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php b/tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php
new file mode 100644
index 0000000..af27715
--- /dev/null
+++ b/tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php
@@ -0,0 +1,32 @@
+<?php
+
+class HTMLPurifier_Injector_DisplayLinkUrlsTest extends HTMLPurifier_InjectorHarness
+{
+    
+    function setup() {
+        parent::setup();
+        $this->config->set('AutoFormat', 'Custom', array(new HTMLPurifier_Injector_DisplayLinkUrls()));
+    }
+    
+    function testBasicLink() {
+        $this->assertResult(
+            '<a href="http://malware.example.com">Don\'t go here!</a>',
+            '<a>Don\'t go here!</a> (http://malware.example.com)'
+        );
+    }
+    
+    function testEmptyLink() {
+        $this->assertResult(
+            '<a>Don\'t go here!</a>',
+            '<a>Don\'t go here!</a>'
+        );
+    }
+    function testEmptyText() {
+        $this->assertResult(
+            '<a href="http://malware.example.com"></a>',
+            '<a></a> (http://malware.example.com)'
+        );
+    }
+    
+}
+?>
\ No newline at end of file
-- 
1.5.6.5

Re: display URL source
October 23, 2008 02:48PM

> So, a few interesting points here: HTML Purifier has already pre-empted you on the ID issue, you can read about it here. Unfortunately, you can't really use our built-in functionality for it, since that happens on the step after injectors!

I noticed. :) And it gets really fun, cause my implementation of the tooltip needs a prefix on the id, so using the id prefix in purifier would break it.

> However, I think we can follow the same principle: if we namespace the IDs appropriately, and keep track of the IDs we've already assigned, we should be able to keep things unique, and also not conflict with existing application IDs.

I don't mind filtering out the original IDs too much, but a namespace that keeps the ones we inject would be nice.

> I think callback hooks would be great for the extensibility we're going for, although I also think configuration directive support for the basic use-cases would be a good idea. Oh, I never told you how to define configuration directives.

> I'm not completely happy with the single namespace constraint on directives; when you have things like injectors with their own directives, it would make more sense to define AutoFormat.InjectorName.Directive. Maybe we'll change that in 3.2.

Ideally the hooks could be a string or a callback, and the receiving code could act accordingly. I guess a string is just a short circuit of a callback that returns a string anyway.
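A rough sketch of that string-or-callback idea in plain PHP. The function name resolve_hook and its arguments are hypothetical, and nothing here touches HTML Purifier itself. One assumption is baked in: any string is treated as a literal value, so only closures and array-style callbacks count as callbacks (otherwise a string like 'trim' would be ambiguous).

```php
<?php
/**
 * Hypothetical helper: a hook is either a fixed string or a callback.
 * A callback receives the link URL and returns the string to use.
 */
function resolve_hook($hook, $url)
{
    // Strings are always literal values, even if they name a function.
    if (is_string($hook)) {
        return $hook;
    }
    // Closures and array callbacks are invoked with the URL.
    if (is_callable($hook)) {
        return (string) call_user_func($hook, $url);
    }
    throw new InvalidArgumentException('Hook must be a string or a callback');
}

$fixed = resolve_hook('HelpTipAnchor', 'http://example.com');
$dynamic = resolve_hook(
    function ($url) {
        return 'tip-' . md5($url);
    },
    'http://example.com'
);
```

Here $fixed is just the class name passed in, and $dynamic is a per-URL value computed by the closure.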

I'm just brainstorming here, but I think the parameters needed are:

An array of attributes to put in the anchor tag, with their callbacks, plus another structure to pass in the additional text to append. That second one could be complex, with variable attributes, text, and parameters.

Re: display URL source
October 23, 2008 03:08PM

> array of attributes to put in the anchor tag, and their callbacks. another structure to pass in the additional text to append... and that one could be complex, with variable attributes, text, and parameters.

I would prefer something a little simpler: the anchor start token itself, and then a text format in form "(%s)", where %s is substituted with the URL text. But it's up to you to code, so it's your call.
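To make the format idea concrete, here is a small sketch in plain PHP. The function name format_link_uri is made up, and this is not how the injector is actually wired up; it only shows the %s substitution.

```php
<?php
/**
 * Hypothetical helper: build the text shown after a link from a format
 * string in which %s stands for the URL.
 */
function format_link_uri($format, $url)
{
    // Only the format string is interpreted; percent signs inside $url
    // are copied through unchanged.
    return sprintf($format, $url);
}

$plain   = format_link_uri(' (%s)', 'http://example.com');
$encoded = format_link_uri(' [%s]', 'http://example.com/?q=100%25');
```

The result would go into a text token, so, as noted earlier in the thread, the URL does not need to be escaped by hand.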

As for IDs, at this point I'm not sure I completely understand the subtleties of the issue at hand. Could you describe in more detail how your tooltips work?

Re: display URL source
October 23, 2008 03:12PM

I set up a version to try in my app, and ran into another snag - the tooltip pops up under the lightbox the message is viewed in... so I might have to come up with another method. Not related to purifier, but a snag until I figure out what direction I want to go in....

Re: display URL source
October 23, 2008 03:20PM

Gotcha.

Did patch review, everything looks good. I'm going to apply this to my master, set up a configuration directive, and then commit and push. You'll have to do a git reset --hard remotes/origin/master to update your branch when I'm done if you didn't create a topic branch for your commit.

Re: display URL source
October 23, 2008 04:53PM

After considering things a little, I'm going to rename the injector from DisplayLinkUrls to DisplayLinkURI to be consistent with the rest of HTML Purifier. I hope you don't object.

Re: display URL source
October 23, 2008 05:04PM

Committed.

Nyeh, for some reason Git set the author field to my value. I need to make it stop doing that.

Re: display URL source
October 23, 2008 05:14PM

slight typo in the comment field... It should be:

For example, <a href="http://example.com">example</a> becomes <a>example</a> (http://example.com).

But of course, the test cases make it clear. :)

Re: display URL source
October 23, 2008 05:15PM

Sweet! You're the first contributor to HTML Purifier besides me. Congrats!

Re: display URL source
October 23, 2008 05:17PM

Oh yeah, that's true. See, Linkify is run on the configuration documentation, so I had to wrap it with a tags to make it clear that the right link isn't active. Not ideal, but whatever. They'll figure it out when they run the code, and we're going to make it customizable anyway.

Re: display URL source
October 24, 2008 11:10PM

OK, for the next step... I'd like to make it more flexible in what it outputs.

First question: is there a routine somewhere that can read a small string and tokenize it?

I was thinking of making a helper class to go with this injector, which will have its methods called to populate the new link. The default class could do the output as we have it, and then a configuration item could be used to override with a subclass.

Is this the Strategy pattern?

Anyway, if there's a procedure to parse a small amount of HTML and return an array of tokens to put back into the stream, it would be very simple to override the class.

interface HTMLPurifier_Injector_DisplayLinkURI_Strategy
{

    // Called with text of link
    // returns 
    public function LinkAttributes($linktext);
    
    //called with uri of link
    public function URIDisplay($uri);
}

If not, then the subclass has to create the tokens directly. Anyway, see where I'm going with this?
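As a sketch of how a default and an override might look: the interface is repeated from above so the snippet runs on its own, and the return types (an attribute array and a display string) are assumptions, since they aren't settled yet. The two class names are made up as well.

```php
<?php
// Interface as proposed above; the documented return types are assumptions.
interface HTMLPurifier_Injector_DisplayLinkURI_Strategy
{
    /** Called with the text of the link; assumed to return an attribute array. */
    public function LinkAttributes($linktext);

    /** Called with the URI of the link; assumed to return the text to append. */
    public function URIDisplay($uri);
}

// Hypothetical default: no extra attributes, URL shown in parentheses.
class DefaultDisplayStrategy implements HTMLPurifier_Injector_DisplayLinkURI_Strategy
{
    public function LinkAttributes($linktext)
    {
        return array();
    }

    public function URIDisplay($uri)
    {
        return " ($uri)";
    }
}

// Hypothetical override: same text output, but the anchor gets a tooltip class.
class TooltipDisplayStrategy extends DefaultDisplayStrategy
{
    public function LinkAttributes($linktext)
    {
        return array('class' => 'HelpTipAnchor');
    }
}

$default = new DefaultDisplayStrategy();
$tooltip = new TooltipDisplayStrategy();
$shown   = $default->URIDisplay('http://example.com');
$attrs   = $tooltip->LinkAttributes('bar');
```

With this shape the injector would only ever call the two interface methods, and a configuration item could decide which strategy object it gets.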

Re: display URL source
October 24, 2008 11:46PM

Here's the next iteration: http://maiamailguard.pastebin.com/f31dbbd1b

I'm not sure how to instantiate the default strategy, unless that can be done in a configuration item. I suppose that's the way it needs to be instantiated so it can be set from the unit tests.
