display URL source, October 22, 2008 10:46PM (notromda)
Re: display URL source, October 22, 2008 10:49PM (Admin)
So, if you want to use JavaScript for the task, you don't need HTML Purifier at all; after loading the page, use jQuery to grab all A tags in the page and add the appropriate behavior (you can filter on class to only do this to user links or something).
If you want HTML Purifier to do this, it would certainly be possible, but it would have to be coded. It actually wouldn't be that complicated. Would you be interested in helping? I can get you set up and tell you what you need to do.
Re: display URL source, October 22, 2008 11:51PM (notromda)
Sorry, let me explain more. I just added HTML Purifier to Maia Mailguard, to sanitize email when displaying it to the end user. Since we're displaying spam, I consider it one of the most hostile inputs HTML Purifier may see. :) I configured it to block all URLs, but it otherwise has a default configuration so far. The results have been just wonderful; it's impossible to slip in tracking images or trick the user into going someplace bad, at least not through our interface. But we had one complaint: it completely removes the actual URL from links, so it can be hard to tell a scam message from a legitimate one, since the scam may be a copy of a legitimate message with only one URL changed.
So I'd like to keep all the safety HTML Purifier provides, but still have some way to *see* what URL the link would otherwise have pointed to.
It could be as simple as changing
<a href="foo">bar</a>
to
bar (foo)
Or I could even envision rewriting it to allow for a jquery script to put it in a tooltip:
<a class="HelpTipAnchor">bar</a> <span class="HelpTip">foo</span>
(A jquery call later looks for the classes and does the tooltip magic)
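If the tooltip markup were built as a string outside HTML Purifier (an assumption; inside the purifier it would be emitted as tokens), both the link text and the URL come from hostile mail and must be escaped before they go into the page. A minimal sketch, with a hypothetical `tooltip_markup()` helper that treats the link text as plain text:

```php
<?php
/**
 * Hypothetical helper: build the tooltip markup from the thread as a string.
 * Link text and URL are both attacker-controlled, so both are escaped.
 */
function tooltip_markup(string $linkText, string $url): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    return '<a class="HelpTipAnchor">' . $esc($linkText) . '</a> '
        . '<span class="HelpTip">' . $esc($url) . '</span>';
}

echo tooltip_markup('bar', 'foo'), "\n";
// <a class="HelpTipAnchor">bar</a> <span class="HelpTip">foo</span>
```

The jQuery side then only has to look up the two classes; it never needs to parse or trust the URL itself.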
So all I need to do, I guess, is take the a tag and rewrite it and its attributes a little. I suspect it can be done, but I haven't been able to figure it out from the HTML Purifier docs yet. ;)
Re: display URL source, October 22, 2008 11:54PM (Admin)
Re: display URL source, October 23, 2008 12:04AM (notromda)
Re: display URL source, October 23, 2008 12:06AM (Admin)
Ok. So the first step is to set up the development environment (this is doubly important, since some of the features we'll be using haven't been released yet). Check out this document for instructions, and check back here when you've got a working checkout. Trust me; it will be very nice to have when working on the feature.
Re: display URL source, October 23, 2008 12:21AM (notromda)
Re: display URL source, October 23, 2008 12:22AM (Admin)
Re: display URL source, October 23, 2008 12:36AM (Admin)
Ok, so here's the basic idea.
You will need to allow a tags in your AllowedElements set; the injector will remove them itself when it runs. If they are removed earlier, the injector has no way of telling what the link was.
Injectors are "stream-based" processors. Suppose we have input HTML:
<a href="http://example.com">Foo</a> Bar
The injector's call graph will look like:
The corresponding changes we make are:
" ($url)"
(Note: there's no need to escape anything.) Then, using $this->backward(), rewind to the original a tag. Use of $this->backward() is a little involved; check out AutoParagraph for examples of usage. If it needs more explaining, I can do so here.
If we don't care about getting rid of the a tags completely, we can simplify this process a little:
" ($url)"
(Note: there's no need to escape anything.) You'll probably want to set up unit tests; use the other injectors as examples. I bet you can find where the test files are ;-)
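To make the stream-based idea concrete without depending on HTML Purifier, here is a minimal PHP sketch of the simplified approach: the token stream for `<a href="http://example.com">Foo</a> Bar` is walked once, and when the end tag arrives the href is removed from the matching start token and a text token " ($url)" is appended after it. The `Token` class, `display_link_urls()` and `serialize_tokens()` are hypothetical stand-ins, not HTML Purifier's API (the real injector does this in handleEnd() on HTMLPurifier_Token objects). Escaping happens only when tokens are serialized, which is why the injected text needs none.

```php
<?php
// Hypothetical stand-in for HTML Purifier's token objects (not the real API).
final class Token
{
    public function __construct(
        public string $type,     // 'start', 'end' or 'text'
        public string $data,     // tag name for start/end tokens, raw text for text tokens
        public array $attr = []  // attributes; only meaningful on start tokens
    ) {}
}

/** Strip href from each <a> and append " ($url)" right after its </a>. */
function display_link_urls(array $tokens): array
{
    $out = [];
    $openAnchors = []; // stack of <a> start tokens still waiting for their </a>
    foreach ($tokens as $token) {
        $out[] = $token;
        if ($token->type === 'start' && $token->data === 'a') {
            $openAnchors[] = $token;
        } elseif ($token->type === 'end' && $token->data === 'a') {
            $start = array_pop($openAnchors);
            if ($start !== null && isset($start->attr['href'])) {
                $url = $start->attr['href'];
                // Objects are handles, so this also edits the start token already in $out.
                unset($start->attr['href']);
                $out[] = new Token('text', " ($url)"); // raw text, escaped on output
            }
        }
    }
    return $out;
}

/** Turn tokens back into HTML; this is the only place escaping happens. */
function serialize_tokens(array $tokens): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $html = '';
    foreach ($tokens as $t) {
        if ($t->type === 'text') {
            $html .= $esc($t->data);
        } elseif ($t->type === 'start') {
            $html .= '<' . $t->data;
            foreach ($t->attr as $name => $value) {
                $html .= ' ' . $name . '="' . $esc($value) . '"';
            }
            $html .= '>';
        } else {
            $html .= '</' . $t->data . '>';
        }
    }
    return $html;
}

$tokens = [
    new Token('start', 'a', ['href' => 'http://example.com']),
    new Token('text', 'Foo'),
    new Token('end', 'a'),
    new Token('text', ' Bar'),
];
echo serialize_tokens(display_link_urls($tokens)), "\n";
// <a>Foo</a> (http://example.com) Bar
```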
Re: display URL source, October 23, 2008 12:43AM (Admin)
Re: display URL source, October 23, 2008 01:37AM (notromda)
Re: display URL source, October 23, 2008 01:40AM (Admin)
Re: display URL source, October 23, 2008 02:34AM (notromda)
Re: display URL source, October 23, 2008 03:22AM (notromda)
Re: display URL source, October 23, 2008 11:18AM (notromda)
I had not previously set AllowedElements, but when I do (to allow a tags), it holds back a lot of others. Do I need to specify all of them?
Never mind: I was trying to be clever and put the new class into my existing installation, but the handleEnd hook doesn't exist in that version. Putting the development version on the server works better.
Re: display URL source, October 23, 2008 12:26PM (notromda)
I have to say, I'm impressed with the design I see in HTML Purifier; this has been pretty easy to jump into and understand, once I got pointed to the right spot.
Looking at this feature, I'm trying to figure out how to make it extensible for several different output types.
In order to do a tooltip within our framework, I need to set a class and id on both the anchor tag and the newly injected span with the URL. The classes are set, but the id would need to be unique. I can make a class that does that, but it doesn't seem like something that belongs in the source of HTMLPurifier. I could put a specific version in the Maia source too, of course, but I wonder if a more generic option would be of interest:
In the constructor for the injector, pass along either text parameters to add to the tags, or even a reference to a function that will return the text to put in the attributes. In other words, instantiate the injector with callbacks that specify the modified attributes. If there's a better pattern, let me know; I'm still working through the GoF book. :)
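A minimal sketch of the constructor-callback idea, assuming a hypothetical `LinkDecorationOptions` class (not part of HTML Purifier): each attribute value is a callback receiving the link URL and a running link number, so the application rather than the library decides how classes and unique IDs are built.

```php
<?php
/**
 * Hypothetical options object: every attribute value is a callback
 * called with the link URL and a per-document link counter.
 */
final class LinkDecorationOptions
{
    /**
     * @param array<string, callable> $anchorAttr attributes for the <a> tag
     * @param array<string, callable> $spanAttr   attributes for the injected <span>
     */
    public function __construct(
        private array $anchorAttr = [],
        private array $spanAttr = []
    ) {}

    /** Evaluate every callback for one link. */
    public function attributesFor(string $url, int $n): array
    {
        $eval = function (array $callbacks) use ($url, $n): array {
            $out = [];
            foreach ($callbacks as $name => $cb) {
                $out[$name] = (string) $cb($url, $n);
            }
            return $out;
        };
        return ['anchor' => $eval($this->anchorAttr), 'span' => $eval($this->spanAttr)];
    }
}

$options = new LinkDecorationOptions(
    ['class' => fn(string $url, int $n) => 'HelpTipAnchor', 'id' => fn(string $url, int $n) => "anchor-$n"],
    ['class' => fn(string $url, int $n) => 'HelpTip', 'id' => fn(string $url, int $n) => "tip-$n"]
);
echo json_encode($options->attributesFor('http://example.com', 1)), "\n";
// {"anchor":{"class":"HelpTipAnchor","id":"anchor-1"},"span":{"class":"HelpTip","id":"tip-1"}}
```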
Another option might be to have more configuration items, but that seems like clutter.
Re: display URL source, October 23, 2008 01:51PM (Admin)
Hi notromda,
What you've done sounds really awesome! My apologies for the spam filter; I've reconfigured it and you should be able to post the patch here now.
I followed the paragraph formatter example to use my class, but is that sufficient or do I need to add a configuration item?
$this->config->set('AutoFormat', 'Custom', array(new HTMLPurifier_Injector_DisplayLinkUrls()));
You should add a configuration directive for it, since I intend to add this to the core. ;-)
I had not previously set AllowedElements, but when I do (to allow a tags), it holds back a lot of others. Do I need to specify all of them?
Ah, that's interesting. If you have not specified AllowedElements, a tags will be allowed automatically, so nothing needs to be set. I forgot you're using the URI configuration directive to exclude links. Disregard that point.
I have to say, I'm impressed with the design I see in HTML Purifier; this has been pretty easy to jump into and understand, once I got pointed to the right spot.
Glad to hear it! At some point I'll write documentation and a tutorial on making Injectors. I think it's one of the neatest and most under-utilized features in HTML Purifier.
In order to do a tooltip within our framework, I need to set a class and id on both the anchor tag and the newly injected span with the URL. The classes are set, but the id would need to be unique. I can make a class that does that, but it doesn't seem like something that belongs in the source of HTMLPurifier. I could put a specific version in the Maia source too, of course, but I wonder if a more generic option would be of interest:
So, a few interesting points here: HTML Purifier has already pre-empted you on the ID issue; you can read about it here. Unfortunately, you can't really use our built-in functionality for it, since that happens in the step after injectors!
However, I think we can follow the same principle: if we namespace the IDs appropriately, and keep track of the IDs we've already assigned, we should be able to keep things unique, and also not conflict with existing application IDs.
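A minimal PHP sketch of that principle, assuming a hypothetical `NamespacedIdAllocator` class and an application-chosen prefix: IDs already present in the message can be reserved, and each newly generated ID skips anything already taken.

```php
<?php
/**
 * Hypothetical ID allocator: every generated ID carries a namespace prefix,
 * and IDs already present (or already handed out) are skipped.
 */
final class NamespacedIdAllocator
{
    /** @var array<string, true> */
    private array $used = [];
    private int $counter = 0;

    public function __construct(private string $prefix) {}

    /** Mark an ID as taken, e.g. one already present in the document. */
    public function reserve(string $id): void
    {
        $this->used[$id] = true;
    }

    /** Return a fresh, unused ID such as "tip-1". */
    public function next(): string
    {
        do {
            $id = $this->prefix . (++$this->counter);
        } while (isset($this->used[$id]));
        $this->used[$id] = true;
        return $id;
    }
}

$ids = new NamespacedIdAllocator('tip-');
$ids->reserve('tip-1');   // pretend the message already contained id="tip-1"
echo $ids->next(), "\n";  // tip-2
echo $ids->next(), "\n";  // tip-3
```

One allocator per purified document keeps the counter small and the IDs stable between runs on the same input.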
I think callback hooks would be great for the extensibility we're going for, although I also think configuration directive support for the basic use-cases would be a good idea. Oh, I never told you how to define configuration directives.
I'm not completely happy with the single namespace constraint on directives; when you have things like injectors with their own directives, it would make more sense to define AutoFormat.InjectorName.Directive. Maybe we'll change that in 3.2.
Re: display URL source, October 23, 2008 02:04PM (notromda)
Akismet doesn't like me at all now. Patch is here: http://maiamailguard.pastebin.com/f14af2272
Re: display URL source, October 23, 2008 02:12PM (Admin)
Because I like having patches around for posterity, here is the copypasta:
From 0cba68d9ebd4c12a3e5555332a3516d56519464a Mon Sep 17 00:00:00 2001
From: David Morton <mortonda@dgrmm.net>
Date: Thu, 23 Oct 2008 02:09:48 -0500
Subject: [PATCH] Custom Injector to display URL address along with link text.

When viewing potentially hostile html, it may be helpful to see what a given link was pointing to. This new injector takes the href attribute and adds the text after the link, and deletes the href attribute. Other forms of display could easily be contrived, but this seems to be a good basic way to present the information.

Signed-off-by: David Morton <mortonda@dgrmm.net>
---
 library/HTMLPurifier/Injector/DisplayLinkUrls.php | 24 +++++++++++++++
 .../HTMLPurifier/Injector/DisplayLinkUrlsTest.php | 32 ++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 library/HTMLPurifier/Injector/DisplayLinkUrls.php
 create mode 100644 tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php

diff --git a/library/HTMLPurifier/Injector/DisplayLinkUrls.php b/library/HTMLPurifier/Injector/DisplayLinkUrls.php
new file mode 100644
index 0000000..c314213
--- /dev/null
+++ b/library/HTMLPurifier/Injector/DisplayLinkUrls.php
@@ -0,0 +1,24 @@
+<?php
+
+/**
+ * Injector that displays the URL of an anchor instead of linking to it, in addition to showing the text of the link.
+ */
+class HTMLPurifier_Injector_DisplayLinkUrls extends HTMLPurifier_Injector
+{
+
+    public $name = 'DisplayLinkUrls';
+    public $needed = array('a');
+
+    public function handleElement(&$token) {
+    }
+
+    public function handleEnd(&$token) {
+        if (isset($token->start->attr['href'])){
+            $url = $token->start->attr['href'];
+            unset($token->start->attr['href']);
+            $token = array($token, new HTMLPurifier_Token_Text(" ($url)"));
+        } else {
+            // nothing to display
+        }
+    }
+}
\ No newline at end of file
diff --git a/tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php b/tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php
new file mode 100644
index 0000000..af27715
--- /dev/null
+++ b/tests/HTMLPurifier/Injector/DisplayLinkUrlsTest.php
@@ -0,0 +1,32 @@
+<?php
+
+class HTMLPurifier_Injector_DisplayLinkUrlsTest extends HTMLPurifier_InjectorHarness
+{
+
+    function setup() {
+        parent::setup();
+        $this->config->set('AutoFormat', 'Custom', array(new HTMLPurifier_Injector_DisplayLinkUrls()));
+    }
+
+    function testBasicLink() {
+        $this->assertResult(
+            '<a href="http://malware.example.com">Don\'t go here!</a>',
+            '<a>Don\'t go here!</a> (http://malware.example.com)'
+        );
+    }
+
+    function testEmptyLink() {
+        $this->assertResult(
+            '<a>Don\'t go here!</a>',
+            '<a>Don\'t go here!</a>'
+        );
+    }
+    function testEmptyText() {
+        $this->assertResult(
+            '<a href="http://malware.example.com"></a>',
+            '<a></a> (http://malware.example.com)'
+        );
+    }
+
+}
+?>
\ No newline at end of file
-- 
1.5.6.5
Re: display URL source, October 23, 2008 02:48PM (notromda)
So, a few interesting points here: HTML Purifier has already pre-empted you on the ID issue; you can read about it here. Unfortunately, you can't really use our built-in functionality for it, since that happens in the step after injectors!
I noticed. :) And it gets really fun, because my implementation of the tooltip needs a prefix on the id, so using HTML Purifier's ID prefix would break it.
However, I think we can follow the same principle: if we namespace the IDs appropriately, and keep track of the IDs we've already assigned, we should be able to keep things unique, and also not conflict with existing application IDs.
I don't mind filtering out the original IDs too much, but a namespace that keeps the ones we inject would be nice.
I think callback hooks would be great for the extensibility we're going for, although I also think configuration directive support for the basic use-cases would be a good idea. Oh, I never told you how to define configuration directives.
I'm not completely happy with the single namespace constraint on directives; when you have things like injectors with their own directives, it would make more sense to define AutoFormat.InjectorName.Directive. Maybe we'll change that in 3.2.
Ideally the hooks could be a string or a callback, and the receiving code could act accordingly. I guess a string is just a short circuit of a callback that returns a string anyway.
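A minimal sketch of that "string or callback" hook, assuming a hypothetical `resolve_hook()` helper. One PHP subtlety shapes the design: a string such as `'strtoupper'` is itself callable, so strings are checked first to guarantee they are always treated as literal values.

```php
<?php
/**
 * Hypothetical helper for a hook that may be a literal string or a callback.
 * Strings are checked first on purpose: in PHP a string like 'strtoupper'
 * is itself callable, and here we want it treated as a literal value.
 *
 * @param string|callable $hook
 */
function resolve_hook($hook, array $args = []): string
{
    if (is_string($hook)) {
        return $hook;
    }
    if (is_callable($hook)) {
        return (string) $hook(...$args);
    }
    throw new InvalidArgumentException('Hook must be a string or a callable.');
}

echo resolve_hook('HelpTip'), "\n";           // HelpTip
echo resolve_hook('strtoupper', ['x']), "\n"; // strtoupper
echo resolve_hook(fn(string $url, int $n) => "tip-$n", ['http://example.com', 7]), "\n"; // tip-7
```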
I'm just brainstorming here, but I think the parameters needed are:
An array of attributes to put in the anchor tag, and their callbacks. Another structure to pass in the additional text to append... and that one could be complex, with variable attributes, text, and parameters.
Re: display URL source, October 23, 2008 03:08PM (Admin)
An array of attributes to put in the anchor tag, and their callbacks. Another structure to pass in the additional text to append... and that one could be complex, with variable attributes, text, and parameters.
I would prefer something a little simpler: the anchor start token itself, and then a text format in the form "(%s)", where %s is substituted with the URL text. But it's up to you to code, so it's your call.
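A minimal sketch of the format-string approach, assuming a hypothetical `url_suffix()` helper and a default of " (%s)" so the output matches the patch. The URL is only ever passed as an argument to `sprintf()`, never used as the format, so percent-encoded characters in it are copied through literally.

```php
<?php
/**
 * Hypothetical helper: render the text appended after a link from a
 * printf-style format, defaulting to the " ($url)" form used in the patch.
 */
function url_suffix(string $url, string $format = ' (%s)'): string
{
    // The URL is an argument, not the format, so "%20" etc. is not interpreted.
    return sprintf($format, $url);
}

echo url_suffix('http://example.com/a%20b'), "\n";    //  (http://example.com/a%20b)
echo url_suffix('http://example.com', ' [%s]'), "\n"; //  [http://example.com]
```

A configured format that needs a literal percent sign would have to write it as `%%`; validating the format once when the directive is read avoids surprises later.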
As for IDs, at this point I'm not sure I completely understand the subtleties of the issue at hand. Could you describe in more detail how your tooltips work?
Re: display URL source, October 23, 2008 03:12PM (notromda)
Re: display URL source, October 23, 2008 03:20PM (Admin)
Gotcha.
Did patch review, everything looks good. I'm going to apply this to my master, set up a configuration directive, and then commit and push. You'll have to do a git reset --hard remotes/origin/master to update your branch when I'm done if you didn't create a topic branch for your commit.
Re: display URL source, October 23, 2008 04:53PM (Admin)
Re: display URL source, October 23, 2008 05:04PM (Admin)
Nyeh, for some reason Git set the author field to my value. I need to make it stop doing that.
Re: display URL source, October 23, 2008 05:14PM (notromda)
slight typo in the comment field... It should be:
For example, example becomes example (http://example.com).
But of course, the test cases make it clear. :)
Re: display URL source, October 23, 2008 05:15PM (Admin)
Re: display URL source, October 23, 2008 05:17PM (Admin)
Re: display URL source, October 24, 2008 11:10PM (notromda)
ok, for the next step... I'd like to make it more flexible on what it outputs.
First question, Is there a routine somewhere that can read a small string and tokenize it?
I was thinking of making a helper class to go with this injector, which will have its methods called to populate the new link. The default class could do the output as we have it, and then a configuration item could be used to override with a subclass.
Is this the Strategy pattern?
Anyway, if there's a procedure to parse a small amount of HTML and return an array of tokens to put back into the stream, it would be very simple to override the class.
interface HTMLPurifier_Injector_DisplayLinkURI_Strategy {
    // Called with text of link
    // returns
    public function LinkAttributes($linktext);
    //called with uri of link
    public function URIDisplay($uri);
}
If not, then the subclass has to create the tokens directly. Anyway, see where I'm going with this?
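What's described here reads like the Strategy pattern: the injector holds an object behind an interface and delegates the "what to output" decision to it, so a subclass or configuration can swap the behavior. A minimal self-contained sketch with hypothetical names (`LinkUriDisplayStrategy`, `ParenthesizedUriStrategy`, `ClassedUriStrategy`, `render_link()`). It returns plain text rather than tokens, which sidesteps the tokenizing question; a strategy that must emit markup, such as the tooltip span, would need to return tokens instead.

```php
<?php
/** Hypothetical strategy interface, modeled on the interface above. */
interface LinkUriDisplayStrategy
{
    /** Attributes for the (href-less) anchor, given the link text. */
    public function linkAttributes(string $linkText): array;

    /** Plain text to append after the link for the given URI. */
    public function uriDisplay(string $uri): string;
}

/** Default behavior: no extra attributes, " (uri)" after the link. */
final class ParenthesizedUriStrategy implements LinkUriDisplayStrategy
{
    public function linkAttributes(string $linkText): array
    {
        return [];
    }

    public function uriDisplay(string $uri): string
    {
        return " ($uri)";
    }
}

/** Alternative: mark the anchor with a class and show the URI in brackets. */
final class ClassedUriStrategy implements LinkUriDisplayStrategy
{
    public function linkAttributes(string $linkText): array
    {
        return ['class' => 'HelpTipAnchor'];
    }

    public function uriDisplay(string $uri): string
    {
        return " [$uri]";
    }
}

/** The caller depends only on the interface, so strategies are swappable. */
function render_link(string $text, string $uri, LinkUriDisplayStrategy $strategy): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $attrs = '';
    foreach ($strategy->linkAttributes($text) as $name => $value) {
        $attrs .= ' ' . $name . '="' . $esc($value) . '"'; // names are trusted, values escaped
    }
    return '<a' . $attrs . '>' . $esc($text) . '</a>' . $esc($strategy->uriDisplay($uri));
}

echo render_link('example', 'http://example.com', new ParenthesizedUriStrategy()), "\n";
// <a>example</a> (http://example.com)
echo render_link('example', 'http://example.com', new ClassedUriStrategy()), "\n";
// <a class="HelpTipAnchor">example</a> [http://example.com]
```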
Re: display URL source, October 24, 2008 11:46PM (notromda)
Here's the next iteration: http://maiamailguard.pastebin.com/f31dbbd1b
I'm not sure how to instantiate the default strategy, unless that can be done in a configuration item. I suppose that's the way it needs to be instantiated so it can be set from the unit tests.
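One common way to get a default strategy while still letting unit tests (or a configuration value) supply another is constructor injection with a fallback. A minimal sketch, assuming hypothetical `UriFormatter`, `DefaultUriFormatter` and `LinkUriInjectorSketch` names; the constructor accepts an instance, a class name (as a configuration directive might store it), or null for the default.

```php
<?php
/** Hypothetical formatter interface, standing in for the strategy. */
interface UriFormatter
{
    public function format(string $uri): string;
}

final class DefaultUriFormatter implements UriFormatter
{
    public function format(string $uri): string
    {
        return " ($uri)";
    }
}

/** Hypothetical injector-like class showing only how the strategy is chosen. */
final class LinkUriInjectorSketch
{
    private UriFormatter $formatter;

    /** @param UriFormatter|string|null $formatter instance, class name, or null */
    public function __construct($formatter = null)
    {
        if ($formatter === null) {
            $formatter = new DefaultUriFormatter(); // nothing configured: use the default
        } elseif (is_string($formatter)) {
            if (!is_subclass_of($formatter, UriFormatter::class)) {
                throw new InvalidArgumentException("$formatter does not implement UriFormatter");
            }
            $formatter = new $formatter(); // class name taken from configuration
        }
        if (!$formatter instanceof UriFormatter) {
            throw new InvalidArgumentException('Expected a UriFormatter');
        }
        $this->formatter = $formatter;
    }

    public function suffixFor(string $uri): string
    {
        return $this->formatter->format($uri);
    }
}

$default = new LinkUriInjectorSketch();
echo $default->suffixFor('http://example.com'), "\n"; //  (http://example.com)

// A unit test can pass its own implementation directly.
$custom = new LinkUriInjectorSketch(new class implements UriFormatter {
    public function format(string $uri): string
    {
        return " [$uri]";
    }
});
echo $custom->suffixFor('http://example.com'), "\n"; //  [http://example.com]
```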