
Problem with HTML.TargetBlank and HTML.Nofollow

Posted by stelmik 
Re: Problem with HTML.TargetBlank and HTML.Nofollow
February 02, 2012 08:22PM

I don't believe you.

ezyang@javelin:~/Dev/htmlpurifier$ cat > foo.php
<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
	$config->set('HTML.Nofollow', true);
	$config->set('HTML.TargetBlank', true);
	$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
ezyang@javelin:~/Dev/htmlpurifier$ php foo.php 
<a>test page</a>

I don't think you're telling me the whole story here, so I can't debug your problem. (At a guess, the problem is that you're not allowing href as an attribute.)
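For reference, HTML.Allowed accepts a bracketed attribute list after each element name, so the whitelist in the test above could permit href as sketched here. This is only a minimal sketch: it reuses the include path from the test script, and allowing just href is an assumption about what is wanted.

```php
<?php
include_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// element[attr1|attr2] whitelists attributes per element; an element
// listed without brackets keeps none of its attributes.
$config->set('HTML.Allowed', 'a[href],b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
```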

stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow
February 03, 2012 07:25AM

Sorry, you're right!

The real full code is:

<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
	$config->set('Core.Encoding', 'UTF-8');
	$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
	$config->set('AutoFormat.AutoParagraph', true);
	$config->set('URI.Host', $_SERVER['HTTP_HOST']);
	$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
	$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
	$config->set('AutoFormat.RemoveEmpty', true);
	$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
	$config->set('Core.RemoveProcessingInstructions', true);
	$config->set('HTML.AllowedAttributes', '*.href,*.src,*.alt,*.border,*.align,*.width,*.height,*.vspace,*.hspace,*.target,*.rel,*.style');
	$config->set('HTML.ForbiddenAttributes', '*@action,*@background,*@codebase,*@dynsrc,*@lowsrc,*@class,*@on*');
	$config->set('HTML.TargetBlank', true);
	$config->set('HTML.Nofollow', true);
	$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>

And this code doesn't work the way I would like. :-(

I tried to include standalone version:

include_once 'library/HTMLPurifier.standalone.php';

But the result is the same. :-(

The result is:

<a href="http://www.google.com/" target="blank">test page</a>

What is wrong? 1. The target is "blank", but it should be "_blank". 2. There is no "rel" attribute with the value "nofollow", but there should be.

Re: Problem with HTML.TargetBlank and HTML.Nofollow
February 17, 2012 04:47AM

And this code doesn't work the way I would like. :-(

Without knowing specifically if this is your problem, you seem to have some misconceptions about some of the configuration values. Allow me to explain:

1)

	$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
	$config->set('HTML.AllowedAttributes', '*.href,*.src,*.alt,*.border,*.align,*.width,*.height,*.vspace,*.hspace,*.target,*.rel,*.style');

You either want to use HTML.AllowedElements and HTML.AllowedAttributes or only HTML.Allowed. From the HTML.Allowed doc:

This is a preferred convenience directive that combines %HTML.AllowedElements and %HTML.AllowedAttributes.

2)

	$config->set('HTML.ForbiddenAttributes', '*@action,*@background,*@codebase,*@dynsrc,*@lowsrc,*@class,*@on*');

From the HTML.ForbiddenAttributes doc:

Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.

From the HTML.ForbiddenElements doc:

This is the logical inverse of %HTML.AllowedElements, and it will override that directive, or any other directive.

Conclusion

In short: You want either (HTML.ForbiddenAttributes) or (HTML.AllowedAttributes and HTML.AllowedElements) or (HTML.Allowed).

If you fix that, your problem will probably go away.
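To make the relationship between the directives concrete, here is a rough sketch of how one HTML.Allowed-style string corresponds to an element list plus an attribute list. The function splitAllowed() is hypothetical and is not part of HTML Purifier; it only illustrates the notation, on the assumption that the string is well-formed.

```php
<?php
/**
 * Hypothetical helper, NOT part of HTML Purifier: splits an
 * HTML.Allowed-style string into the two lists it stands for.
 */
function splitAllowed(string $allowed): array
{
    $elements = [];
    $attributes = [];
    // Each chunk looks like "a[href|title]" or just "b".
    foreach (explode(',', $allowed) as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue;
        }
        $element = $chunk;
        if (preg_match('/^([^\[\]]+)\[([^\]]*)\]$/', $chunk, $m)) {
            $element = trim($m[1]);
            foreach (explode('|', $m[2]) as $attr) {
                $attr = trim($attr);
                if ($attr !== '') {
                    // Same element.attribute notation as HTML.AllowedAttributes.
                    $attributes[] = $element . '.' . $attr;
                }
            }
        }
        // "*" only carries global attributes; it is not an element.
        if ($element !== '*') {
            $elements[] = $element;
        }
    }
    return ['elements' => $elements, 'attributes' => $attributes];
}

print_r(splitAllowed('a[href|title],b,strong,i,em,u'));
```

Setting HTML.Allowed on top of a separate HTML.AllowedAttributes list therefore describes the same whitelist twice, which is where the two can disagree.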

(Edit: Fixed formatting after an HTML escaping glitch on the forum.)

Edited 1 time(s). Last edit at 07/30/2012 01:49PM by pinkgothic.

Re: Problem with HTML.TargetBlank and HTML.Nofollow
February 18, 2012 11:30AM

Aw, that's embarrassing. target="blank" rather than target="_blank" is a bug. You can fix it by patching library/HTMLPurifier/AttrTransform/TargetBlank.php looking for "blank" and replacing it with "_blank". This should be fixed in the next version.

As for rel, this is probably a bad interaction between Allowed and AllowedAttributes. If you follow pinkgothic's advice that should clear this up.

Thank you for your help :)

I think I understand everything.

As you explained, I changed my code to:

<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
	$config->set('Core.Encoding', 'UTF-8');
	$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
	$config->set('AutoFormat.AutoParagraph', true);
	$config->set('URI.Host', $_SERVER['HTTP_HOST']);
	$config->set('HTML.AllowedElements', array('a','b','strong','i','em','u'));
	$config->set('HTML.AllowedAttributes', array('a.href', 'img.src', '*.alt', '*.title', '*.border', '*.align', '*.width', '*.height', 'img.vspace', 'img.hspace', 'a.target', 'a.rel'));
	$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
	$config->set('AutoFormat.RemoveEmpty', true);
	$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
	$config->set('Core.RemoveProcessingInstructions', true);
	$config->set('HTML.TargetBlank', true);
	$config->set('HTML.Nofollow', true);
	$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>

And the result is:

1)

<a href="http://www.google.com/" target="blank">test page</a>

But in my opinion it should be:

<a href="http://www.google.com/" rel="nofollow" target="_blank">test page</a>

As you can see, no "rel" attribute was added.

2) I patched library/HTMLPurifier/AttrTransform/TargetBlank.php and replaced "blank" with "_blank" (1 occurrence), but it doesn't change the result.

I use "HTML Purifier 4.4.0".

My correction:

2) I patched library/HTMLPurifier/AttrTransform/TargetBlank.php and replaced "blank" with "_blank" (2 occurrences: one inside a comment and one in the code), but it doesn't change the result.

2) Sorry, my mistake: the patched library/HTMLPurifier/AttrTransform/TargetBlank.php works fine!

1) Still not working. What am I doing wrong? :(

Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 15, 2012 11:47PM

Turn on your PHP warnings and notices.

I added these lines at the top of the PHP file:

error_reporting(E_ALL);
ini_set('display_errors', TRUE);

and now it looks like this:

<?php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);

include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
	$config->set('Core.Encoding', 'UTF-8');
	$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
	$config->set('AutoFormat.AutoParagraph', true);
	$config->set('URI.Host', $_SERVER['HTTP_HOST']);
	$config->set('HTML.AllowedElements', array('a','b','strong','i','em','u'));
	$config->set('HTML.AllowedAttributes', array('a.href', 'img.src', '*.alt', '*.title', '*.border', '*.align', '*.width', '*.height', 'img.vspace', 'img.hspace', 'a.target', 'a.rel'));
	$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
	$config->set('AutoFormat.RemoveEmpty', true);
	$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
	$config->set('Core.RemoveProcessingInstructions', true);
	$config->set('HTML.TargetBlank', true);
	$config->set('HTML.Nofollow', true);
	$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>

But the output is clean: no warnings or notices.

Did you try executing my code? Does it work for you?

Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 16, 2012 11:29AM

Yeah, I get a lot of warnings and notices when I execute your code. Do you have a custom error handler installed or something?

No, I haven't. :(

OK. I executed the code on another server and there were some warnings. I modified the code and now it looks like this:

<?php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);

include_once 'include/htmlpurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
	$config->set('Core.Encoding', 'UTF-8');
	$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
	$config->set('AutoFormat.AutoParagraph', true);
	$config->set('URI.Host', $_SERVER['HTTP_HOST']);
	$config->set('HTML.AllowedElements', array('a','b','strong','i','em','u','img','p','span'));
	$config->set('HTML.AllowedAttributes', array('a.href', 'img.src', '*.alt', '*.title', '*.border', '*.align', '*.width', '*.height', 'img.vspace', 'img.hspace', 'a.target', 'a.rel'));
	$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
	$config->set('AutoFormat.RemoveEmpty', true);
	$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
	$config->set('Core.RemoveProcessingInstructions', true);
	$config->set('HTML.TargetBlank', true);
	$config->set('HTML.Nofollow', true);
	$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>

and the result is:

<p><a href="http://www.google.com/" target="_blank">test page</a></p>

but should be:

<p><a href="http://www.google.com/" target="_blank" rel="nofollow">test page</a></p>

Why? :(

The simple code below doesn't work properly either:

<?php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);

include_once 'include/htmlpurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
	$config->set('Core.Encoding', 'UTF-8');
	$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
	$config->set('URI.Host', $_SERVER['HTTP_HOST']);
	$config->set('HTML.TargetBlank', true);
	$config->set('HTML.Nofollow', true);
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>

The result is:

<a href="http://www.google.com/" target="_blank">test page</a>

but should be:

<a href="http://www.google.com/" target="_blank" rel="nofollow">test page</a>

Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 16, 2012 10:32PM

Looks like a bug. Investigating...

Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 16, 2012 10:41PM

Diagnosed, fixing...

OK. I replaced the files with the ones from http://repo.or.cz/w/htmlpurifier.git/commit/7291f19347b05f5833421eaf5152558bfbd2b454 but the result doesn't change.

The result is still:

<a href="http://www.google.com/" target="_blank">test page</a>

Probably you didn't commit all the files, or you didn't test my example.

Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 18, 2012 06:21PM

Try flushing the cache with maintenance/flush.php
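While testing patched library files, it may also help to switch the definition cache off so that stale serialized definitions are never read. A minimal sketch using the Cache.DefinitionImpl directive; the include path is taken from the earlier posts and the rest of the configuration is left out. Note that the maintenance scripts are meant to be run from the command line (php maintenance/flush.php), not requested through the web server.

```php
<?php
include_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// null turns the definition cache off, so definitions are rebuilt on
// every request. That is slow, so treat it as a debugging aid only.
$config->set('Cache.DefinitionImpl', null);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
```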

Thanks. Now it works as I expect. :)

xomero
Re: Problem with HTML.TargetBlank and HTML.Nofollow
October 24, 2012 09:42PM

Sorry, I have the same problem here. I did everything except the final flush, because I don't know how to run it. Trying to access it in the browser gives me a 403.

Re: Problem with HTML.TargetBlank and HTML.Nofollow
January 26, 2013 03:52PM

Greetings,

I am new to HTML Purifier; it looks like an excellent and easy-to-use script!

Unfortunately, I am having the same issues related to this post. I am using HTMLPurifier in conjunction with CKEditor. So far, I've got the entire TargetBlank issue on my end all cleared up. I can also allow "nofollow" when CKEditor automatically adds the attribute. The drawback with CKEditor is users can view the source code and manually erase the rel="nofollow" and save the post, which is a hole to get around the editor.

My only problem now is I can't get HTMLPurifier to add rel="nofollow" to external links when users somehow get around CKEditor. (I would like to remove rel="nofollow" on my INTERNAL links though).

Here is what I have:

require_once("../htmlpurifier-4.4.0/library/HTMLPurifier.auto.php");
$config = HTMLPurifier_Config::createDefault();
$config->set('URI.Host', 'mysite.com');
$config->set('Attr.AllowedRel', array('nofollow'));
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('AutoFormat.RemoveEmpty', true);
$purifier = new HTMLPurifier($config);

I've followed your instructions here and changed the source files in HTMLPurifier: http://repo.or.cz/w/htmlpurifier.git/commit/7291f19347b05f5833421eaf5152558bfbd2b454

Now when I go to maintenance/flush.php , I get the following error:

Forbidden

You don't have permission to access /htmlpurifier-4.4.0/maintenance/flush.php on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

How can I clear the cache so that your fix will work?

Thanks Kind regards

Edited 3 time(s). Last edit at 01/26/2013 03:55PM by peppy.

Re: Problem with HTML.TargetBlank and HTML.Nofollow
January 26, 2013 04:10PM

OOPS!! After looking for hours trying to flush the cache, I figured it out 1 minute after posting a question.

In case anyone else is trying to figure it out, go to maintenance/.htaccess and remove the line "Deny from all". Then go to maintenance/common.php and change:

if (php_sapi_name() != 'cli' && !getenv('PHP_IS_CLI')) {

to:

if (php_sapi_name() == 'cli' && getenv('PHP_IS_CLI')) {

Save both files, run maintenance/flush.php in your browser and it should work. Then reverse all the changes in these two files to what they were originally and save them.
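An alternative that leaves the maintenance files untouched is to delete the serialized definitions directly. The sketch below is a hypothetical helper, not something shipped with HTML Purifier, and it assumes the cache lives in the default Serializer directory under library/ with files ending in .ser; adjust the path if Cache.SerializerPath points elsewhere.

```php
<?php
/**
 * Hypothetical helper, not shipped with HTML Purifier: deletes
 * serialized definition files (*.ser) below $cacheDir and returns
 * how many were removed.
 */
function purgeSerializedDefinitions(string $cacheDir): int
{
    if (!is_dir($cacheDir)) {
        return 0;
    }
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS)
    );
    // Collect the paths first so nothing is deleted mid-iteration.
    $paths = [];
    foreach ($files as $file) {
        // Only cache files qualify; READMEs and directories are left alone.
        if ($file->isFile() && $file->getExtension() === 'ser') {
            $paths[] = $file->getPathname();
        }
    }
    $removed = 0;
    foreach ($paths as $path) {
        if (unlink($path)) {
            $removed++;
        }
    }
    return $removed;
}

// Assumed default cache location, relative to this script.
$dir = __DIR__ . '/library/HTMLPurifier/DefinitionCache/Serializer';
echo purgeSerializedDefinitions($dir), " cache file(s) removed\n";
```

The definitions should then be rebuilt the next time purify() runs.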

. . .

There is one last piece to the puzzle that is not working right. rel="nofollow" is still being added to internal links.

Please let me know what needs to be done.

Thanks Kind regards

Re: Problem with HTML.TargetBlank and HTML.Nofollow
January 26, 2013 04:24PM

Ah nevermind, I just set:

$config->set('URI.Host', 'mysite.com');

to:


$config->set('URI.Host', $_SERVER['HTTP_HOST']);

This seems to do the trick. It removes the rel="nofollow" on internal links and adds it to external links, just like I wanted. It also removes the target="_blank" from internal links at the same time, but this is not a big deal; we can live with it!
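The behavior described here (external links get both attributes, internal links get neither) can be pictured with the stand-alone sketch below. It is an illustration only, not HTML Purifier's code: decorateExternalLink() and its exact host comparison are assumptions.

```php
<?php
/**
 * Stand-alone illustration, NOT the library's code: adds
 * rel="nofollow" and target="_blank" only to links whose host
 * differs from $ownHost.
 */
function decorateExternalLink(array $attr, string $ownHost): array
{
    if (!isset($attr['href'])) {
        return $attr;
    }
    $host = parse_url($attr['href'], PHP_URL_HOST);
    // Relative URLs have no host, so they count as internal.
    if (!is_string($host) || strcasecmp($host, $ownHost) === 0) {
        return $attr;
    }
    $attr['rel'] = 'nofollow';
    $attr['target'] = '_blank';
    return $attr;
}

print_r(decorateExternalLink(['href' => 'http://www.google.com/'], 'mysite.com'));
print_r(decorateExternalLink(['href' => 'http://mysite.com/about'], 'mysite.com'));
```

An exact comparison like this treats www.mysite.com and mysite.com as different hosts; whether that is what made the hard-coded value behave differently from $_SERVER['HTTP_HOST'] here is only a guess.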

I did find a possible bug though. I noticed that when these two lines exist at the same time:

$config->set('Attr.AllowedRel', array('nofollow'));
$config->set('HTML.Nofollow', true);

Any rel="nofollow" on an external link that gets re-saved during an edit turns into rel="Array nofollow".
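"Array" is the text PHP produces when an array is used as a string, which suggests (the thread does not confirm it) that a list of rel values is being concatenated somewhere instead of joined. A minimal sketch of appending a rel token safely; addRelToken() is a hypothetical helper, not HTML Purifier's implementation.

```php
<?php
/**
 * Hypothetical helper, not HTML Purifier's implementation: appends
 * $token to a space-separated rel value unless it is already there.
 */
function addRelToken(string $rel, string $token): string
{
    // Split on any run of whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', trim($rel), -1, PREG_SPLIT_NO_EMPTY);
    if (!in_array($token, $tokens, true)) {
        $tokens[] = $token;
    }
    // implode() turns the list back into a string; concatenating the
    // array itself would produce the literal text "Array".
    return implode(' ', $tokens);
}

echo addRelToken('nofollow', 'nofollow'), "\n";  // nofollow
echo addRelToken('external', 'nofollow'), "\n";  // external nofollow
echo addRelToken('', 'nofollow'), "\n";          // nofollow
```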

Thanks Kind regards

Edited 1 time(s). Last edit at 01/26/2013 04:25PM by peppy.

sitesense
Re: Problem with HTML.TargetBlank and HTML.Nofollow
February 14, 2013 04:43PM

The standalone version 4.4.0 suffers from the same problem. I fixed it by replacing line 11273 in HTMLPurifier-standalone.php with the following:

if ($config->get('HTML.Nofollow')) {
    $attr['rel'] = 'nofollow';
    $attr['target'] = '_blank';
} else {
    $attr['target'] = '_blank';
}

Re: Problem with HTML.TargetBlank and HTML.Nofollow
February 15, 2013 10:58AM

target="_blank" is not recommended anyway. The target attribute is not valid under the strict XHTML doctypes, so pages that use it with those doctypes will not validate.

The best way, since you are already using rel, is to use rel="external" instead of target, together with nofollow: rel="external nofollow".


sitesense
Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 02, 2013 11:07PM

Where is target="blank" valid? It's just wrong whatever the doctype.

Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 03, 2013 01:57PM

Well, after reading up on it (it's been a while):

target="_blank" was invalid under the strict XHTML doctypes in the W3C specs, so no page that used it would validate. That is why a JavaScript trick was created that uses rel="external" to open links in a new window or tab.

The reasoning given by the W3C for it not being valid is that it should be up to the user to decide whether to open a link in a new window or in the same page (after all, there's the right-click context menu). But as we know, many users don't bother or don't even know to use that, and they aren't bothered about web standards either.

I agree in some ways, but not in others. It can be annoying at times if you click a link and it opens in the same window, especially if you were in the middle of a long post or using an app.

My opinion is that internal links should really open in the same window, with information links (like help tips) opening a small popup window, not the website itself.

External links to other websites should open in a new window so that you are not directed off the website you were on.

The good news in all this is that under HTML5 they have caved to demand from developers and website owners, and target="_blank" is now a valid attribute value. That means that if _blank is used in your links, they will pass when validated by the W3C. Not that being fully validated has ever bothered me or any of my clients anyway. I think it was just a fad so some companies could make more money.


sitesense
Re: Problem with HTML.TargetBlank and HTML.Nofollow
March 04, 2013 02:47PM

I don't think we're on the same page :)

There is a bug that sets the target to "blank" rather than "_blank" (notice the missing underscore). There seems to be another bug where, if you set the target to _blank and also use rel="nofollow", the nofollow gets removed somehow.

My fix above sorts out both problems - in the standalone version.

It should perhaps be shortened like below, no real need for the "else":

if ($config->get('HTML.Nofollow')) {
    $attr['rel'] = 'nofollow';
}
$attr['target'] = '_blank';