|
stelmik
Problem with HTML.TargetBlank and HTML.Nofollow February 02, 2012 04:04PM |
Hi,
I have a problem with HTML Purifier. I tried to find a solution, but couldn't.
My problem is:
I want to add (or change) two attributes, target="_blank" and rel="nofollow", on every link (<a> tag). My code:
<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
And the result of execution is:
<a href="http://www.google.com/" target="_blank">test page</a>
Every time, HTML Purifier adds only the "target" attribute. I tried changing the order like this:
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
but the result is always the same:
<a href="http://www.google.com/" target="_blank">test page</a>
Is it possible to add both attributes to the same HTML tag? What am I doing wrong? Thanks in advance for any responses and help.
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow February 02, 2012 08:22PM |
Admin Registered: 6 years ago Posts: 2,639 |
I don't believe you.
ezyang@javelin:~/Dev/htmlpurifier$ cat > foo.php
<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
ezyang@javelin:~/Dev/htmlpurifier$ php foo.php
<a>test page</a>
I don't think you're telling me the whole story here, so I can't debug your problem. (At a guess, the problem is that you're not allowing href as an attribute.)
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow February 03, 2012 07:25AM |
Sorry, you're right!
The real, full code is:
<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('AutoFormat.AutoParagraph', true);
$config->set('URI.Host', $_SERVER['HTTP_HOST']);
$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
$config->set('Core.RemoveProcessingInstructions', true);
$config->set('HTML.AllowedAttributes', '*.href,*.src,*.alt,*.border,*.align,*.width,*.height,*.vspace,*.hspace,*.target,*.rel,*.style');
$config->set('HTML.ForbiddenAttributes', '*@action,*@background,*@codebase,*@dynsrc,*@lowsrc,*@class,*@on*');
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
And this code doesn't work the way I'd like it to. :-(
I also tried including the standalone version:
include_once 'library/HTMLPurifier.standalone.php';
But the result is the same. :-(
The result is:
<a href="http://www.google.com/" target="blank">test page</a>
What is wrong?
1. The target is "blank", but it should be "_blank".
2. There is no rel="nofollow" attribute, but there should be.
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow February 17, 2012 04:47AM |
Registered: 3 years ago Posts: 61 |
And this code doesn't work the way I'd like it to. :-(
I don't know for sure whether this is your problem, but you seem to have some misconceptions about a few of the configuration directives. Allow me to explain:
1)
$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$config->set('HTML.AllowedAttributes', '*.href,*.src,*.alt,*.border,*.align,*.width,*.height,*.vspace,*.hspace,*.target,*.rel,*.style');
You want to use either HTML.AllowedElements together with HTML.AllowedAttributes, or HTML.Allowed on its own. From the HTML.Allowed doc:
This is a preferred convenience directive that combines %HTML.AllowedElements and %HTML.AllowedAttributes.
2)
$config->set('HTML.ForbiddenAttributes', '*@action,*@background,*@codebase,*@dynsrc,*@lowsrc,*@class,*@on*');
From the HTML.ForbiddenAttributes doc:
Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.
From the HTML.ForbiddenElements doc:
This is the logical inverse of %HTML.AllowedElements, and it will override that directive, or any other directive.
Conclusion
In short: You want either (HTML.ForbiddenAttributes) or (HTML.AllowedAttributes and HTML.AllowedElements) or (HTML.Allowed).
If you fix that, your problem will probably go away.
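To illustrate that advice, here is a sketch that expresses both the allowed elements and their attributes through HTML.Allowed alone, using its element[attr|attr] syntax. It assumes the library path used earlier in this thread. href has to be listed for a; otherwise it is stripped and neither HTML.TargetBlank nor HTML.Nofollow has a link to act on. This only untangles the configuration; it does not work around the target="blank" bug in 4.4.0.

```php
<?php
// Sketch: a single directive for both elements and attributes.
// Assumes HTML Purifier is installed under library/ as in the posts above.
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Attributes go in [ ] after their element, separated by "|".
// href must be allowed on <a>, or it is removed and the
// link-related transforms have nothing to work on.
$config->set('HTML.Allowed', 'a[href],b,strong,i,em,u');
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
```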
(Edit: Fixed formatting after an HTML escaping glitch on the forum.)
Edited 1 time(s). Last edit at 07/30/2012 01:49PM by pinkgothic.
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow February 18, 2012 11:30AM |
Admin Registered: 6 years ago Posts: 2,639 |
Aw, that's embarrassing. target="blank" rather than target="_blank" is a bug. You can fix it by patching library/HTMLPurifier/AttrTransform/TargetBlank.php: look for "blank" and replace it with "_blank". This should be fixed in the next version.
As for rel, this is probably a bad interaction between Allowed and AllowedAttributes. If you follow pinkgothic's advice that should clear this up.
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 05, 2012 05:34PM |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 14, 2012 04:53PM |
I think I understand everything.
As you explained, I changed my code to:
<?php
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('AutoFormat.AutoParagraph', true);
$config->set('URI.Host', $_SERVER['HTTP_HOST']);
$config->set('HTML.AllowedElements', array('a','b','strong','i','em','u'));
$config->set('HTML.AllowedAttributes', array('a.href', 'img.src', '*.alt', '*.title', '*.border', '*.align', '*.width', '*.height', 'img.vspace', 'img.hspace', 'a.target', 'a.rel'));
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
$config->set('Core.RemoveProcessingInstructions', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
And the result is:
1)
<a href="http://www.google.com/" target="blank">test page</a>
But in my opinion it should be:
<a href="http://www.google.com/" rel="nofollow" target="_blank">test page</a>
As you can see, no "rel" attribute was added.
2) I patched library/HTMLPurifier/AttrTransform/TargetBlank.php and replaced "blank" with "_blank" (one occurrence), but it didn't change the result.
I'm using HTML Purifier 4.4.0.
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 14, 2012 05:00PM |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 14, 2012 05:08PM |
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 15, 2012 11:47PM |
Admin Registered: 6 years ago Posts: 2,639 |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 07:01AM |
I added these lines at the top of the PHP file:
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
and now it looks like this:
<?php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
include_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('AutoFormat.AutoParagraph', true);
$config->set('URI.Host', $_SERVER['HTTP_HOST']);
$config->set('HTML.AllowedElements', array('a','b','strong','i','em','u'));
$config->set('HTML.AllowedAttributes', array('a.href', 'img.src', '*.alt', '*.title', '*.border', '*.align', '*.width', '*.height', 'img.vspace', 'img.hspace', 'a.target', 'a.rel'));
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
$config->set('Core.RemoveProcessingInstructions', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
But the output is clean: no warnings or notices.
Have you tried running my code? Does it work for you?
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 11:29AM |
Admin Registered: 6 years ago Posts: 2,639 |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 05:15PM |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 05:28PM |
OK. I ran the code on another server and there were some warnings. I modified the code, and now it looks like this:
<?php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
include_once 'include/htmlpurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('AutoFormat.AutoParagraph', true);
$config->set('URI.Host', $_SERVER['HTTP_HOST']);
$config->set('HTML.AllowedElements', array('a','b','strong','i','em','u','img','p','span'));
$config->set('HTML.AllowedAttributes', array('a.href', 'img.src', '*.alt', '*.title', '*.border', '*.align', '*.width', '*.height', 'img.vspace', 'img.hspace', 'a.target', 'a.rel'));
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);
$config->set('Core.RemoveProcessingInstructions', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
and the result is:
<p><a href="http://www.google.com/" target="_blank">test page</a></p>
but it should be:
<p><a href="http://www.google.com/" target="_blank" rel="nofollow">test page</a></p>
Why? :(
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 05:54PM |
The simpler code below doesn't work properly either:
<?php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);
include_once 'include/htmlpurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('URI.Host', $_SERVER['HTTP_HOST']);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Nofollow', true);
$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a href="http://www.google.com/">test page</a>');
?>
The result is:
<a href="http://www.google.com/" target="_blank">test page</a>
but it should be:
<a href="http://www.google.com/" target="_blank" rel="nofollow">test page</a>
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 10:32PM |
Admin Registered: 6 years ago Posts: 2,639 |
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 10:41PM |
Admin Registered: 6 years ago Posts: 2,639 |
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 16, 2012 11:12PM |
Admin Registered: 6 years ago Posts: 2,639 |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 18, 2012 05:22PM |
OK. I replaced the files with those from http://repo.or.cz/w/htmlpurifier.git/commit/7291f19347b05f5833421eaf5152558bfbd2b454, but the result didn't change.
The result is always:
<a href="http://www.google.com/" target="_blank">test page</a>
Perhaps you didn't commit all the files, or didn't test my example.
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 18, 2012 06:21PM |
Admin Registered: 6 years ago Posts: 2,639 |
|
stelmik
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 19, 2012 04:15PM |
|
xomero
Re: Problem with HTML.TargetBlank and HTML.Nofollow October 24, 2012 09:42PM |
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow January 26, 2013 03:52PM |
Registered: 3 months ago Posts: 3 |
Greetings,
I am new to HTML Purifier; it looks like an excellent, easy-to-use script!
Unfortunately, I am having the same issues described in this thread. I am using HTMLPurifier together with CKEditor. So far I've cleared up the entire TargetBlank issue on my end. I can also allow "nofollow" when CKEditor adds the attribute automatically. The drawback with CKEditor is that users can view the source code, manually delete rel="nofollow", and save the post, which is a way to get around the editor.
My only remaining problem is that I can't get HTMLPurifier to add rel="nofollow" to external links when users get around CKEditor this way. (I would like to keep rel="nofollow" off my INTERNAL links, though.)
Here is what I have:
require_once("../htmlpurifier-4.4.0/library/HTMLPurifier.auto.php");
$config = HTMLPurifier_Config::createDefault();
$config->set('URI.Host', 'mysite.com');
$config->set('Attr.AllowedRel', array('nofollow'));
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('AutoFormat.RemoveEmpty', true);
$purifier = new HTMLPurifier($config);
I've followed your instructions here and changed the source files in HTMLPurifier: http://repo.or.cz/w/htmlpurifier.git/commit/7291f19347b05f5833421eaf5152558bfbd2b454
Now when I go to maintenance/flush.php, I get the following error:
Forbidden
You don't have permission to access /htmlpurifier-4.4.0/maintenance/flush.php on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
How can I clear the cache so that your fix will work?
Thanks Kind regards
Edited 3 time(s). Last edit at 01/26/2013 03:55PM by peppy.
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow January 26, 2013 04:10PM |
Registered: 3 months ago Posts: 3 |
Oops! After hours of trying to flush the cache, I figured it out one minute after posting my question.
In case anyone else is trying to figure this out: open maintenance/.htaccess and remove the line "Deny from all". Then open maintenance/common.php and change:
if (php_sapi_name() != 'cli' && !getenv('PHP_IS_CLI')) {
to:
if (php_sapi_name() == 'cli' && getenv('PHP_IS_CLI')) {
Save both files, run maintenance/flush.php in your browser, and it should work. Then revert both files to their original contents and save them again.
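The check in maintenance/common.php is there because the maintenance scripts are meant to be run from the command line (for example `php maintenance/flush.php` from a shell), which avoids editing .htaccess and common.php at all. For development there may be a simpler route: the Cache.DefinitionImpl directive can switch off HTML Purifier's definition cache, so patched library files are picked up without flushing. This is a sketch that assumes the stale results came from the cached HTML definition; the settings other than Cache.DefinitionImpl are copied from the post above. Re-enable the cache in production, since rebuilding the definition on every request is slower.

```php
<?php
// Development-only sketch: disable the definition cache so that changes
// to files under library/HTMLPurifier/ take effect on the next request.
// Path as in the post above; adjust it to your installation.
require_once '../htmlpurifier-4.4.0/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Cache.DefinitionImpl', null); // null = no definition caching
$config->set('URI.Host', 'mysite.com');
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);

$purifier = new HTMLPurifier($config);
```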
. . .
There is one last piece to the puzzle that is not working right. rel="nofollow" is still being added to internal links.
Please let me know what needs to be done.
Thanks Kind regards
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow January 26, 2013 04:24PM |
Registered: 3 months ago Posts: 3 |
Ah, never mind. I just changed:
$config->set('URI.Host', 'mysite.com');
to:
$config->set('URI.Host', $_SERVER['HTTP_HOST']);
This seems to do the trick. It removes rel="nofollow" from internal links and adds it to external links, just as I wanted. However, it also removes target="_blank" from internal links. That's not a big deal, though; we can live with it!
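One plausible reason the change helped: links appear to be classified as internal or external by comparing their host with URI.Host, so if the site is actually served as, say, www.mysite.com, a configured value of mysite.com would make absolute links to the site look external. The helper below is hypothetical (it is not HTML Purifier's code) and assumes a case-insensitive comparison of the whole host name, with relative URLs treated as internal.

```php
<?php
/**
 * Hypothetical helper, not part of HTML Purifier: returns true when $href
 * points to a host other than $siteHost. Relative or unparseable URLs are
 * treated as internal. Host names are compared case-insensitively.
 *
 * @param string $href
 * @param string $siteHost
 * @return bool
 */
function is_external_link($href, $siteHost)
{
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        // No host component (e.g. "/contact") or a malformed URL.
        return false;
    }
    return strcasecmp($host, $siteHost) !== 0;
}

$site = 'www.mysite.com';
var_dump(is_external_link('http://www.mysite.com/page', $site)); // bool(false)
var_dump(is_external_link('/contact', $site));                   // bool(false)
var_dump(is_external_link('http://mysite.com/page', $site));     // bool(true)
var_dump(is_external_link('http://www.google.com/', $site));     // bool(true)
```

With the site host set to www.mysite.com, http://mysite.com/page counts as external, which is the kind of mismatch that would put nofollow on links to your own site.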
I did find a possible bug, though. I noticed that when both of these lines are present at the same time:
$config->set('Attr.AllowedRel', array('nofollow'));
$config->set('HTML.Nofollow', true);
any rel="nofollow" on an external link turns into rel="Array nofollow" when the post is re-saved during editing.
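The stray "Array" token is what PHP produces when an array is converted to a string, which hints that the existing rel value was mishandled when nofollow was merged into it. That would also explain why it only appears with Attr.AllowedRel set, since that is what lets an existing rel value survive validation. The exact cause in 4.4.0 is not confirmed here. For comparison, this sketch shows a merge that keeps existing tokens and adds nofollow only once; add_nofollow() is a hypothetical helper operating on an attribute array (name => string value) like the one HTML Purifier's attribute transforms receive.

```php
<?php
/**
 * Hypothetical helper, not HTML Purifier's code: adds "nofollow" to the
 * rel attribute in $attr, keeping any existing rel tokens and never
 * adding "nofollow" twice.
 *
 * @param array $attr attribute name => string value
 * @return array
 */
function add_nofollow(array $attr)
{
    if (isset($attr['rel'])) {
        // Split the existing rel *string* into space-separated tokens.
        $rels = preg_split('/\s+/', trim($attr['rel']), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rels, true)) {
            $rels[] = 'nofollow';
        }
        $attr['rel'] = implode(' ', $rels);
    } else {
        $attr['rel'] = 'nofollow';
    }
    return $attr;
}

$a = add_nofollow(array('href' => 'http://www.google.com/'));
echo $a['rel'], "\n"; // nofollow
$b = add_nofollow(array('href' => 'http://example.com/', 'rel' => 'nofollow'));
echo $b['rel'], "\n"; // nofollow
$c = add_nofollow(array('href' => 'http://example.com/', 'rel' => 'external'));
echo $c['rel'], "\n"; // external nofollow
```

Re-saving a link that already carries rel="nofollow" leaves it unchanged, which is the behaviour expected in the post above.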
Thanks Kind regards
Edited 1 time(s). Last edit at 01/26/2013 04:25PM by peppy.
|
sitesense
Re: Problem with HTML.TargetBlank and HTML.Nofollow February 14, 2013 04:43PM |
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow February 15, 2013 10:58AM |
Registered: 5 years ago Posts: 204 |
target="_blank" is not recommended anyway: the target attribute is not valid in XHTML 1.0 Strict or XHTML 1.1, so with those doctypes your page will not validate.
Since you are already using rel, the best approach is to use rel="external" instead of target, combined with nofollow: rel="external nofollow".
|
sitesense
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 02, 2013 11:07PM |
|
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 03, 2013 01:57PM |
Registered: 5 years ago Posts: 204 |
Well, after reading up (it's been a while):
target="_blank" was invalid under the W3C's strict XHTML specs, so no page that used it would validate. That is why a JavaScript trick was created that uses rel="external" to open links in a new window or tab.
The reasoning the W3C gave for this is that it should be up to the user to decide whether to open a link in a new window or in the same page (after all, there's the right-click context menu). But as we know, many users don't bother with that or don't even know about it, and they aren't bothered about web standards either.
I agree in some ways but not in others. It can be annoying when you click a link and it opens in the same window, especially if you were in the middle of a long post or using an app.
My opinion is that internal links should open in the same window, with informational links (like help tips) opening a small popup window rather than the website itself.
External links to other websites should open in a new window, so that you are not directed away from the website you were on.
The good news in all this is that HTML5 has caved to demand from developers and website owners: target is a valid attribute again, so links that use target="_blank" will pass W3C validation. Not that full validation has ever bothered me or any of my clients anyway; I think it was just a fad so some companies could make more money.
|
sitesense
Re: Problem with HTML.TargetBlank and HTML.Nofollow March 04, 2013 02:47PM |
I don't think we're on the same page :)
There is a bug that sets the target to "blank" rather than "_blank" (notice the missing underscore). There also seems to be another bug: if you set the target to _blank and also use rel="nofollow", the nofollow somehow gets removed.
My fix above sorts out both problems, in the standalone version.
It could perhaps be shortened as below; there's no real need for the "else":
if ($config->get('HTML.Nofollow')) {
$attr['rel'] = 'nofollow';
}
$attr['target'] = '_blank';