
Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)

Posted by 178 
178
Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 03:01AM

Description: http://htmlpurifier.org/live/configdoc/plain.html#HTML.Nofollow If enabled, nofollow rel attributes are added to all outgoing links.

Question: What counts as an outgoing link?

Config: URI.Host = mysite.com. My site: mysite.com. Test HTML:

<p><a href="http://mysite.com/about">test link</a>
<p><a href="http://notmysite.com/">not mine</a>

Result:

<p><a rel="nofollow" href="http://mysite.com/about">test link</a></p>
<p><a rel="nofollow" href="http://notmysite.com/">not mine</a>

This was unexpected: I expected `nofollow` to be added only to external links (such as notmysite.com).

What am I doing wrong?

178
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 04:51AM

Hmm... did you set URI.Host? :)

http://htmlpurifier.org/live/configdoc/plain.html#URI.Host

Yep. As you can see in my first message:

Config: URI.Host = mysite.com
$HPConfig = HTMLPurifier_Config::createDefault();
$HPConfig->set('HTML.Doctype', 'XHTML 1.0 Strict');
// Get HtmlPurifier configuration settings from Garden
$HPSettings = Gdn::Config('HtmlPurifier');
if(is_array($HPSettings)) {
	foreach ($HPSettings as $Namespace => $Setting) {
		foreach ($Setting as $Name => $Value) {
			// Assign them to htmlpurifier
			$HPConfig->set($Namespace.'.'.$Name, $Value);
		}
	}
}

Are you saying that setting URI.Host should work fine, and that I have a bug somewhere in my code?

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 05:05AM

and that I have a bug somewhere in my code?

Actually just making sure because it sounds like it might be a bug in the HTML Purifier, then - since that is what's supposed to let it figure out what URIs are relative. Thanks for the code snippet - if everything else loads up fine then there's no reason that would be getting lost along the way.

For what it's worth, if you're setting URI.Host exactly as quoted, try supplying the scheme, too. (Note, though: this shouldn't, to my knowledge, matter.)

Edit: I think it may be the is_null() in Nofollow.php, actually. Sometimes relative links don't result in a null host but an empty string for the host. Try changing the !is_null() to !empty(), see if that fixes it?
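
To make the difference between the two checks concrete, here is a minimal standalone sketch in plain PHP. It is not HTML Purifier code, and the two helper names are made up purely for illustration:

```php
<?php
// Hypothetical helpers (not part of HTML Purifier): they only contrast
// how the two checks treat a host that is null, empty, or set.
function hostPresentByIsNull($host)
{
    return !is_null($host);
}

function hostPresentByEmpty($host)
{
    return !empty($host);
}

// Compare both checks for a null host, an empty-string host and a real host.
foreach (array(null, '', 'mysite.com') as $host) {
    printf(
        "%-14s !is_null: %-6s !empty: %s\n",
        var_export($host, true),
        var_export(hostPresentByIsNull($host), true),
        var_export(hostPresentByEmpty($host), true)
    );
}
```

The two checks only disagree for the empty string, which `!is_null()` treats as "host present". One caveat: `empty()` also treats the string '0' as empty.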

Edit II: See below.

(Edited again for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:18PM by pinkgothic.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 06:14AM

We fixed something like this recently. What version are you using?

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 07:17AM

4.3.0 Standalone (see topic title). But for what it's worth:

I think it may be the is_null() in Nofollow.php, actually. Sometimes relative links don't result in a null host but an empty string for the host. Try changing the !is_null() to !empty(), see if that fixes it?

That's what I observed for 4.3.0, so... think that might be it?

Edit: See below. Probably a misconfiguration issue instead.

(Edited for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:19PM by pinkgothic.

178
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 07:30AM

@Ambush Commander, I'm using 4.3.0 (Standalone version).

I did some research. Around line 10780 of the standalone file, in class HTMLPurifier_AttrTransform_Nofollow extends HTMLPurifier_AttrTransform:

// XXX Kind of inefficient
$url = $this->parser->parse($attr['href']);
$scheme = $url->getSchemeObj($config, $context);

print_r($url);
print_r($attr);
print_r($scheme);

// outputs:
HTMLPurifier_URI Object
(
    [scheme] => http
    [userinfo] => 
    [host] => www.mysite.com
    [port] => 
    [path] => /about/
    [query] => 
    [fragment] => 
)
Array
(
    [href] => www.mysite.com/about/
)
HTMLPurifier_URIScheme_http Object
(
    [default_port] => 80
    [browsable] => 1
    [hierarchical] => 1
    [may_omit_host] => 
)

if (!is_null($url->host) && $scheme !== false && $scheme->browsable) {

$scheme->browsable is TRUE, and I don't see any check here like $scheme->host != $config->URI->host.

Maybe I need to set another config setting that makes $scheme->browsable = FALSE?

My config:

$Configuration['HtmlPurifier']['AutoFormat']['RemoveEmpty'] = TRUE;
$Configuration['HtmlPurifier']['AutoFormat']['AutoParagraph'] = TRUE;
$Configuration['HtmlPurifier']['Attr']['AllowedRel'] = array('nofollow', 'print');
$Configuration['HtmlPurifier']['URI']['Host'] = 'www.mysite.com';
$Configuration['HtmlPurifier']['HTML']['Nofollow'] = TRUE;

These are set by the code snippet mentioned above: http://htmlpurifier.org/phorum/read.php?2,5611,5613#msg-5613
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 07:38AM

My config:

$Configuration['HtmlPurifier']['URI']['Host'] = 'www.mysite.com';

Pretty sure 'mysite.com' won't be considered relative with that? (Might be mistaken.) I think you'll need to remove the 'www.'.

Edit: Yeah, should be; URI.Host is described like this:

[...] However, higher up domains will still be excluded: if you set %URI.Host to sub.example.com, example.com will be blocked. [...]

(Edited for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:19PM by pinkgothic.

178
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 23, 2011 07:56AM

No luck with $Configuration['HtmlPurifier']['URI']['Host'] = 'mysite.com'; same result.

I think the only thing I can do now is use relative links in my content, which makes $url->host = NULL.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 25, 2011 05:02AM

I don't see any check here like $scheme->host != $config->URI->host

I think that's the source of the problem. Maybe Edward can add a ->isLocal() (or something with a better name) to the URI object to make it easier to check? The necessity of that distinction does crop up occasionally.

The check you'd want is this, by the way:

if ($uri->host !== $config->getDefinition('URI')->host) {

See library/HTMLPurifier/URIFilter/Munge.php.
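
As a rough standalone illustration of that comparison: the sketch below uses plain PHP and parse_url() rather than HTML Purifier's URI objects. The function name and the $localHost parameter are made up for this example, with $localHost standing in for %URI.Host.

```php
<?php
// Hypothetical standalone helper; HTML Purifier itself is not used here.
// $localHost plays the role of the %URI.Host setting.
function shouldAddNofollow($href, $localHost)
{
    $host = parse_url($href, PHP_URL_HOST);

    // Relative URLs have no host component (parse_url() gives null),
    // and malformed URLs give false: treat both as "not external".
    if ($host === null || $host === false || $host === '') {
        return false;
    }

    // Exact comparison, mirroring the !== check quoted above.
    return $host !== $localHost;
}

// Example with the links from the first post.
var_dump(shouldAddNofollow('http://mysite.com/about', 'mysite.com'));
var_dump(shouldAddNofollow('http://notmysite.com/', 'mysite.com'));
var_dump(shouldAddNofollow('/about', 'mysite.com'));
```

Because the comparison is exact, 'www.mysite.com' and 'mysite.com' count as different hosts in this sketch, so the configured host would have to match the links as written.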

Does that help you? (Can you even patch the module, or aren't you allowed to tinker with third party modules? I know I'm not, in my project.)

(Edited for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:20PM by pinkgothic.

178
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 30, 2011 04:00AM

Another funny thing.

$Configuration['HtmlPurifier']['URI']['MakeAbsolute'] = TRUE;
$Configuration['HtmlPurifier']['URI']['Base'] = 'http://www.example.com';
$Configuration['HtmlPurifier']['HTML']['Nofollow'] = TRUE;

Source:

<a href="/">test</a>

Result:

<a rel="nofollow" href="http://www.example.com/">test</a>

@pinkgothic This patch would definitely help, but I don't want to change third-party code (I'd lose the ability to update).

Update: It adds nofollow even if it already exists:

<a rel="nofollow" href="http://xxxx.yyy.com/">dummy</a>

Result:
<a rel="nofollow nofollow" href="http://xxxx.yyy.com/">dummy</a>
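
For comparison, a version of that step that avoids the duplicate might look roughly like the sketch below. It is standalone PHP with a made-up helper name, not the code inside HTML Purifier and not necessarily how a fix there would be written:

```php
<?php
// Hypothetical helper: append "nofollow" to a rel attribute value
// only if that token is not already present.
function addNofollowOnce($rel)
{
    // Split the existing value on whitespace, dropping empty pieces.
    $tokens = preg_split('/\s+/', trim((string) $rel), -1, PREG_SPLIT_NO_EMPTY);

    if (!in_array('nofollow', $tokens, true)) {
        $tokens[] = 'nofollow';
    }

    return implode(' ', $tokens);
}

// Example: an existing nofollow, an empty value, and another rel token.
var_dump(addNofollowOnce('nofollow'));
var_dump(addNofollowOnce(''));
var_dump(addNofollowOnce('print'));
```

The token comparison here is case-sensitive, which is a simplification chosen for this sketch.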
178
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 30, 2011 04:01AM

I need to fix it somehow (with a custom module, for example).

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 30, 2011 05:24AM

@178:

Ah, darn. I understand not wanting to patch third-party code, though. :)

You could write your own attribute transformer, it's actually pretty easy.

Take a look at this link: http://htmlpurifier.org/docs/enduser-customize.html

Or for a real world example: http://stackoverflow.com/questions/2638640/html-purifier-removing-an-element-conditionally-based-on-its-attributes ("Point of reference" section; there are probably better examples, but it's the one I have at hand).

(Edited for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:21PM by pinkgothic.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
May 30, 2011 08:08AM

I'm slightly busy, due to exams, but it's extremely likely that a patch like this would make its way into the next HTML Purifier version, so I don't think it's all that risky.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 11, 2011 11:08AM

I've fixed up the duplicate nofollow problem, and am cooking up a patch to make it easier to check if a URL is "local".

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 11, 2011 12:41PM

and am cooking up a patch to make it easier to check if a URL is "local".

This. This is very awesome. Mad props.

(Edited for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:22PM by pinkgothic.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 11, 2011 03:52PM

Semantics question: should this helper method take into account https/http transitions, or should that just be an extra special case for %URI.Munge (since I don't think nofollow'ing http links from https is appropriate.)

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 11, 2011 09:04PM

Semantics question: should this helper method take into account https/http transitions, or should that just be an extra special case for %URI.Munge (since I don't think nofollow'ing http links from https is appropriate.)

Hmm, that's very true. Special case it is, then - or: Maybe let a boolean parameter decide? :) isLocal($checkScheme = false) or isLocal($strict = false) or something along those lines, maybe?

(Edit: Fixed formatting after a forum glitch)

Edited 1 time(s). Last edit at 07/30/2012 02:23PM by pinkgothic.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 12, 2011 03:52AM

I'm trying to think of another situation where http to https is OK but https to http is not, and failing.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 12, 2011 05:59AM

Pushed as ce231cab9c551a6b9453980212272199f78af5d2

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
June 12, 2011 07:17AM

I'm trying to think of another situation where http to https is OK but https to http is not, and failing.

Ah! Yes, sorry. I forgot it's not just a scheme check. *derp derp*

Thank you!

(Edited for formatting after a forum glitch.)

Edited 1 time(s). Last edit at 07/30/2012 02:24PM by pinkgothic.

178
Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
August 24, 2011 05:21AM

@Ambush Commander Please, remove mention of my email from NEWS file.

Re: Enabled HTML.Nofollow add nofollow to internal links (4.3.0 Standalone)
August 24, 2011 09:58AM

I've removed you from the history. The blobs will persist until the repo.or.cz admins garbage collect; please ask them to do that at admin@repo.or.cz.
