
[Feature] linkifying URLs

Posted by chinohillsbanditos 
April 02, 2007 10:09PM

Hi! I was just checking out the roadmap [hp.jpsband.org] and it said you were planning to add linkifying of URLs in the 2.0 release. I just want to say that would be an EXTREMELY helpful feature, because it is one of the biggest nightmares I have had to deal with. I spent over a week on it and I still couldn't figure it out. It's easy to linkify text with a simple regex, but tons of problems arise if the text it detects already happens to be a link, and the fix needed to do it right is complicated.

I don't know if these will come in handy, since you may have a better way of parsing it, but when you do get to it, here are some links to code that does it:

(one of the better ones) [code.iamcal.com]

(other ones) [www.zend.com] [www.truerwords.net] [www.coffee2code.com]

Oh yeah, these options would be nice for it:

If you set it to detect Everything, it will pick up: google.com, coolsite.us, blah.net (major domain extensions).

If you set it to detect Strict, it will pick up only: http://www.google.com, http://coolsite.us, or http://blah.net

And options that let you truncate the displayed URL, show just the domain, or leave the displayed URL unmangled.
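
To make the two detection modes and the display options concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the names find_urls and display_text, the mode and style values, and the short BARE_TLDS list are all made up here, and none of it is HTML Purifier's API. Trailing punctuation and e-mail addresses are not handled.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately short list of "major domain extensions".
BARE_TLDS = ("com", "net", "org", "us")

# Characters allowed inside a detected URL: anything but whitespace,
# angle brackets, and quotes.
URL_CHAR = r"""[^\s<>"']"""

# Strict mode: only URLs that start with an explicit http:// or https://.
STRICT_RE = re.compile(r"https?://" + URL_CHAR + "+")

# Everything mode: strict URLs, plus bare host names ending in a known TLD.
BARE_HOST = (
    r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:" + "|".join(BARE_TLDS) + r")\b"
    r"(?:/" + URL_CHAR + "*)?"
)
EVERYTHING_RE = re.compile(
    r"https?://" + URL_CHAR + "+|" + BARE_HOST, re.IGNORECASE
)


def find_urls(text, mode="strict"):
    """Return the URL-like strings found in plain text for the given mode."""
    pattern = STRICT_RE if mode == "strict" else EVERYTHING_RE
    return [m.group(0) for m in pattern.finditer(text)]


def display_text(url, style="full", max_len=30):
    """Return the text to show for a link: full URL, domain only, or truncated."""
    if style == "domain":
        # Bare hosts have no scheme, so add one before splitting.
        parts = urlsplit(url if "://" in url else "http://" + url)
        return parts.netloc
    if style == "truncate" and len(url) > max_len:
        return url[: max_len - 3] + "..."
    return url
```

With this sketch, find_urls(text, mode="everything") would pick up google.com as well as http://coolsite.us, while the default strict mode only returns the latter.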

Edited 3 time(s). Last edit at 04/04/2007 12:05PM by Ambush Commander.

Re: linkifying URLs
April 02, 2007 10:20PM

Yeah, it's tough (there's a reason why it's marked COMPLEX). What I was planning on doing was hooking in a token filter, which would scan Text tokens for URI-like constructs, and linkify them. It's simpler than auto-paragraphing, to be sure, but still a bear.
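
Roughly, the token filter idea could look like the sketch below. It is written in Python purely as an illustration: the Token type, its fields, and linkify_tokens are hypothetical stand-ins, not HTML Purifier's real token classes, and the URI pattern is a simple assumption that only catches http:// and https:// URLs.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """Hypothetical token: kind is "start", "end", or "text"."""
    kind: str
    value: str                  # tag name, or the text itself
    href: Optional[str] = None  # only used on start tokens for <a>


# Assumed, simplistic notion of a "URI-like construct".
URI_RE = re.compile(r"""https?://[^\s<>"']+""")


def linkify_tokens(tokens: List[Token]) -> List[Token]:
    """Wrap URIs found in text tokens in <a> tokens, skipping existing links."""
    out: List[Token] = []
    depth = 0  # how many <a> elements we are currently inside
    for tok in tokens:
        if tok.kind == "start" and tok.value == "a":
            depth += 1
        elif tok.kind == "end" and tok.value == "a":
            depth = max(0, depth - 1)
        # Non-text tokens, and text already inside a link, pass through as-is.
        if tok.kind != "text" or depth > 0:
            out.append(tok)
            continue
        pos = 0
        for m in URI_RE.finditer(tok.value):
            if m.start() > pos:
                out.append(Token("text", tok.value[pos:m.start()]))
            out.append(Token("start", "a", href=m.group(0)))
            out.append(Token("text", m.group(0)))
            out.append(Token("end", "a"))
            pos = m.end()
        if pos < len(tok.value):
            out.append(Token("text", tok.value[pos:]))
    return out
```

Because the filter only ever looks at text tokens, URLs sitting in attributes are never touched, which is what makes this cleaner than a regex over raw markup.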

I suppose this could be implemented with regexps. What you'll need to do is create a regular expression that globs up things that look like URIs, as well as things around them that might indicate that they are inside a tag (quotes and gt/lt signs come to mind). Then, do a preg_replace_callback(), where the callback function analyzes the matches and determines whether or not to do the replacement, or simply return $matches[0]. Complicated, to be sure. I'll bump up the legit approach on my priority list.
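
As a rough sketch of the regexp route, here it is in Python, where re.sub() with a function argument plays the role of preg_replace_callback(). The pattern and the names PATTERN, _replace, and linkify are assumptions made up for this example. The trick is that the pattern also globs up whole tags and whole existing links, and the callback hands those back unchanged.

```python
import re

PATTERN = re.compile(
    r"""(?P<anchor><a\b[^>]*>.*?</a>)"""      # an existing link, text included
    r"""|(?P<tag><[^>]*>)"""                  # any other tag, attributes included
    r"""|(?P<url>https?://[^\s<>"']+)""",     # a bare URL in text
    re.IGNORECASE | re.DOTALL,
)


def _replace(match):
    """Callback: linkify bare URLs, return everything else untouched."""
    url = match.group("url")
    if url is None:
        # Matched a tag or an existing link: the equivalent of
        # returning $matches[0] in the PHP callback.
        return match.group(0)
    # The URL comes from HTML source and the pattern excludes quotes and
    # angle brackets, so it is reused as-is for both href and link text.
    return '<a href="{0}">{0}</a>'.format(url)


def linkify(html_source):
    """Linkify bare http(s) URLs in an HTML string, leaving markup alone."""
    return PATTERN.sub(_replace, html_source)
```

This still trips over things like a stray ">" inside an attribute value or a URL followed by a full stop, which is why the token-based approach is the legit one.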

HTML Purifier, Standards Compliant HTML Filtering

Re: linkifying URLs
April 03, 2007 02:02PM

I wonder if 'linkifying' of URLs, when the feature arrives, could be optionally turned off (or on).

Some may say that HTMLPurifier should do only what its name suggests, that is, purify HTML, correcting input data only to the extent that is needed, and that parsing text to linkify it, which might interfere with the functionality of the larger software that HTMLPurifier is embedded in, should not be its goal.

Re: linkifying URLs
April 03, 2007 02:05PM

Ah yes, it wouldn't be enabled by default. Definitely no. This is part of the reason why it's so late in the changelog: it has nothing to do with filtering per se. "Beyond HTML" sums it up nicely: there are certain features not part of HTML that are very convenient, and are easier to implement within HTML Purifier.

