Return an entire HTML document from a fragment
June 10, 2009 11:39AM

Hi there.

I don't think this already exists, so I'm putting it in as a suggestion for now. I'd like to index HTML fragments with Zend_Search_Lucene, and from what I gather, it requires a whole HTML document (with metadata). It would be nice to have a method that takes a fragment of HTML and returns it as a valid full document, with title and metadata included. I think that would be a useful feature, though I'm not sure what other applications it would have.

All the best.

Re: Return an entire HTML document from a fragment
June 10, 2009 12:42PM

You can do that with PHP anyway; there's no point reinventing the wheel.

You could do it either as pre-processing or as post-processing.

<pre><![CDATA[
// Everything that comes before the fragment: doctype, <head> with title/metadata, opening <body>.
$content_header = <<<HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Your title</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="robots" content="index,follow" />
<meta name="keywords" content="your keywords" />
<meta name="description" content="Your Description" />
<meta name="rating" content="general" />
<meta name="author" content="the author" />
<meta name="copyright" content="Copyright © whatever" />
<meta name="generator" content="whatever" />
</head>
<body>
HTML;

// Everything that comes after the fragment.
$content_footer = <<<HTML
<p>your footer</p>
</body>
</html>
HTML;

$fragment = "Your HTML fragment goes here";

// Sandwich the fragment between header and footer to get a complete document.
$complete_doc = $content_header . $fragment . $content_footer;
]]></pre>

Then you either filter $complete_doc, or filter just $fragment on its own, as long as you filter $fragment before you create $complete_doc.

The above is just an example and I haven't tested it, so some tweaking may be needed. But I don't see the need to add this feature when it can be done in PHP. Doing it in PHP rather than through Purifier is also far easier to use on a dynamically changing site, where the metadata will change from page to page.

That's just my opinion, of course; I think that is what you are trying to achieve.
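For a site where the title and metadata change from page to page, the snippet can be turned into a small function. This is a minimal sketch, not part of HTML Purifier: `wrap_fragment()` is a hypothetical helper name, and it assumes UTF-8 content and the same XHTML 1.0 Transitional doctype as the snippet. The optional `$filter` callable runs on the fragment before it is wrapped, matching the advice to filter `$fragment` before building the complete document.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): wraps a fragment in a
// complete XHTML 1.0 Transitional document with a per-page title and metadata.
function wrap_fragment(string $fragment, string $title, array $meta = [], ?callable $filter = null): string
{
    // Filter the fragment first, before it is wrapped in the document.
    if ($filter !== null) {
        $fragment = $filter($fragment);
    }

    // Escape title and metadata so quotes, & or < > cannot break the markup.
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };

    $head = '<title>' . $esc($title) . "</title>\n"
          . '<meta http-equiv="content-type" content="text/html; charset=utf-8" />' . "\n";
    foreach ($meta as $name => $content) {
        $head .= '<meta name="' . $esc($name) . '" content="' . $esc($content) . '" />' . "\n";
    }

    return '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
         . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . "\n"
         . '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">' . "\n"
         . "<head>\n" . $head . "</head>\n"
         . "<body>\n" . $fragment . "\n</body>\n</html>\n";
}
```

With HTML Purifier, you could pass `[$purifier, 'purify']` as the filter, where `$purifier` is an `HTMLPurifier` instance. Escaping the title and metadata with `htmlspecialchars()` keeps values containing quotes or angle brackets from breaking the `<head>`.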

Re: Return an entire HTML document from a fragment
June 10, 2009 01:06PM

Full document support will (ostensibly) come sometime in the HTML Purifier 5.x series; we don't currently have the parsing code needed to deal with full HTML documents.

vaughan's suggestion is valid and probably the best way of doing this now.

Re: Return an entire HTML document from a fragment
June 10, 2009 01:32PM

Thanks a lot folks! :)

Re: Return an entire HTML document from a fragment
February 14, 2010 04:29PM

You can also use PHP's Tidy extension to turn an HTML fragment into a complete HTML document.
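A minimal sketch of the Tidy route, assuming PHP's tidy extension is installed; `fragment_to_document()` is just an illustrative name. With the `show-body-only` option left at its default, Tidy wraps the fragment in `<html>`, `<head>` and `<body>` elements itself. Tidy has no way of knowing your page's title or metadata, though, so those still have to be filled in separately.

```php
<?php
// Illustrative helper; requires PHP's tidy extension (ext/tidy).
// Tidy's default output is a complete document ("show-body-only" defaults
// to no), so a bare fragment comes back wrapped in <html>, <head> and <body>.
function fragment_to_document(string $fragment): string
{
    $doc = tidy_repair_string($fragment, ['output-xhtml' => true], 'utf8');
    if ($doc === false) {
        throw new RuntimeException('tidy_repair_string() failed');
    }
    return $doc;
}
```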
