
Extracting elements

Posted by Vaibhav Kaushal
May 30, 2012 05:05AM


I was wondering if there is a way to extract elements from the purified text. Something like:


$strPurified = $purifier->purify($DirtyHtml);
$arrElements = HTMLPurifier::Extract($strPurified, 'a,img,b');


and then use something like this:


$strFirstLinkInText = $arrElements['a'][0];


Wouldn't that be a great addition? HTML Purifier can already take HTML completely apart and reassemble it, so this would make it easy to implement some functionality on the server side that we would normally not want done on the client side.

Regards, Vaibhav

Re: Extracting elements
May 30, 2012 09:41AM

Unfortunately not; you could just run DOM (for example PHP's DOMDocument) over the purified output instead.
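For anyone landing here later: the proposed HTMLPurifier::Extract() does not exist, but the DOM approach from the reply can approximate it. This is a minimal sketch, assuming PHP's DOM extension (ext-dom) is available. extractElements() is a hypothetical helper, not part of HTML Purifier; it returns each matching element's serialized HTML, grouped by lowercase tag name, in document order.

```php
<?php
// Hypothetical helper (NOT part of HTML Purifier): group elements of an
// already-purified HTML fragment by tag name using PHP's DOM extension.
function extractElements(string $html, string $tagList): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a document with an explicit charset so that
    // UTF-8 text is not misread as ISO-8859-1 by the HTML parser.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach (explode(',', $tagList) as $tag) {
        // The HTML parser lowercases tag names, so normalise the lookup key.
        $tag = strtolower(trim($tag));
        if ($tag === '') {
            continue;
        }
        $result[$tag] = [];
        // getElementsByTagName() yields matches in document order.
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // saveHTML($node) serialises only this element and its children.
            $result[$tag][] = $doc->saveHTML($node);
        }
    }
    return $result;
}

// Usage, mirroring the syntax proposed in the question:
$strPurified = '<p>See <a href="http://example.com/">this</a> and <b>that</b>.</p>';
$arrElements = extractElements($strPurified, 'a,img,b');
$strFirstLinkInText = $arrElements['a'][0] ?? null;
echo $strFirstLinkInText, "\n";
```

Returning serialized strings keeps the result close to the question's $arrElements['a'][0] idea. If you need attributes such as href, keep the DOMElement objects instead and call getAttribute() on them.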

Re: Extracting elements
July 30, 2012 08:56AM

You can already manipulate HTML in fairly powerful ways with HTML Purifier if you customise it (for example, to remove empty <a> tags). See whether that general approach helps you. You can probably emulate most of what you'd like by customising HTML Purifier, which, might I add, you can do without patching the library: it's easy to inject new classes into it.
