
Need short call to remove all HTML TAG because of memory problem

Posted by footcow 
April 07, 2011 11:15AM

Hi,

I want to remove all HTML tags from HTML pages.

I'd like to know if there is a better way than making this call:

require_once('htmlpurifier/library/HTMLPurifier.auto.php');
$config = HTMLPurifier_Config::createDefault();
// Single-key form; the three-argument set('HTML', 'Allowed', ...) is deprecated since 4.0
$config->set('HTML.Allowed', ''); // Allow nothing
$purifier = new HTMLPurifier($config);
return $purifier->purify($html);

I get:

Fatal error: Allowed memory size of 52428800 bytes exhausted (tried to allocate 71 bytes) in /home/httpd/htdocs/lib/htmlpurifier-4.3.0/library/HTMLPurifier/Lexer/DOMLex.php on line 177

Call Stack:
   89.4199   15980456   1. scanWords->extractText() /home/httpd/htdocs/test/scanWords.php:287
   89.4343   16653936   2. HTMLPurifier->purify() /home/httpd/htdocs/test/scanWords.php:648
   89.4351   16668952   3. HTMLPurifier_Lexer_DOMLex->tokenizeHTML() /home/httpd/htdocs/lib/htmlpurifier-4.3.0/library/HTMLPurifier.php:179
   91.2438   18272472   4. HTMLPurifier_Lexer_DOMLex->tokenizeDOM() /home/httpd/htdocs/lib/htmlpurifier-4.3.0/library/HTMLPurifier/Lexer/DOMLex.php:70
   91.7585   52386104   5. HTMLPurifier_Lexer_DOMLex->createEndNode() /home/httpd/htdocs/lib/htmlpurifier-4.3.0/library/HTMLPurifier/Lexer/DOMLex.php:105

The page tested was 580 KB in size. My admin team does not want to change the PHP memory limit configuration.

So could I perhaps call Purifier in a lighter way to get the same result (text only)?

Any ideas are welcome.

Thanks in advance.

Re: Need short call to remove all HTML TAG because of memory problem
April 07, 2011 11:29AM

strip_tags and then htmlentities.
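
For reference, the two built-ins suggested above could be chained roughly as follows. This is only a sketch: `stripToEscapedText` is a made-up helper name, UTF-8 input is assumed, and the last argument to `htmlentities` (`double_encode = false`) is an added choice so that entities already present in the page are not escaped twice.

```php
<?php
// Hypothetical helper name; strip_tags() and htmlentities() are PHP built-ins.
function stripToEscapedText(string $html): string
{
    // strip_tags() removes the tags themselves but does not parse the document,
    // so text inside <script> or <style> elements is kept.
    $text = strip_tags($html);

    // Escape what is left so it can be echoed into an HTML page.
    // Assumption: UTF-8 input; 'false' leaves existing entities such as &amp; alone.
    return htmlentities($text, ENT_QUOTES, 'UTF-8', false);
}

echo stripToEscapedText('<p>Fish &amp; <b>chips</b></p>'), "\n";
```

Because `strip_tags` works on the raw string rather than on a parsed tree, it uses very little memory, but it also keeps script contents and can misjudge stray `<` characters.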

Re: Need short call to remove all HTML TAG because of memory problem
April 07, 2011 11:41AM

Ambush Commander said:

strip_tags and then htmlentities.

??? No... I want scripts and other malformed tags to be handled correctly, as Purifier does perfectly. PHP's strip_tags function is so buggy! I can't use it...

What I really want to know is whether there is any option to skip the filters, for example, or to access just the core tag-cleaning call in Purifier (to use less memory). This tool is so good... and works better than the PHP functions.

Please, you might know this...

Re: Need short call to remove all HTML TAG because of memory problem
April 07, 2011 11:46AM

You're running out of memory in the tokenization stage, so it's the internal representation of the HTML that's killing you. You might have some luck setting %Core.LexerImpl to DirectLex, or try using ini_set to bump the memory limit, but otherwise, you're out of luck.
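
A rough sketch of the ini_set route, for anyone who wants the higher limit to apply only around the purify call. `withMemoryLimit` is a hypothetical helper, not part of HTML Purifier; the '256M' value is an arbitrary example; and the code assumes PHP 7 or later and a host that permits ini_set. The task below is a stand-in so the snippet runs without the library.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): run $task under a
// temporarily changed memory_limit, then restore the previous limit.
function withMemoryLimit(string $limit, callable $task)
{
    $previous = ini_get('memory_limit');
    if (ini_set('memory_limit', $limit) === false) {
        throw new RuntimeException("Could not set memory_limit to $limit");
    }
    try {
        return $task();
    } finally {
        // Restore the old limit even if the task throws.
        ini_set('memory_limit', $previous);
    }
}

// Stand-in task. In the situation from this thread, the closure would
// build the config, optionally call
//   $config->set('Core.LexerImpl', 'DirectLex');
// and return $purifier->purify($html).
$length = withMemoryLimit('256M', function () {
    return strlen(str_repeat('x', 1024));
});
```

This does not reduce what the tokenizer needs; it only limits how long the raised ceiling is in effect.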

Re: Need short call to remove all HTML TAG because of memory problem
April 07, 2011 01:58PM

OK, thanks for your answer... DirectLex did not change anything... so yes, ini_set solved it... but I'm not so happy with this... Thanks again for your responsiveness!
