Data not cleaned
August 12, 2011 11:57AM

Hey,

I tried the HTML Purifier demo and it was exactly what I needed. I copied the text that was displayed in my web browser (a bunch of data mixed in with HTML tags, all on one web page) and pasted it into the demo. It came out clean, just like I needed. I then got the lite version and followed all the instructions. I got no errors from the library, but when I echoed the result to the browser, the data was displayed but remained uncleaned. Am I doing something wrong?

this is the code

include('/Users/teddy/Desktop/htmlpurifier-4-1.3.0-lite/library/HTMLPurifier.auto.php');

$clean_html = $purifier->purify ( $man); //$man contains the data that needs to be cleaned

echo $clean_html;

Re: Data not cleaned
August 13, 2011 11:58PM

Could you post the code you used when it worked?

Re: Data not cleaned
August 15, 2011 08:28PM

include('simple_html_dom.php');
include('/Users/teddy/Desktop/htmlpurifier-4-1.3.0-lite/library/HTMLPurifier.auto.php');

$html = file_get_html('http://www.lottolore.com/lotto649.html');

foreach($html->find('table[cellpadding=2]') as $e)
{
    for ($i=0; $i < sizeof($e->innertext); $i++)
    {
        $test[$i]= $e->innertext;
        $a = htmlentities($e->innertext);
        $file= file_get_contents($a);
        $bob=preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $a);
    }
}

echo $bob;

//Thank you

Re: Data not cleaned
August 15, 2011 08:29PM

This code doesn't use HTML Purifier...

Re: Data not cleaned
August 15, 2011 08:59PM

Sorry, I didn't include what I already posted the first time around... This is the code when everything worked (though it does work even with HTML Purifier, it just isn't cleaned).

Here is the whole thing:

include('simple_html_dom.php');
include('/Users/teddy/Desktop/htmlpurifier-4-1.3.0-lite/library/HTMLPurifier.auto.php');

$html = file_get_html('http://www.lottolore.com/lotto649.html');

foreach($html->find('table[cellpadding=2]') as $e)
{
    for ($i=0; $i < sizeof($e->innertext); $i++)
    {
        $test[$i]= $e->innertext;
        $a = htmlentities($e->innertext);
        $file= file_get_contents($a);
        $man=preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $a);
    }
}

echo $man;

$clean_html = $purifier->purify ( $man); //$man contains the data that needs to be cleaned

echo $clean_html;

//Thank you

Re: Data not cleaned
August 15, 2011 09:03PM

You're echoing $man and $clean_html. One of those is not clean.

Re: Data not cleaned
August 15, 2011 09:07PM

Ugh, sorry, my mistake. I am modifying the code on the fly to try to make it work.

Here is the whole REAL thing:

include('simple_html_dom.php');
include('/Users/teddy/Desktop/htmlpurifier-4-1.3.0-lite/library/HTMLPurifier.auto.php');

$html = file_get_html('http://www.lottolore.com/lotto649.html');

foreach($html->find('table[cellpadding=2]') as $e)
{
    for ($i=0; $i < sizeof($e->innertext); $i++)
    {
        $test[$i]= $e->innertext;
        $a = htmlentities($e->innertext);
        $file= file_get_contents($a);
        $man=preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $a);
    }
}

$clean_html = $purifier->purify ( $man); //$man contains the data that needs to be cleaned

echo $clean_html; // $clean_html should contain clean data but it does not

//Thank you

Re: Data not cleaned
August 15, 2011 09:08PM

What output do you get?

Re: Data not cleaned
August 15, 2011 09:22PM

The same as if I echoed $man on its own... It displays, but like I said, it isn't cleaned the way I hoped.

Re: Data not cleaned
August 15, 2011 09:30PM

Please paste the specific output you receive. Use the CDATA tags in order to not need to escape the HTML on this forum.

Re: Data not cleaned
August 15, 2011 09:36PM

<tr align="center"> <td colspan="5"><font size="4"><a name="past"><b>Past <font color="#FF0000">Lotto 6/49</font> Winning Numbers</a></b></font></td> </tr> <tr align="center"> <td><a href="lotto649.html"><b>Latest</b></a></td> <td><a href="l6490811.html"><b>Aug 11</b></a></td> <td><a href="l6490711.html"><b>Jul 11</b></a></td> <td><a href="l6490611.html"><b>Jun 11</b></a></td> <td><a href="l6490511.html"><b>May 11</b></a></td> </tr> <tr align="center"> <td><a href="l6490411.html"><b>Apr 11</b></a></td> <td><a href="l6490311.html"><b>Mar 11</b></a></td> <td><a href="l6490211.html"><b>Feb 11</b></a></td> <td><a href="l6490111.html"><b>Jan 11</b></a></td> <td><a href="l6491210.html"><b>Dec 10</b></a></td> </tr> <tr align="center"> <td><a href="l6491110.html"><b>Nov 10</b></a></td> <td><a href="l6491010.html"><b>Oct 10</b></a></td> <td><a href="l6490910.html"><b>Sep 10</b></a></td> <td><a href="l6490810.html"><b>Aug 10</b></a></td> <td><a href="l6490710.html"><b>Jul 10</b></a></td> </tr>

Re: Data not cleaned
August 15, 2011 09:41PM

Where is your $purifier object defined?

Re: Data not cleaned
August 15, 2011 09:45PM

If you look at my prior post with the real complete code, you will see it is defined there.

Example from my code:

//some code before

        $test[$i]= $e->innertext;
        $a = htmlentities($e->innertext);
        $file= file_get_contents($a);
        $man=preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $a);
    }
}

$clean_html = $purifier->purify ( $man); //here the purifier object is defined

echo $clean_html; // $clean_html should contain clean data but it does not

Re: Data not cleaned
August 15, 2011 09:46PM

I'm looking for the line $purifier =.

Also, there is almost zero chance that HTML Purifier is misbehaving in this case, so it would be much easier for me if you posted, verbatim, real code that you ran.

Re: Data not cleaned
August 15, 2011 09:50PM

I have not done that line. I thought it was declared with

$clean_html = $purifier->purify ( $man);

What needs to go after $purifier = ?

I have posted my code exactly as it is in the .php file; I have not changed it for any reason. I assume my problem lies in the $purifier declaration you mentioned.

Re: Data not cleaned
August 15, 2011 10:09PM

Do you have this line?

    $purifier = new HTMLPurifier();

Re: Data not cleaned
August 15, 2011 10:15PM

No, for some reason I removed it and did not put it back after I was done meddling with the code trying to get it to work.

I added it and now NOTHING is coming up in the web browser when I echo $clean_html. I even turned on error reporting at the very top, and still nothing comes up. I also tried putting the line in different places in the code (though I think a global declaration would have been good enough).

Any other suggestions?

Re: Data not cleaned
August 15, 2011 10:16PM

Paste your code. If it's self contained so I can run it, even better. The error is probably something silly.

Re: Data not cleaned
August 15, 2011 10:38PM



<?


error_reporting(E_ALL);ini_set('display_errors', 1); 

include('simple_html_dom.php');
include('/Users/teddy/Desktop/htmlpurifier-4-1.3.0-lite/library/HTMLPurifier.auto.php');

$html = file_get_html('http://www.lottolore.com/lotto649.html');



foreach($html->find('table[cellpadding=2]') as $e)
{	    
	for ($i=0; $i < sizeof($e->innertext); $i++)
	{
    $test[$i]= $e->innertext;
	
$a = htmlentities($e->innertext);
$file= file_get_contents($a);


$jim=preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $a);

$purifier= new HTMLPurifier();
$clean_html = $purifier->purify ($jim)
echo $clean_html;

	}

}




?>

thanks a bunch

Re: Data not cleaned
August 15, 2011 10:40PM

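You have a syntax error on line 26 (the missing semicolon after the purify() call). Because the file fails to parse, the error_reporting and ini_set calls at the top never run, which is why you see no errors.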

Re: Data not cleaned
August 15, 2011 10:53PM

Yes, I did not catch that, thanks. I fixed it but it is still not working. Give this a go if you can.


<?


error_reporting(E_ALL);ini_set('display_errors', 1); 

include('simple_html_dom.php');
include('/Users/teddy/Desktop/htmlpurifier-4-1.3.0-lite/library/HTMLPurifier.auto.php');

$html = file_get_html('http://www.lottolore.com/lotto649.html');



foreach($html->find('table[cellpadding=2]') as $e)
{	    
	for ($i=0; $i < sizeof($e->innertext); $i++)
	{
    $test[$i]= $e->innertext;
	
$a = htmlentities($e->innertext);



$jim=preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $a);

$purifier= new HTMLPurifier();
$clean_html = $purifier->purify ($jim);
echo $clean_html;

	}

}



?>

Re: Data not cleaned
August 15, 2011 11:04PM

Your code is deeply confused. Why are you calling htmlentities before running the text through HTML Purifier?
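
To illustrate the htmlentities question, here is a minimal sketch using a made-up sample string ($raw and the other variable names are hypothetical, modelled on the table markup posted earlier in the thread). It suggests why the output probably looked unchanged: once the markup is escaped, there are no literal tags left for the regex (or, most likely, for any tag-aware cleaner) to act on, and the browser then renders the escaped text as visible tags.

```php
<?php
// Hypothetical sample input, modelled on the table markup posted earlier in the thread.
$raw = '<td colspan="5"><b>Past</b></td>';

// Escaping first turns every tag into text: "<" becomes "&lt;" and ">" becomes "&gt;".
$escaped = htmlentities($raw);

// The attribute-stripping pattern from the thread needs a literal "<" to match.
$pattern = '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

$afterEscape = preg_replace($pattern, '<$1$2>', $escaped); // nothing to match
$onRawMarkup = preg_replace($pattern, '<$1$2>', $raw);     // attributes removed

var_dump($afterEscape === $escaped); // bool(true): the escaped string is untouched
echo $onRawMarkup, "\n";             // <td><b>Past</b></td>
```

Running the pattern on the unescaped markup is what actually removes the attributes, so the escaping step would need to be dropped or moved to the very end, just before display.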

Re: Data not cleaned
August 15, 2011 11:11PM

I need to first retrieve the code from another website. The code comes back jumbled up with a bunch of HTML tags within what I need. I then do some more cleaning in PHP to get rid of data I don't require. HTML Purifier was supposed to just clean out the HTML code that I have fetched so it becomes readable. Do you have any suggestions regarding what I just said? Do you think the problem lies here?

Re: Data not cleaned
August 15, 2011 11:13PM

I think you're using the wrong tool. Are you trying to make the HTML safe for display or clean it up for further analysis?

Re: Data not cleaned
August 16, 2011 07:50AM

Clean up so there is no HTML and only plain text. Is that not what the demo did as well?

Re: Data not cleaned
August 16, 2011 08:35AM

Use strip_tags.
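
A minimal sketch of that suggestion, assuming the scraped markup is passed in unescaped (no htmlentities call beforehand). The sample string and variable names are hypothetical, modelled on the output posted earlier in the thread; PHP's built-in strip_tags() does the actual work.

```php
<?php
// Hypothetical fragment, modelled on the table markup posted earlier in the thread.
$html = '<tr align="center"> <td><a href="lotto649.html"><b>Latest</b></a></td> '
      . '<td><a href="l6490811.html"><b>Aug 11</b></a></td> </tr>';

// strip_tags() drops every tag and keeps the text between them.
$text = strip_tags($html);

// Collapse the leftover runs of whitespace into single spaces.
$text = trim(preg_replace('/\s+/', ' ', $text));

echo $text, "\n"; // Latest Aug 11
```

The whitespace cleanup is optional. It is included here only because removing table tags tends to leave stray spaces and newlines behind.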
