
Making HTMLPurifier available via network

Posted by notromda 
March 15, 2011 04:15PM

As I've mentioned elsewhere, I'm interested in using HTMLPurifier from other programming environments. Since I don't have time to rewrite it in Ruby, I made a simple network wrapper: an echo-style service, written in PHP, that sanitizes whatever it receives. It's as open source as can be, and I've put it on GitHub:

https://github.com/dgm/htmlpurified

The basic idea is to connect to the server, send the document followed by STOP on a line by itself, and read back the result. (Yeah, I need to add an escape for a literal STOP line, and then an escape for the escape.)
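A minimal sketch of STOP framing with the escape (and escape-for-the-escape) mentioned above, in Python. The backslash convention is an assumption, loosely modelled on SMTP dot-stuffing; htmlpurified's actual wire format may differ. Any line that is exactly `STOP`, or that already starts with a backslash, gets one extra backslash on the way out, and the reader strips exactly one.

```python
# Hypothetical framing helpers for a "STOP"-terminated protocol.
# Assumption: a leading backslash is the escape; htmlpurified may differ.

TERMINATOR = "STOP"
ESCAPE = "\\"


def encode(document: str) -> str:
    """Frame a document: escape risky lines, then append the STOP line."""
    out = []
    for line in document.split("\n"):
        # A literal "STOP" line, or any line that already starts with the
        # escape character, gets one extra backslash in front.
        if line == TERMINATOR or line.startswith(ESCAPE):
            line = ESCAPE + line
        out.append(line)
    out.append(TERMINATOR)
    return "\n".join(out) + "\n"


def decode(stream: str) -> str:
    """Recover the original document from a framed stream."""
    lines = []
    for line in stream.split("\n"):
        if line == TERMINATOR:
            # Only an unescaped STOP line ends the document.
            return "\n".join(lines)
        if line.startswith(ESCAPE):
            line = line[1:]  # undo exactly one level of escaping
        lines.append(line)
    raise ValueError("stream ended before the STOP line")
```

Because a literal `STOP` line is always sent as `\STOP`, the only unescaped `STOP` the reader can ever see is the real terminator, and escaping lines that start with a backslash keeps the scheme reversible.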

I have specific additions that I will need for Maia Mailguard, but it should be easy to modify if anyone else needs to do so.

Any suggestions, improvements or patches welcome. :)

Re: Making HTMLPurifier available via network
March 15, 2011 06:26PM

Hello,

Why not stipulate only one purification per request?

Re: Making HTMLPurifier available via network
March 15, 2011 07:43PM

Oh, it is one per request; the server is supposed to close the connection after responding. My only problem is that onReceiveData() doesn't fire for the final chunk until the buffer fills up, and I don't know why. Sending some extra null bytes at the end makes onReceiveData() fire one more time, which reaches the code that handles the STOP command. (There has to be something that tells the server the whole document has arrived, right? Or can HTMLPurifier work on a stream?)
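With one document per connection, the "whole document is here" flag can be the end of the stream itself: the client half-closes the socket with `shutdown(SHUT_WR)` after sending, the server's read returns end-of-file, and the connection stays open in the other direction for the reply. The Python sketch below assumes that model; `serve_one` and `round_trip` are hypothetical names, and it does not explain the `onReceiveData()` behaviour, which depends on the PHP socket library in use.

```python
import socket
import threading
from typing import Callable


def read_to_eof(sock: socket.socket) -> bytes:
    """Collect everything the peer sends until it signals end-of-stream."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the peer will send nothing more
            return b"".join(chunks)
        chunks.append(chunk)


def client_send(sock: socket.socket, document: bytes) -> bytes:
    """Send one document, half-close to mark its end, then read the reply."""
    sock.sendall(document)
    # Close only our sending direction (a FIN on TCP): the server's recv()
    # returns b"" while we can still receive the sanitized result.
    sock.shutdown(socket.SHUT_WR)
    return read_to_eof(sock)


def serve_one(sock: socket.socket, transform: Callable[[bytes], bytes]) -> None:
    """Hypothetical one-request handler: read all, reply once, close."""
    document = read_to_eof(sock)
    sock.sendall(transform(document))
    sock.close()


def round_trip(document: bytes, transform: Callable[[bytes], bytes]) -> bytes:
    """Run client and server over a local socket pair (no network needed)."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server, transform))
    worker.start()
    try:
        return client_send(client, document)
    finally:
        worker.join()
        client.close()
```

`round_trip` uses `socket.socketpair()` only so the sketch runs without a network; a real client would connect to the service's host and port and call `client_send` directly.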

Re: Making HTMLPurifier available via network
March 15, 2011 07:48PM

Oh, I see, you're not using HTTP. You should try reusing HTTP, or have the client transmit the length of the data to be processed.
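The length-prefix suggestion might look like this. It is a sketch that assumes a 4-byte unsigned big-endian header, a common convention and not something htmlpurified defines. Since the reader knows exactly how many bytes to expect, a document that happens to contain `STOP` needs no escaping at all.

```python
import struct

# Assumption: a 4-byte unsigned big-endian length header precedes each payload.
HEADER = struct.Struct(">I")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length in bytes."""
    return HEADER.pack(len(payload)) + payload


def read_exactly(stream, n: int) -> bytes:
    """Read exactly n bytes from a file-like object, or raise EOFError."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError(f"expected {n} bytes, got {len(data)}")
        data += chunk
    return data


def read_frame(stream) -> bytes:
    """Read one header, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(read_exactly(stream, HEADER.size))
    return read_exactly(stream, length)
```

Over a real connection, `sock.makefile("rb")` gives a file-like object that `read_frame` can read from.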

Re: Making HTMLPurifier available via network
March 15, 2011 08:39PM

Actually, the more I think about it, the more this looks like an over-engineered idea. Given the workload, it would be just as easy to pipe the data to a subshell running the script and scoop the result out of its stdout. If the server were under heavy load doing a lot of this, a dedicated server with the HTMLPurifier libraries already loaded might have merit. But premature optimization and all that.
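The subshell approach can be sketched with Python's `subprocess.run`. The command is hypothetical: for example `["php", "purify.php"]`, where `purify.php` would read stdin, run HTMLPurifier and echo the result. Closing the child's stdin after writing the input is what tells it the document is complete, so no STOP marker is needed.

```python
import subprocess
from typing import Sequence


def purify_via_subprocess(html: str, command: Sequence[str], timeout: float = 10.0) -> str:
    """Pipe html into a child process and return what it prints to stdout.

    `command` is hypothetical, e.g. ["php", "purify.php"], where the script
    reads stdin, runs HTMLPurifier, and echoes the cleaned HTML.
    """
    result = subprocess.run(
        list(command),
        input=html,           # written to the child's stdin, which is then closed
        capture_output=True,  # collect stdout and stderr
        text=True,            # str in, str out
        timeout=timeout,      # raises TimeoutExpired if the child hangs
        check=True,           # raises CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Each call pays PHP's start-up and library-loading cost, which is exactly the overhead a long-running dedicated server would avoid under heavy load.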
