
Implementation of MathML DTD3

Posted by Abdul Ahmady 
October 19, 2016 06:21PM

Hello everyone. First of all, let me say that I'm impressed with how beautifully HTML Purifier is written. Now, back to the important stuff: I need to implement the MathML definitions so that I can validate user-supplied MathML on the server side. I assumed it would be no more than a day's work, but after looking at https://www.w3.org/Math/DTD/mathml3/mathml3.dtd it is clear that I need some help. I don't want to start working on it only to find that I'm doing it all wrong. So here are my questions:

1. Is anyone already working on this? (If so, I would love to help out to get it released faster.)

2. Is there an easy way (a step-by-step guide of some sort) to convert this DTD file into a plugin, function, or class that extends HTML Purifier?

Any help would be much appreciated. Thanks in advance, and cheers!

Re: Implementation of MathML DTD3
October 20, 2016 03:35AM

Hello Abdul,

As far as I know, no one is working on this.

MathML is indeed going to be quite a trial. There may also be cases where HTML Purifier does not have enough knobs for you, and you will have to go in and make changes to the library itself.

It probably makes the most sense to do this as a patchset on top of HTML Purifier. To add a new "module", look at the existing HTMLModules (e.g., HTMLPurifier_HTMLModule_Tables), and refer to http://htmlpurifier.org/docs/enduser-customize.html, which gives a high-level overview of the APIs in question.

But otherwise, yes, you're going to have to write some code for every definition. This is by design; an automated process might end up accidentally allowing an element that shouldn't be allowed. Everything should be looked at before being put on the whitelist.

This being said, the DTD isn't terribly user-friendly. Consider working from the human-readable spec instead.
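
To make the advice above a bit more concrete, here is a minimal sketch of what hand-written definitions could look like using the addElement() API described on the customization page. It is only an illustration: the definition ID is made up, the handful of elements (math, mrow, mi, mn, mo) is an arbitrary tiny subset, and the content models are simplified guesses rather than anything derived from the MathML spec. A real implementation would more likely live in its own HTMLModule class, and every element and attribute would still need to be reviewed individually.

```php
<?php
// Assumption: HTML Purifier is installed and its library/ directory
// is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// A customized definition needs an ID and a revision so it can be cached.
// 'mathml-sketch' is a hypothetical name chosen for this example.
$config->set('HTML.DefinitionID', 'mathml-sketch');
$config->set('HTML.DefinitionRev', 1);

// Returns the raw definition when it still has to be built, null when a
// cached copy already exists.
if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Token elements: text content only. Passing false as the type keeps
    // them out of the Inline/Block content sets, so they are only legal
    // inside parents that name them explicitly.
    foreach (array('mi', 'mn', 'mo') as $token) {
        $def->addElement($token, false, 'Optional: #PCDATA', array());
    }

    // Grouping element: may contain tokens or nested rows (simplified).
    $def->addElement('mrow', false, 'Optional: mrow | mi | mn | mo', array());

    // Root element, allowed wherever inline content is allowed.
    $def->addElement(
        'math',
        'Inline',
        'Optional: mrow | mi | mn | mo',
        array(),
        array('display' => 'Enum#block,inline')
    );
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify(
    '<p><math><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow></math></p>'
);
```

Anything not listed this way (other MathML elements, unlisted attributes) should be stripped by the purifier, which is the point of building the whitelist by hand. Remember to bump HTML.DefinitionRev whenever the definitions change, so that the cached definition is rebuilt.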
