Class HTMLPurifier_Lexer_PEARSax3

Description

Proof-of-concept lexer that uses the PEAR package XML_HTMLSax3 to parse HTML.

PEAR, not surprisingly, also has a SAX parser for HTML. I don't know very much about its implementation, but it's fairly well written. However, that abstraction comes at a price: performance. You need to have the package installed, and if its API changes, it might break our adapter. I'm not sure whether it's UTF-8 aware, and it has some trouble parsing entities, in both text and attributes. Personally, I don't recommend using the PEAR class, and the defaults don't use it. The unit tests do run against the SAX parser too, but how it handles poorly formed HTML is up to it.

  • todo: Generalize so that XML_HTMLSax is also supported.

Located in /library/HTMLPurifier/Lexer/PEARSax3.php (line 22)

HTMLPurifier_Lexer
   |
   --HTMLPurifier_Lexer_PEARSax3
Variable Summary
mixed $tokens
Method Summary
void closeHandler ( &$parser,  $name)
void dataHandler ( &$parser,  $data)
void escapeHandler ( &$parser,  $data)
void openHandler ( &$parser,  $name,  $attrs,  $closed)
void tokenizeHTML ( $string,  $config,  $context)
Variables
mixed $tokens = array() (line 28)

Internal accumulator array for SAX parsers.

  • access: protected

Inherited Variables

Inherited from HTMLPurifier_Lexer

HTMLPurifier_Lexer::$_special_entity2str
Methods
closeHandler (line 70)

Close tag event handler, interface is defined by PEAR package.

  • access: public
void closeHandler ( &$parser,  $name)
  • &$parser
  • $name
dataHandler (line 84)

Data event handler, interface is defined by PEAR package.

  • access: public
void dataHandler ( &$parser,  $data)
  • &$parser
  • $data
escapeHandler (line 92)

Escaped text handler, interface is defined by PEAR package.

  • access: public
void escapeHandler ( &$parser,  $data)
  • &$parser
  • $data
openHandler (line 54)

Open tag event handler, interface is defined by PEAR package.

  • access: public
void openHandler ( &$parser,  $name,  $attrs,  $closed)
  • &$parser
  • $name
  • $attrs
  • $closed
tokenizeHTML (line 30)
  • access: public
void tokenizeHTML ( $string,  $config,  $context)
  • $string
  • $config
  • $context

Redefinition of:
HTMLPurifier_Lexer::tokenizeHTML()
Lexes an HTML string into tokens.
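
The four handlers and the $tokens accumulator follow the usual SAX pattern: the parser fires one event per tag or text run, and each handler appends a token in document order. The following is a minimal sketch of that pattern, not the class's actual source. SketchSaxAccumulator and getTokens() are hypothetical names, plain arrays stand in for HTML Purifier's token objects, the treatment of escape data as comments is an assumption, and the events are replayed by hand so that XML_HTMLSax3 does not need to be installed.

```php
<?php
// Hypothetical stand-in for the lexer. Plain arrays replace the real
// token objects so that the sketch has no dependencies.
class SketchSaxAccumulator
{
    // Accumulator filled by the handlers, in document order.
    protected $tokens = array();

    // Open tag event. $closed is true for self-closing tags such as <br />.
    public function openHandler(&$parser, $name, $attrs, $closed)
    {
        $type = $closed ? 'empty' : 'start';
        $this->tokens[] = array($type, $name, $attrs);
        return true;
    }

    // Close tag event.
    public function closeHandler(&$parser, $name)
    {
        $this->tokens[] = array('end', $name);
        return true;
    }

    // Data event: a run of text between tags.
    public function dataHandler(&$parser, $data)
    {
        $this->tokens[] = array('text', $data);
        return true;
    }

    // Escape event. Assumption for this sketch: data that starts with
    // "--" is a comment, and anything else is ignored.
    public function escapeHandler(&$parser, $data)
    {
        if (strpos($data, '--') === 0) {
            $this->tokens[] = array('comment', $data);
        }
        return true;
    }

    // Hypothetical accessor, added only so the sketch can be inspected.
    public function getTokens()
    {
        return $this->tokens;
    }
}

// Replay the events a SAX parser might fire for: <b>Hi</b><br />
$parser = null; // placeholder for the parser instance passed by reference
$acc = new SketchSaxAccumulator();
$acc->openHandler($parser, 'b', array(), false);
$acc->dataHandler($parser, 'Hi');
$acc->closeHandler($parser, 'b');
$acc->openHandler($parser, 'br', array(), true);
$tokens = $acc->getTokens();
```

In a real adapter the parser object would call these methods itself while tokenizeHTML() runs, and tokenizeHTML() would hand back the accumulated tokens at the end.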

Inherited Methods

Inherited From HTMLPurifier_Lexer

HTMLPurifier_Lexer::__construct()
HTMLPurifier_Lexer::CDATACallback()
HTMLPurifier_Lexer::create()
HTMLPurifier_Lexer::escapeCDATA()
HTMLPurifier_Lexer::escapeCommentedCDATA()
HTMLPurifier_Lexer::extractBody()
HTMLPurifier_Lexer::normalize()
HTMLPurifier_Lexer::parseData()
HTMLPurifier_Lexer::tokenizeHTML()

Documentation generated on Thu, 19 Jun 2008 18:49:52 -0400 by phpDocumentor 1.4.2