HTMLPurifier 4.4.0
HTMLPurifier_Lexer_DOMLex Class Reference

Parser that uses PHP 5's DOM extension (part of the core).

Inherits HTMLPurifier_Lexer.
Inherited by HTMLPurifier_Lexer_PH5P.

Public Member Functions

 __construct ()
 tokenizeHTML ($html, $config, $context)
 Lexes an HTML string into tokens.
 muteErrorHandler ($errno, $errstr)
 An error handler that mutes all errors.
 callbackUndoCommentSubst ($matches)
 Callback function for undoing escaping of stray angled brackets in comments.
 callbackArmorCommentEntities ($matches)
 Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.

Protected Member Functions

 tokenizeDOM ($node, &$tokens)
 Iterative function that tokenizes a node, putting it into an accumulator.
 createStartNode ($node, &$tokens, $collect)
 createEndNode ($node, &$tokens)
 transformAttrToAssoc ($node_map)
 Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
 wrapHTML ($html, $config, $context)
 Wraps an HTML fragment in the necessary HTML.

Private Attributes

 $factory

Detailed Description

Parser that uses PHP 5's DOM extension (part of the core).

In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM, and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex) and is the default choice for PHP 5.

Note:
Any empty elements will have empty tokens associated with them, even if this is prohibited by the spec. This cannot be fixed until the spec comes into play.
PHP's DOM extension does not actually parse any entities; we use our own function to do that.
Warning:
DOM tends to drop whitespace, which may wreak havoc on indenting. If this is a serious problem because the HTML is hand-edited and you cannot set up a parser cache that stores the output of HTML Purifier while keeping the original HTML around, you may want to run Tidy on the resulting output or use HTMLPurifier_Lexer_DirectLex.

Definition at line 27 of file DOMLex.php.
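
As a rough illustration of the "forgiving HTML parser" mentioned above, the following sketch feeds malformed markup to PHP's DOM extension directly. It does not use HTML Purifier at all; the sample markup and variable names are assumptions chosen for the example.

```php
<?php
// Sketch only: shows PHP's DOM extension repairing unclosed tags.
// Collect libxml parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Neither <p> is closed; the parser closes the first when the second opens.
$doc->loadHTML('<p>one<p>two');

libxml_clear_errors();
libxml_use_internal_errors($previous);

$paragraphs = $doc->getElementsByTagName('p');
foreach ($paragraphs as $p) {
    echo $p->textContent, "\n";
}
```

The resulting tree contains two sibling paragraph elements, which is the kind of repaired structure a DOM-based lexer can then walk.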


Constructor & Destructor Documentation

HTMLPurifier_Lexer_DOMLex::__construct ( )

Reimplemented from HTMLPurifier_Lexer.

Definition at line 32 of file DOMLex.php.

Referenced by __construct().

HTMLPurifier_Lexer_DOMLex::__construct ( )

Reimplemented from HTMLPurifier_Lexer.

Definition at line 14712 of file HTMLPurifier.standalone.php.

References __construct().


Member Function Documentation

HTMLPurifier_Lexer_DOMLex::callbackArmorCommentEntities ( $matches )

Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.

Definition at line 216 of file DOMLex.php.

HTMLPurifier_Lexer_DOMLex::callbackArmorCommentEntities ( $matches )

Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them.

Definition at line 14896 of file HTMLPurifier.standalone.php.

HTMLPurifier_Lexer_DOMLex::callbackUndoCommentSubst ( $matches )

Callback function for undoing escaping of stray angled brackets in comments.

Definition at line 208 of file DOMLex.php.

HTMLPurifier_Lexer_DOMLex::callbackUndoCommentSubst ( $matches )

Callback function for undoing escaping of stray angled brackets in comments.

Definition at line 14888 of file HTMLPurifier.standalone.php.
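
The two callbacks above are described as a pair: one armors ampersands inside comments, the other later undoes the escaping of stray angle brackets there. A minimal sketch of that round trip follows, assuming it surrounds a pass that escapes stray `<` characters. The helper names (`armorComment`, `undoComment`, `escapeStrayLt`) and the regular expressions are hypothetical and are not taken from DOMLex.php.

```php
<?php
// Hypothetical helpers illustrating an armor / escape / undo round trip.
// Group 1 is the comment body, group 2 its terminator (or end of input).
const COMMENT_RE = '/<!--(.*?)(-->|\z)/is';

function armorComment(array $m): string
{
    // Entity-ize "&" inside the comment so the undo step cannot confuse
    // text the author wrote with escapes added by the pass in between.
    return '<!--' . str_replace('&', '&amp;', $m[1]) . $m[2];
}

function undoComment(array $m): string
{
    // Reverse both the armoring and the "<" escaping inside the comment.
    return '<!--' . strtr($m[1], ['&amp;' => '&', '&lt;' => '<']) . $m[2];
}

function escapeStrayLt(string $html): string
{
    $html = preg_replace_callback(COMMENT_RE, 'armorComment', $html);
    // Escape any "<" that is not followed by a letter, "!" or "/".
    do {
        $old = $html;
        $html = preg_replace('/<([^a-z!\/])/i', '&lt;$1', $html);
    } while ($html !== $old);
    return preg_replace_callback(COMMENT_RE, 'undoComment', $html);
}

$out = escapeStrayLt('a < b <!-- x < y &lt; z --> <b>ok</b>');
```

In this sketch the stray `<` outside the comment is escaped, while the comment body comes back byte-for-byte, including the `&lt;` the author wrote.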

HTMLPurifier_Lexer_DOMLex::createEndNode ( $node, &$tokens ) [protected]

Definition at line 176 of file DOMLex.php.

Referenced by tokenizeDOM().

HTMLPurifier_Lexer_DOMLex::createEndNode ( $node, &$tokens ) [protected]

Definition at line 14856 of file HTMLPurifier.standalone.php.

HTMLPurifier_Lexer_DOMLex::createStartNode ( $node, &$tokens, $collect ) [protected]

Parameters:
$node     DOMNode to be tokenized.
$tokens   Array-list of already tokenized tokens.
$collect  Says whether or not start and end tokens are collected; set to false at first recursion because it's the implicit DIV tag you're dealing with.
Returns:
Boolean indicating whether the token needs an end token.

Definition at line 119 of file DOMLex.php.

References $data, HTMLPurifier_Lexer::parseData(), and transformAttrToAssoc().

Referenced by tokenizeDOM().

HTMLPurifier_Lexer_DOMLex::createStartNode ( $node, &$tokens, $collect ) [protected]

Parameters:
$node     DOMNode to be tokenized.
$tokens   Array-list of already tokenized tokens.
$collect  Says whether or not start and end tokens are collected; set to false at first recursion because it's the implicit DIV tag you're dealing with.
Returns:
Boolean indicating whether the token needs an end token.

Definition at line 14799 of file HTMLPurifier.standalone.php.

References $data, HTMLPurifier_Lexer::parseData(), and transformAttrToAssoc().

HTMLPurifier_Lexer_DOMLex::muteErrorHandler ( $errno, $errstr )

An error handler that mutes all errors.

Definition at line 14882 of file HTMLPurifier.standalone.php.

HTMLPurifier_Lexer_DOMLex::muteErrorHandler ( $errno, $errstr )

An error handler that mutes all errors.

Definition at line 202 of file DOMLex.php.
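
A muting error handler is typically installed just around a noisy call and removed right afterwards. The sketch below shows that general pattern with PHP's `set_error_handler()` and `restore_error_handler()`; the names `muteAll` and `quietly` are hypothetical and this is not the code from DOMLex.php.

```php
<?php
// Sketch: a handler that swallows every error it receives.
// Returning true tells PHP the error was handled, so the built-in
// handler does not run.
function muteAll(int $errno, string $errstr): bool
{
    return true;
}

// Run a callable with all errors muted, then restore the old handler.
function quietly(callable $fn)
{
    set_error_handler('muteAll');
    try {
        return $fn();
    } finally {
        restore_error_handler();
    }
}

$result = quietly(function () {
    trigger_error('noisy parser warning', E_USER_WARNING);
    return 'done';
});
```

Restoring the previous handler in a `finally` block keeps the muting scoped to the one call, even if that call throws.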

HTMLPurifier_Lexer_DOMLex::tokenizeDOM ( $node, &$tokens ) [protected]

Iterative function that tokenizes a node, putting it into an accumulator.

To iterate is human, to recurse divine - L. Peter Deutsch

Parameters:
$node    DOMNode to be tokenized.
$tokens  Array-list of already tokenized tokens.
Returns:
Tokens of node appended to previously passed tokens.

Definition at line 14761 of file HTMLPurifier.standalone.php.

References createEndNode(), and createStartNode().

HTMLPurifier_Lexer_DOMLex::tokenizeDOM ( $node, &$tokens ) [protected]

Iterative function that tokenizes a node, putting it into an accumulator.

To iterate is human, to recurse divine - L. Peter Deutsch

Parameters:
$node    DOMNode to be tokenized.
$tokens  Array-list of already tokenized tokens.
Returns:
Tokens of node appended to previously passed tokens.

Definition at line 81 of file DOMLex.php.

References createEndNode(), and createStartNode().

Referenced by HTMLPurifier_Lexer_PH5P::tokenizeHTML(), and tokenizeHTML().
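
To make the idea of an iterative walk with an accumulator concrete, here is a minimal sketch that traverses a DOM subtree with an explicit stack instead of recursion. The function name `walkToTokens` and the array-based token shapes are hypothetical; HTML Purifier uses its own token classes, and this sketch does not reproduce its handling of comments, CDATA or empty elements.

```php
<?php
// Sketch: iterative DOM walk that appends simple tokens to an accumulator.
function walkToTokens(DOMNode $root, array &$tokens): void
{
    // Each stack entry is [node, closing]; closing entries emit end tokens.
    $stack = [[$root, false]];
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($closing) {
            $tokens[] = ['end', $node->nodeName];
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $tokens[] = ['text', $node->nodeValue];
            continue;
        }
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue; // comments and other node types are skipped here
        }
        $tokens[] = ['start', $node->nodeName];
        $stack[] = [$node, true];
        // Push children in reverse so they are popped in document order.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div>Hi <b>there</b></div>');
$tokens = [];
walkToTokens($doc->getElementsByTagName('div')->item(0), $tokens);
```

Passing the accumulator by reference mirrors the `&$tokens` parameter documented above, so nested elements append to one flat list.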

HTMLPurifier_Lexer_DOMLex::tokenizeHTML ( $string, $config, $context )

Lexes an HTML string into tokens.

Parameters:
$string  String HTML.
Returns:
HTMLPurifier_Token array representation of HTML.

Reimplemented from HTMLPurifier_Lexer.

Reimplemented in HTMLPurifier_Lexer_PH5P.

Definition at line 38 of file DOMLex.php.

References $config, $html, HTMLPurifier_Lexer::normalize(), tokenizeDOM(), and wrapHTML().

HTMLPurifier_Lexer_DOMLex::tokenizeHTML ( $string, $config, $context )

Lexes an HTML string into tokens.

Parameters:
$string  String HTML.
Returns:
HTMLPurifier_Token array representation of HTML.

Reimplemented from HTMLPurifier_Lexer.

Reimplemented in HTMLPurifier_Lexer_PH5P.

Definition at line 14718 of file HTMLPurifier.standalone.php.

References $config, $html, HTMLPurifier_Lexer::normalize(), tokenizeDOM(), and wrapHTML().

HTMLPurifier_Lexer_DOMLex::transformAttrToAssoc ( $node_map ) [protected]

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

Parameters:
$attribute_list  DOMNamedNodeMap of DOMAttr objects.
Returns:
Associative array of attributes.

Definition at line 14867 of file HTMLPurifier.standalone.php.

HTMLPurifier_Lexer_DOMLex::transformAttrToAssoc ( $node_map ) [protected]

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

Parameters:
$attribute_list  DOMNamedNodeMap of DOMAttr objects.
Returns:
Associative array of attributes.

Definition at line 187 of file DOMLex.php.

Referenced by createStartNode().
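
Converting a DOMNamedNodeMap into an associative array amounts to iterating the map and reading each DOMAttr's name and value. A minimal sketch follows, assuming a hypothetical helper name `attrsToAssoc`; it is not the method body from DOMLex.php.

```php
<?php
// Sketch: flatten a DOMNamedNodeMap of DOMAttr objects into name => value pairs.
function attrsToAssoc(?DOMNamedNodeMap $map): array
{
    $out = [];
    if ($map === null) {
        return $out; // non-element nodes have no attribute map
    }
    foreach ($map as $attr) {
        $out[$attr->name] = $attr->value;
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML('<a href="/x" title="T">link</a>');
$attrs = attrsToAssoc($doc->getElementsByTagName('a')->item(0)->attributes);
```

A plain array is easier to hand to code that does not depend on the DOM extension, which is presumably why the lexer converts before building tokens.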

HTMLPurifier_Lexer_DOMLex::wrapHTML ( $html, $config, $context ) [protected]

Wraps an HTML fragment in the necessary HTML.

Definition at line 223 of file DOMLex.php.

References $config, and $def.

Referenced by HTMLPurifier_Lexer_PH5P::tokenizeHTML(), and tokenizeHTML().

HTMLPurifier_Lexer_DOMLex::wrapHTML ( $html, $config, $context ) [protected]

Wraps an HTML fragment in the necessary HTML.

Definition at line 14903 of file HTMLPurifier.standalone.php.

References $config, and $def.
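
Wrapping a fragment means surrounding it with enough document scaffolding that a whole-document parser reads it predictably. The sketch below assumes a UTF-8 charset declaration and a single `div` container, in line with the "implicit DIV tag" mentioned under createStartNode. The function name `wrapFragment` and the exact markup are assumptions, and the doctype handling implied by the `$def` reference is left out.

```php
<?php
// Sketch: wrap a fragment in a minimal document with a known container.
// The markup is an assumption for illustration, not copied from DOMLex.php.
function wrapFragment(string $fragment): string
{
    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        . '</head><body><div>' . $fragment . '</div></body></html>';
}

$wrapped = wrapFragment('<b>x</b>');
```

After parsing, the caller can locate the single `div` inside `body` and tokenize only its children, so the scaffolding never shows up in the token stream.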


Member Data Documentation

HTMLPurifier_Lexer_DOMLex::$factory [private]

Definition at line 30 of file DOMLex.php.


The documentation for this class was generated from the following files: