
Missing $node->tagName

Posted by mckelvey 
September 08, 2017 03:40PM

Hiya!

I was working with an instance of Craft CMS, which incorporates HTML Purifier (standalone) as part of its save validation on rich text, and ran into an issue where $node->tagName did not exist in `createStartNode`. Given the PHP docs on DOMElement and the code therein (v. 4.9.3, btw), this seemed impossible, but obviously it wasn’t, as I later found the little @todo in the function docblock.

While I was seeing this on one instance, I was not seeing it on another. My best guess is that the failing server was running an older version of libxml (2.7.6), versus libxml 2.9.1 and 2.9.2 where it worked. Additionally, the newer libxml versions were paired with PHP 5, while the older libxml was paired with PHP 7.
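
For anyone comparing environments the same way, here is a minimal diagnostic sketch (not part of the patch). It prints the PHP and libxml versions and checks, with the same `property_exists` test the patch relies on, which of the properties in question are visible on an element node and a text node. The sample markup and the `$report` array are just illustrative assumptions.

```php
<?php
// Diagnostic sketch: report versions and which DOM properties are exposed.
// Requires the DOM extension; the markup below is an arbitrary sample.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div>hello</div></body></html>');

$div  = $doc->getElementsByTagName('div')->item(0); // a DOMElement
$text = $div->firstChild;                           // a DOMText

$report = array(
    'php'               => PHP_VERSION,
    'libxml'            => LIBXML_DOTTED_VERSION, // libxml version PHP was built against
    'element.tagName'   => property_exists($div, 'tagName'),
    'element.nodeName'  => property_exists($div, 'nodeName'),
    'element.localName' => property_exists($div, 'localName'),
    'text.data'         => property_exists($text, 'data'),
    'text.nodeValue'    => property_exists($text, 'nodeValue'),
    'text.textContent'  => property_exists($text, 'textContent'),
);

var_export($report);
echo PHP_EOL;
```

Running this on both the working and the failing server should show whether the difference really tracks the libxml version or something else in the PHP build.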

At any rate, swapping out libxml to properly test this was not feasible in the time I had, so I handled the issue myself and have included the result. I also had to deal with a lack of $node->data for DOMText.

https://gist.github.com/mckelvey/3820ff4a1052325d032f85d24c2363b1

    /* replaces lines 18985-19057 of HTMLPurifier.standalone.php v.4.9.3 */
    
    /**
     * @param DOMNode $node
     */
    protected function getTagName($node)
    {
        if (property_exists($node, 'tagName')) {
            return $node->tagName;
        } else if (property_exists($node, 'nodeName')) {
            return $node->nodeName;
        } else if (property_exists($node, 'localName')) {
            return $node->localName;
        }
        return null;
    }

    /**
     * @param DOMNode $node
     */
    protected function getData($node)
    {
        if (property_exists($node, 'data')) {
            return $node->data;
        } else if (property_exists($node, 'nodeValue')) {
            return $node->nodeValue;
        } else if (property_exists($node, 'textContent')) {
            return $node->textContent;
        }
        return null;
    }


    /**
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens   Array-list of already tokenized tokens.
     * @param bool $collect  Says whether or not start and close are collected, set to
     *                    false at first recursion because it's the implicit DIV
     *                    tag you're dealing with.
     * @return bool if the token needs an endtoken
     * @todo data and tagName properties don't seem to exist in DOMNode?
     */
    protected function createStartNode($node, &$tokens, $collect, $config)
    {
        // intercept non element nodes. WE MUST catch all of them,
        // but we're not getting the character reference nodes because
        // those should have been preprocessed
        if ($node->nodeType === XML_TEXT_NODE) {
            $data = $this->getData($node); // Handle variable data property
            if ($data !== null) {
                $tokens[] = $this->factory->createText($data);
            }
            return false;
        } elseif ($node->nodeType === XML_CDATA_SECTION_NODE) {
            // undo libxml's special treatment of <script> and <style> tags
            $last = end($tokens);
            $data = $node->data;
            // (note $node->tagname is already normalized)
            if ($last instanceof HTMLPurifier_Token_Start && ($last->name == 'script' || $last->name == 'style')) {
                $new_data = trim($data);
                if (substr($new_data, 0, 4) === '<!--') {
                    $data = substr($new_data, 4);
                    if (substr($data, -3) === '-->') {
                        $data = substr($data, 0, -3);
                    } else {
                        // Highly suspicious! Not sure what to do...
                    }
                }
            }
            $tokens[] = $this->factory->createText($this->parseText($data, $config));
            return false;
        } elseif ($node->nodeType === XML_COMMENT_NODE) {
            // this code is only invoked for comments in script/style in versions
            // of libxml pre-2.6.28 (regular comments, of course, are still
            // handled regularly)
            $tokens[] = $this->factory->createComment($node->data);
            return false;
        } elseif ($node->nodeType !== XML_ELEMENT_NODE) {
            // not-well tested: there may be other nodes we have to grab
            return false;
        }
        $attr = $node->hasAttributes() ? $this->transformAttrToAssoc($node->attributes) : array();
        $tag_name = $this->getTagName($node); // Handle variable tagName property
        if (empty($tag_name)) {
            return (bool) $node->childNodes->length;
        }
        // We still have to make sure that the element actually IS empty
        if (!$node->childNodes->length) {
            if ($collect) {
                $tokens[] = $this->factory->createEmpty($tag_name, $attr);
            }
            return false;
        } else {
            if ($collect) {
                $tokens[] = $this->factory->createStart($tag_name, $attr);
            }
            return true;
        }
    }
    
    /**
     * @param DOMNode $node
     * @param HTMLPurifier_Token[] $tokens
     */
    protected function createEndNode($node, &$tokens)
    {
        $tag_name = $this->getTagName($node); // Handle variable tagName property
        $tokens[] = $this->factory->createEnd($tag_name);
    }
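
The fallback order used by `getTagName` and `getData` can be tried in isolation. The following is only a sketch: `first_existing_property` is a hypothetical helper name (it is not in HTML Purifier or the patch), and the `stdClass` stubs merely simulate nodes that lack `tagName` or `data`.

```php
<?php
// Hypothetical stand-alone version of the fallback lookup: return the value
// of the first property in $names that exists on $node, or null if none do.
function first_existing_property($node, array $names)
{
    foreach ($names as $name) {
        if (property_exists($node, $name)) {
            return $node->$name;
        }
    }
    return null;
}

// Stub simulating an element node that has nodeName but no tagName.
$element = new stdClass();
$element->nodeName = 'div';

// Stub simulating a text node that has nodeValue but no data.
$text = new stdClass();
$text->nodeValue = 'hello';

$tagName = first_existing_property($element, array('tagName', 'nodeName', 'localName'));
$data    = first_existing_property($text, array('data', 'nodeValue', 'textContent'));
$missing = first_existing_property(new stdClass(), array('tagName'));

var_dump($tagName, $data, $missing);
```

With these stubs the lookup falls through to `nodeName` and `nodeValue`, and returns null when none of the listed properties exist, which is the case `createStartNode` guards against with its `empty($tag_name)` and `$data !== null` checks.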
Re: Missing $node->tagName
September 08, 2017 09:57PM

Thanks. Would you mind opening a GitHub PR with your change?

Re: Missing $node->tagName
September 09, 2017 12:36AM

Happy to. I hadn’t seen the repo, but just found it and will issue a PR.

Thanks!

David

Per
Re: Missing $node->tagName
December 15, 2017 02:45AM

Just ran into this problem myself (Craft CMS 2.6.3000) and got it working with the suggested solution from this thread. My problem only appeared in one specific environment (production, of course :/ ) running PHP 7.1.12 and libxml 2.9.4, whereas the dev environment runs PHP 7.0.22 and libxml 2.8.0 and works fine.

Not sure if the PR was ever submitted but it would be great if you could get that solution into the next version.

Thanks,

Per

Re: Missing $node->tagName
December 22, 2017 09:52PM