DOM Based XSS Attack and InnerHTML

Question

How would one go about securing the code below against a DOM-based XSS attack?

Specifically, is there a protect() function that will make the code below safe? If not, is there another solution, e.g. giving the div an id and then later assigning the element an onclick handler?

<?php
function protect($value)
{
   // For non-DOM XSS attacks, hex-encoding all non-alphanumeric characters
   // with ASCII values less than 256 works (ie: \xHH)
   // But is it possible to augment this function to protect against
   // the below DOM based XSS attack?
}
?>

<body>
  <div id="mydiv"></div>
  <script type="text/javascript">
    var xss = "<?php echo protect($_GET["xss"]) ?>";
    $("#mydiv").html("<div onclick='myfunc(\""+xss+"\")'></div>")
  </script>
</body>

I'm hoping for an answer that is not "avoid using innerHTML" or "regex the xss variable to [a-zA-Z0-9]", i.e. is there a more general solution?

Thanks

Answer 1

Expanding on Vineet's reply, here's a set of test-cases to look into:

http://ha.ckers.org/xss.html

Answer 2

I've been playing around with PHP's DOMDocument and related classes with a view to writing an HTML parser that can deal with stuff like this. It's at a very early stage of development and is nowhere near ready for actual use, but my early experiments seem to show some promise for the idea.

Basically, you load your markup into a DOMDocument, then traverse the tree. For each node in the tree you check the node's name against a list of allowed names. If it isn't in the list, the node is removed from the tree.

You could use an approach similar to this to locate all SCRIPT tags in a piece of markup and remove them. DOM based XSS is rendered toothless if you can pull any embedded scripts out of the markup you've been provided.

This is the code I'm using, along with a test case that processes the StackOverflow home page. Like I said, it's far from production quality code and is little more than a proof of concept. Still, I hope you find it useful.

<?php
class HtmlClean
{
    private $whiteList      = array (
        '#cdata-section', '#comment', '#text', 'a', 'abbr', 'acronym', 'address', 'b', 
        'big', 'blockquote', 'body', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 
        'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em', 'fieldset', 'h1', 'h2', 'h3', 'h4', 
        'h5', 'h6', 'head', 'hr', 'html', 'i', 'img', 'ins', 'kbd', 'li', 'link', 'meta', 
        'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strike', 'strong', 'style', 'sub', 
        'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'tt', 'ul', 
        'var'
    );

    private $attrWhiteList  = array (
        'class', 'id', 'title'
    );

    private $dom            = NULL;

    /**
     * Get current tag whitelist
     * @return array
     */
    public function getWhiteListTags ()
    {
        $this -> whiteList  = array_values ($this -> whiteList);
        return ($this -> whiteList);
    }

    /**
     * Add tag to the whitelist
     * @param string $tagName
     */
    public function addWhiteListTag ($tagName)
    {
        $tagName    = strtolower (trim ($tagName));
        if (!in_array ($tagName, $this -> whiteList))
        {
            $this -> whiteList []   = $tagName;
        }
    }

    /**
     * Remove a tag from the whitelist
     * @param string $tagName
     */
    public function removeWhiteListTag ($tagName)
    {
        // array_search returns false on failure; index 0 is a valid match
        if (($index = array_search ($tagName, $this -> whiteList)) !== false)
        {
            unset ($this -> whiteList [$index]);
        }
    }

    /**
     * Load document markup into the class for cleaning
     * @param string $html The markup to clean
     * @return bool
     */
    public function loadHTML ($html)
    {
        if (!$this -> dom)
        {
            $this -> dom    = new DOMDocument();
        }
        $this -> dom -> preserveWhiteSpace  = false;
        $this -> dom -> formatOutput        = true;
        return $this -> dom -> loadHTML ($html);
    }

    public function outputHtml ()
    {
        $ret    = '';
        if ($this -> dom)
        {
            $ret    = $this -> dom -> saveXML ();
        }
        return ($ret);
    }

    private function cleanAttrs (DOMNode $elem)
    {
        $attrs  = $elem -> attributes;
        $index  = $attrs -> length;
        while (--$index >= 0)
        {
            $attrName   = strtolower ($attrs -> item ($index) -> name);
            if (!in_array ($attrName, $this -> attrWhiteList))
            {
                $elem -> removeAttribute ($attrName);
            }
        }       
    }

    /**
     * Recursively remove elements from the DOM that aren't whitelisted
     * @param DOMNode $elem
     * @return array List of elements removed from the DOM
     * @throws Exception If removal of a node fails
     */
    private function cleanNodes (DOMNode $elem)
    {
        $removed    = array ();
        if (in_array (strtolower ($elem -> nodeName), $this -> whiteList))
        {
            // Remove non-whitelisted attributes
            if ($elem -> hasAttributes ())
            {
                $this -> cleanAttrs ($elem);
            }
            /*
             * Iterate over the element's children. The reason we go backwards is because
             * going forwards will cause indexes to change when elements get removed
             */
            if ($elem -> hasChildNodes ())
            {
                $children   = $elem -> childNodes;
                $index      = $children -> length;
                while (--$index >= 0)
                {
                    $removed = array_merge ($removed, $this -> cleanNodes ($children -> item ($index)));
                }
            }
        }
        else
        {
            // The element is not on the whitelist, so remove it
            if ($elem -> parentNode -> removeChild ($elem))
            {
                $removed [] = $elem;
            }
            else
            {
                throw new Exception ('Failed to remove node from DOM');
            }
        }
        return ($removed);
    }

    /**
     * Perform the cleaning of the document
     */
    public function clean ()
    {
        $removed    = $this -> cleanNodes ($this -> dom -> getElementsByTagName ('html') -> item (0));
        return ($removed);
    }
}

$test       = file_get_contents ('http://www.stackoverflow.com/');
// Windows-style linebreaks really foul up the works. There's probably a better fix for this
$test       = str_replace (chr (13), '', $test);

$cleaner    = new HtmlClean ();
$cleaner -> loadHTML ($test);

echo ('<h1>Before</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

$start      = microtime (true);
$removed    = $cleaner -> clean ();
$cleanTime  = microtime (true) - $start;

echo ('<h1>Removed tag list</h1>');
foreach ($removed as $elem)
{
    var_dump ($elem -> nodeName);
}

echo ('<h1>After</h1><pre>' . htmlspecialchars ($cleaner -> outputHtml ()) . '</pre>');

// benchmark
var_dump ($cleanTime);
?>

Answer 3

I'm no PHP expert, but if you want to prevent XSS attacks against the code sample as presented, with minimal changes, you could use the PHP edition of OWASP ESAPI. Specifically, use the JavaScript codec class from ESAPI to protect the contents of the xss variable, since it appears in a JavaScript context.
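
To illustrate what encoding for a JavaScript context means, here is a minimal sketch. It is written in JavaScript rather than PHP and is not taken from ESAPI; `escapeForJsString` and `buildHandlerMarkup` are hypothetical names. It uses the \xHH scheme mentioned in the question's comment, and it assumes the escape is applied at the point where the handler string is built, because the value echoed by PHP has already been decoded once by the time `.html()` receives it.

```javascript
// Hypothetical helper (not ESAPI's codec): escape every character outside
// [A-Za-z0-9] so the result is inert inside a JavaScript string literal.
function escapeForJsString(value) {
  const text = String(value);
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i); // UTF-16 code unit
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      out += "\\x" + code.toString(16).padStart(2, "0");
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return out;
}

// Hypothetical wrapper mirroring the markup from the question. The escaped
// value contains only letters, digits and backslashes, so it cannot close
// the onclick attribute or the string literal passed to myfunc.
function buildHandlerMarkup(value) {
  return "<div onclick='myfunc(\"" + escapeForJsString(value) + "\")'></div>";
}
```

In the question's snippet this would be used as `$("#mydiv").html(buildHandlerMarkup(xss))`. Assigning the handler with a closure, as the question itself suggests, would avoid the second parse altogether.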
