I have to process user-provided markup for a specific kind of embed, which usually takes the form of a <script> tag with a src attribute. A variety of different <script> components can be used here, each one different. However, to avoid potential XSS attacks, we've deemed it necessary to strip out anything inside the tag while keeping the tag itself.
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny"); //This should be sanitized out</script>
DOMDocument doesn't really give us an easy way to alter the inner HTML of a node. I have seen a few approaches, but none seem to address keeping the attributes intact if the tag is destroyed and rebuilt. Am I missing something in implementing a best approach, or is there an easier way to go about addressing this?
This code removes all child nodes from the <script> node. In this case it's the document element:
<?php
$xml = '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny");</script>';
$doc = new DOMDocument();
$doc->loadXML($xml);
$scriptNode = $doc->documentElement;
// Remove children from the end until none are left; attributes are untouched.
while ($scriptNode->hasChildNodes()) {
    $scriptNode->removeChild($scriptNode->lastChild);
}
echo $doc->saveXML();
Output is:
<?xml version="1.0"?>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"/>
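If the result is going back into an HTML page, the XML declaration and the self-closing <script/> form are usually unwanted. A minimal sketch of one way around that, assuming you only need the element itself: pass the node and the LIBXML_NOEMPTYTAG option to saveXML(). The example.com URL and the script body are placeholders, not taken from the question.

```php
<?php
// Placeholder markup; the URL and the inline body are illustrative only.
$xml = '<script type="text/javascript" src="https://example.com/embed.js">alert("x");</script>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$scriptNode = $doc->documentElement;
// Same loop as above: drop every child node, keep the attributes.
while ($scriptNode->hasChildNodes()) {
    $scriptNode->removeChild($scriptNode->lastChild);
}

// Serialize only the element (no XML declaration) and expand the empty
// element into an explicit open/close tag pair.
$result = $doc->saveXML($scriptNode, LIBXML_NOEMPTYTAG);
echo $result, "\n";
```

Note that loadXML() expects well-formed XML, so this variant only suits embeds that happen to parse as XML.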
A simple method is to do a shallow clone of the node, using cloneNode() without the optional parameter. A shallow clone copies the element and its attributes but none of its children.
This will go through the loaded document and replace each script node with its empty clone...
$html = '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny");</script>';
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach ( $doc->getElementsByTagName("script") as $script ){
    // cloneNode() with no argument is shallow: attributes are kept, children are not
    $script->parentNode->replaceChild($script->cloneNode(), $script);
}
echo $doc->saveHTML();
gives...
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script></head></html>
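Since loadHTML() wraps the fragment in a doctype and <html>/<head> elements, you may want only the cleaned tags back. The following is a rough sketch of how the shallow-clone approach could be wrapped up for that; the function name stripScriptBodies is hypothetical, and it assumes that everything other than <script> elements in the input can be discarded.

```php
<?php
// Hypothetical helper: returns only the <script> tags found in $markup,
// with their attributes preserved and their contents removed.
function stripScriptBodies(string $markup): string
{
    if (trim($markup) === '') {
        return ''; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them,
    // since user-provided markup is often sloppy.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $clean = '';
    // Snapshot the live node list before modifying the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $empty = $script->cloneNode(); // shallow clone: attributes only
        $script->parentNode->replaceChild($empty, $script);
        // Passing a node to saveHTML() serializes just that node.
        $clean .= $doc->saveHTML($empty);
    }

    return $clean;
}

$input = '<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">document.write("vinny say something funny");</script>';
echo stripScriptBodies($input), "\n";
```

This only empties the tag. It does not check where src points or strip event-handler attributes, so it is not a complete XSS defence on its own.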