I have a form where users can enter descriptions, using TinyMCE for styling. Because of this, my users have the ability to insert HTML. I am already stripping almost all HTML elements using strip_tags, but users can still input malicious values, such as this one:
<strong onclick="window.location='http://example.com'">Evil</strong>
I would like to prevent users from being able to do this by stripping all attributes from all tags, except for the style attribute.
I can only find solutions that either strip all attributes or strip only the specified ones. I would like to keep only the style attribute.
I have tried DOMDocument, but it seems to add DOCTYPE and html tags on its own, outputting the result as an entire HTML document. Additionally, it sometimes seems to randomly add HTML entities such as upside-down question marks.
Here's my DOMDocument implementation:
// Example "evil" input
$description = "<p><strong onclick=\"alert('evil');\">Evil</strong></p>";

// Strip all tags from description except these
$description = strip_tags($description, '<p><br><a><b><i><u><strong><em><span><sup><sub>');

// Strip attributes from tags (to prevent inline JavaScript)
$dom = new DOMDocument();
$dom->loadHTML($description);

foreach ($dom->getElementsByTagName('*') as $element) {
    // Attributes cannot be removed directly because DOMNamedNodeMap implements Traversable incorrectly
    // Attributes are first saved to an array and then looped over later
    $attributes_to_remove = array();
    foreach ($element->attributes as $name => $value) {
        if ($name != 'style') {
            $attributes_to_remove[] = $name;
        }
    }

    // Loop over saved attributes and remove them
    foreach ($attributes_to_remove as $attribute) {
        $element->removeAttribute($attribute);
    }
}

echo $dom->saveHTML();
Here are two options for DOMDocument::loadHTML() that will solve the problem:
$dom->loadHTML($description, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
These options are only available with libxml >= 2.7.8. If you have an older version, you can try a different approach: if you know that you are expecting a fragment, you can load it as usual and save only the children of the body element.
$description = <<<'HTML'
<strong onclick="alert('evil');" style="text-align:center;">Evil</strong>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($description);

foreach ($dom->getElementsByTagName('*') as $element) {
    $attributes_to_remove = iterator_to_array($element->attributes);
    unset($attributes_to_remove['style']);
    foreach ($attributes_to_remove as $attribute => $value) {
        $element->removeAttribute($attribute);
    }
}

foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    echo $dom->saveHTML($node);
}
Output:
<strong style="text-align:center;">Evil</strong>
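The fragment approach above can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the function name strip_attributes_except_style is made up for illustration, the input is assumed to be UTF-8 (the meta tag is one common way to tell libxml so, which may help with the stray entities mentioned in the question), and a wrapper div is added so that the fragment has a single known parent whose children are saved.

```php
<?php
// Hypothetical helper; the name and signature are illustrative, not from the posts above.
function strip_attributes_except_style(string $html): string
{
    // Same tag allowlist as in the question.
    $html = strip_tags($html, '<p><br><a><b><i><u><strong><em><span><sup><sub>');

    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the input is UTF-8. The meta tag declares the encoding,
    // and the wrapper <div> gives the fragment one known parent element.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $dom->getElementsByTagName('div')->item(0);

    foreach ($wrapper->getElementsByTagName('*') as $element) {
        // Copy the attribute names first, then remove everything except style.
        $names = array_keys(iterator_to_array($element->attributes));
        foreach ($names as $name) {
            if ($name !== 'style') {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of the wrapper, not the whole document.
    $result = '';
    foreach ($wrapper->childNodes as $node) {
        $result .= $dom->saveHTML($node);
    }

    return $result;
}

echo strip_attributes_except_style(
    "<p><strong onclick=\"alert('evil');\" style=\"color:red;\">Evil</strong></p>"
);
```

Keep in mind that removing every attribute except style also drops href from any a tags, so links would need their own handling if they should keep working.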
I don't know if this is more or less what you mean to do...
$description = "<p><strong onclick=\"alert('evil');\">Evil</strong></p>";
$description = strip_tags($description, '<p><br><a><b><i><u><strong><em><span><sup><sub>');

$dom = new DOMDocument;
$dom->loadHTML($description);

$tags = $dom->getElementsByTagName('*');
foreach ($tags as $tag) {
    if ($tag->hasAttributes()) {
        // Copy the attributes first: removing them while iterating the live map skips entries
        $attributes = iterator_to_array($tag->attributes);
        foreach ($attributes as $name => $attrib) {
            // Keep the style attribute, as the question asks
            if ($name !== 'style') {
                $tag->removeAttribute($name);
            }
        }
    }
}

echo $dom->saveHTML();
/* Will echo out `Evil` in bold but without the `onclick` */