I have written some code that takes a string of HTML and cleans away any ugly HTML from it using jQuery (see an early prototype in this SO question). It works pretty well, but I stumbled on an issue:
When using .append() to wrap the HTML in a div, all script elements in the code are evaluated and run (see this SO answer for an explanation of why this happens). I don't want this. I really just want them to be removed, but I can handle that later myself as long as they are not run.
I am using this code:
var wrapper = $('<div/>').append($(html));
I tried to do it this way instead:
var wrapper = $('<div>' + html + '</div>');
But that just brings forth the "Access denied" error in IE that the append() function fixes (see the answer I referenced above).
I think I might be able to rewrite my code to not require a wrapper around the html, but I am not sure, and I'd like to know if it is possible to append html without running scripts in it, anyway.
How do I wrap a piece of unknown html without running scripts inside it, preferably removing them altogether?
Should I throw jQuery out the window and do this with plain JavaScript and DOM manipulation instead? Would that help?
I am not trying to put some kind of security layer on the client side. I am very much aware that it would be pointless.
James suggested that I should filter out the script elements, but look at these two examples (the original first, then James's suggestion):
jQuery("<p/>").append("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there")
keeps the text nodes but writes gnu!
jQuery("<p/>").append(jQuery("<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there").not('script'))
Doesn't write gnu!, but also loses the text nodes.
James has updated his answer and I have accepted it. See my latest comment to his answer, though.
How about removing the scripts first?
var wrapper = $('<div/>').append($(html).not('script'));
Assuming script elements in the html are not nested in other elements:
var wrapper = document.createElement('div');
wrapper.innerHTML = html;
$(wrapper).children().remove('script');
Or, to also remove script elements nested inside other elements:
var wrapper = document.createElement('div');
wrapper.innerHTML = html;
$(wrapper).find('script').remove();
This works for the case where html is just text and where html has text outside any elements.
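For the "plain JavaScript and DOM manipulation" route asked about in the question, the same idea can be sketched without jQuery. This is only a minimal sketch: the function name `wrapWithoutScripts` and the `doc` parameter are hypothetical, introduced here so the function has no hidden globals. It relies on the browser not executing script elements that are created by assigning to innerHTML.

```javascript
// Hypothetical helper: wrap markup in a <div> and drop every <script> element.
// `doc` is normally the global `document`; passing it in is an assumption of this sketch.
function wrapWithoutScripts(html, doc) {
  var wrapper = doc.createElement('div');
  // Script elements created through innerHTML are not executed.
  wrapper.innerHTML = html;
  // getElementsByTagName returns a live list, so it shrinks as nodes are removed.
  var scripts = wrapper.getElementsByTagName('script');
  while (scripts.length > 0) {
    scripts[0].parentNode.removeChild(scripts[0]);
  }
  return wrapper;
}
```

In a browser this would be called as `var wrapper = wrapWithoutScripts(html, document);` and the result can still be handed to jQuery with `$(wrapper)`. Note that this only deals with script elements: inline handler attributes such as onerror are left in the markup and may still fire.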
Below is an alternative way to prevent scripts within a loaded html from running:
function preventJS(html) {
    // The g flag rewrites every opening script tag, not only the first one.
    return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}
It is described in detail here: JavaScript: How to prevent execution of JavaScript within a html being added to the DOM. This solution may be useful to somebody.
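If the goal is to drop the scripts entirely rather than neutralize them, the same string-level idea can be pushed one step further. The sketch below is an assumption-laden illustration, not a robust HTML parser: the name `stripScriptTags` is hypothetical, and a regular expression like this can be fooled by unusual markup (for example a literal `</script>` inside a string in the script body).

```javascript
// Hypothetical helper: remove <script>...</script> blocks from an HTML string
// before it is ever handed to jQuery or the DOM.
function stripScriptTags(html) {
  // <script\b[^>]*>  opening tag with any attributes
  // [\s\S]*?         the script body, matched lazily (including newlines)
  // <\/script\s*>    the closing tag
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

var cleaned = stripScriptTags(
  "<br/>hello<script type='text/javascript'>console.log('gnu!'); </script>there"
);
// cleaned is "<br/>hellothere"
```

Because the scripts are gone before the string is parsed, the text nodes around them survive and nothing is evaluated when the result is appended.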
You should remove the script elements:
var wrapper = $('<div/>').append($(html).remove("script"));
Second attempt:
node-validator can be used in the browser: https://github.com/chriso/node-validator
var str = sanitize(large_input_str).xss();
Alternatively, PHPJS has a strip_tags function (regex/evil based): http://phpjs.org/functions/strip_tags:535