
jQuery word count and stripping tags from HTML

I am trying to do a word-count of a textarea that accepts HTML input.

My first step is to strip tags from the input. I found this code in another question:

$("<div></div>").html(html).text();

This works well, but it is vulnerable to script tags in the HTML:

html = "<script>alert()";

I am trying to mitigate this by using:

$("<p>").html(html).find('script').remove().end().text();

This successfully handles the example above. Unfortunately, it doesn't handle:

html = "<script><script>alert();</script>";

because it only removes the outer script.

I'm trying to write a while loop to continually remove scripts until there are none left to remove, but I'm struggling with the logic.

I want something like this:

var $div = $("<div></div>").html(html);
// Keep removing <script> descendants until none are left
while ($div.find('script').length > 0) {
  $div.find('script').remove();
}
var text = $div.text();

Is this possible? And is this safe?

Is there also a way to handle onXXX="" event-handler attributes (such as onclick or onerror) on other elements?
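The loop-until-nothing-changes idea from the question can be sketched on the raw string, with no DOM involved, so nothing can execute while it runs. This is a minimal sketch, assuming a regular-expression pass is acceptable for the use case; `removeScriptsUntilStable` is a hypothetical name, and a regex like this is not a substitute for a real HTML sanitizer.

```javascript
// Hypothetical helper (not a jQuery or phpjs API): strip closed
// <script>...</script> blocks from a string, repeating until a pass
// leaves the string unchanged (a fixed-point loop).
function removeScriptsUntilStable(html) {
  var scriptRe = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;
  var previous;
  do {
    previous = html;
    html = html.replace(scriptRe, '');
  } while (html !== previous);
  return html;
}

var tricky = '<scr<script></script>ipt>alert(1)</script>ok';
console.log(removeScriptsUntilStable(tricky));                               // ok
console.log(removeScriptsUntilStable('<script><script>alert();</script>x')); // x
```

The first input shows why a single pass is not enough: removing the inner `<script></script>` joins `<scr` and `ipt>` into a new script block, which only the second pass removes. An unclosed tag such as `<script>alert()` is not matched at all and survives unchanged.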

You can use this regular expression:

// Matches any run of characters between < and >, i.e. anything tag-like
var regex = /(<([^>]+)>)/ig;
var body = "<p>test</p>";
var result = body.replace(regex, "");

alert(result); // "test"
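Wrapped in a function, the expression above can be checked outside the browser. In this minimal sketch (the name `stripTagsRegex` is hypothetical), the string is only rewritten and never turned into DOM nodes, so nothing executes, and attributes such as onerror disappear along with the tag that carries them. It does keep the text inside `<script>` elements, and an attribute value containing `>` leaves a fragment behind.

```javascript
// Hypothetical helper around the answer's regular expression:
// delete every "<...>" run without parsing the HTML.
function stripTagsRegex(html) {
  return html.replace(/(<([^>]+)>)/ig, '');
}

console.log(stripTagsRegex('<p>test</p>'));                      // test
console.log(stripTagsRegex('<img src=x onerror="alert(1)">ok')); // ok
console.log(stripTagsRegex('<script>alert()</script>hi'));       // alert()hi
console.log(stripTagsRegex('<img alt="a>b">ok'));                // b">ok
```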

I found another answer on Stack Overflow: How to strip HTML tags from div content using Javascript/jQuery?

Please sanitize the string before saving it to the database.

I settled on the phpjs port of PHP's strip_tags function, which appears to work nicely and handles script tags well.

My simplistic word count function so far is:

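$('#input').on('input', function () {
  // strip_tags is the phpjs port of PHP's strip_tags, loaded separately
  var text = strip_tags($(this).val());
  // Collapse runs of whitespace to single spaces and trim the ends
  text = text.replace(/\s+/g, ' ').trim();

  var wordCount = 0;
  if (text !== '') {
    wordCount = text.split(' ').length;
  }

  // .text() inserts the number as plain text instead of parsing it as HTML
  $('#word-count').text(wordCount);
});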
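The counting logic in the handler above can be pulled out into pure functions and tested without a page. In this minimal sketch, `countWords` and `stripTagsToSpaces` are hypothetical names. The second one replaces each tag with a space, on the assumption that a stripper which deletes tags outright (as PHP's strip_tags does) would glue `<p>one</p><p>two</p>` into the single word "onetwo".

```javascript
// Hypothetical helpers, not part of jQuery or phpjs.

// Replace each tag with a space so text in adjacent elements
// ("<p>one</p><p>two</p>") is not joined into one word.
function stripTagsToSpaces(html) {
  return html.replace(/<[^>]*>/g, ' ');
}

// Same counting logic as the handler: collapse whitespace, trim,
// then split on the single spaces that remain.
function countWords(text) {
  var normalized = text.replace(/\s+/g, ' ').trim();
  return normalized === '' ? 0 : normalized.split(' ').length;
}

console.log(countWords(''));                                        // 0
console.log(countWords(' one  two\nthree '));                       // 3
console.log(countWords(stripTagsToSpaces('<p>one</p><p>two</p>'))); // 2
```

The trade-off is that a tag inside a word, such as `a<b>c</b>d`, is then counted as three words; which behaviour is preferable depends on the markup the textarea is expected to receive.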

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. Any questions, please contact: yoyou2525@163.com.

 