
Remove HTML tags in script

I found this piece of code on the internet. It takes a sentence and turns every word into a link to that word's Merriam-Webster dictionary page. But it has a weakness: if the sentence contains HTML, the script doesn't remove it.

For example: it replaces ' <b>asserted</b> ' with ' http://www.merriam-webster.com/dictionary/<b>asserted</b> '

Could you please tell me what to change in this code so that it turns ' <b>asserted</b> ' into ' http://www.merriam-webster.com/dictionary/asserted '?

var content = document.getElementById("sentence").innerHTML;

var punctuationless = content.replace(/[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g, "");
var mixedCase = punctuationless.replace(/\s{2,}/g, " ");
var finalString = mixedCase.toLowerCase();

var words = (finalString).split(" ");

var punctuatedWords = (content).split(" ");

var processed = "";
for (var i = 0; i < words.length; i++) {
    processed += "<a href = \"http://www.merriam-webster.com/dictionary/" + words[i] + "\">";
    processed += punctuatedWords[i];
    processed += "</a> ";
}

document.getElementById("sentence").innerHTML = processed;

This regex, /<{1}[^<>]{1,}>{1}/g, matches every tag in a string: a <, then one or more characters that are neither < nor >, then a >. Replacing each match with a space removes the tags. For example, this

var str = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>";
str = str.replace(/<{1}[^<>]{1,}>{1}/g, " ");
document.writeln(str);

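writes " How are you   I  love cake " (each tag becomes a single space). When the browser renders it, the runs of spaces collapse, so the page shows "How are you I love cake".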
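The {1} quantifiers in that regex are redundant (x{1} is the same as x), and {1,} is the same as +, so /<[^<>]+>/g matches exactly the same tags. A minimal sketch, using a hypothetical helper named stripTags, that also collapses the leftover runs of spaces and trims the ends so the string itself (not just the rendered page) reads cleanly:

```javascript
// Hypothetical helper: replace every <...> tag with a space,
// then collapse runs of whitespace and trim the ends.
function stripTags(str) {
  return str
    .replace(/<[^<>]+>/g, " ") // same matches as /<{1}[^<>]{1,}>{1}/g
    .replace(/\s+/g, " ")      // collapse each run of whitespace into one space
    .trim();                   // drop the leading/trailing space
}

const sample = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>";
console.log(stripTags(sample)); // How are you I love cake
```

Replacing tags with a space rather than an empty string keeps words that were separated only by a tag (like "you" and "I" above) from running together.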

If you paste this

var stripHTML = mixedCase.replace(/<{1}[^<>]{1,}>{1}/g, "");

just below this

var mixedCase = punctuationless.replace(/\s{2,}/g, " ");

and use stripHTML instead of mixedCase on the following line (var finalString = stripHTML.toLowerCase();), it should work.
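Put together, the patched logic might look like the sketch below. It is wrapped in a hypothetical pure function, linkifyWords, that takes the HTML string and returns the new markup, so it can be tried without a page; in the page you would still read and write document.getElementById("sentence").innerHTML as in the question. It also gives the whitespace replace an explicit " " replacement, because String.prototype.replace called without a second argument inserts the string "undefined".

```javascript
// Hypothetical wrapper around the question's logic with the tag-stripping fix applied.
function linkifyWords(content) {
  // Same punctuation class as the question (it also removes "/", so "</b>" becomes "<b>").
  const punctuationless = content.replace(/[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g, "");
  const mixedCase = punctuationless.replace(/\s{2,}/g, " ");
  // Remove anything that still looks like a tag, e.g. "<b>".
  const stripHTML = mixedCase.replace(/<{1}[^<>]{1,}>{1}/g, "");
  const finalString = stripHTML.toLowerCase();

  // Clean words go into the URL; the original words (with their markup) become the link text.
  const words = finalString.split(" ");
  const punctuatedWords = content.split(" ");

  let processed = "";
  for (let i = 0; i < words.length; i++) {
    processed += "<a href = \"http://www.merriam-webster.com/dictionary/" + words[i] + "\">";
    processed += punctuatedWords[i];
    processed += "</a> ";
  }
  return processed;
}

console.log(linkifyWords("They <b>asserted</b> it."));
```

For that input the second link points at .../dictionary/asserted while its text stays <b>asserted</b>, so the bold formatting survives in the page.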

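function stripAllHtml(str) {
  if (!str || !str.length) return '';

  // Remove <script> blocks first so their code does not end up in the text.
  // [\s\S] also matches newlines, which "." does not.
  str = str.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');

  // Let the browser parse the rest and read back only the text.
  // Caution: even on a detached element, innerHTML can fire handlers such as
  // <img onerror>, so do not pass untrusted input here (DOMParser is safer).
  let tmp = document.createElement("DIV");
  tmp.innerHTML = str;

  return tmp.textContent || tmp.innerText || "";
}

stripAllHtml('<a>test</a>') // returns "test"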

This function will strip all the HTML and return only text.
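The function above depends on document, which only exists in a browser. Where there is no DOM (for example in Node), a rough regex-only approximation is sketched below. The name stripAllHtmlNoDom is hypothetical, and unlike a real HTML parser it does not handle every edge case (a > inside an attribute value, for instance); it also leaves entities such as &amp; undecoded, whereas textContent decodes them.

```javascript
// Hypothetical regex-only approximation of stripAllHtml for environments without a DOM.
// Not a real HTML parser: treat it as a best-effort text extractor.
function stripAllHtmlNoDom(str) {
  if (!str) return "";
  return str
    // Drop <script> and <style> elements together with their contents;
    // \1 requires the closing tag to match the opening one.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop all remaining tags.
    .replace(/<[^>]*>/g, "");
}

console.log(stripAllHtmlNoDom("<a>test</a>")); // test
console.log(stripAllHtmlNoDom("<p>hi</p><script>\nalert(1)\n</script>")); // hi
```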

Hopefully, this will work for you

If you need to remove both HTML tags and HTML entities, you can use:

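const text = '<p>test content </p><p><strong>test bold</strong>&nbsp;</p>';
// Remove tags (including an unclosed one at the very end) and a few specific entities.
const result = text.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, '');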

The result will be "test content test bold".
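That regex handles only the five listed entities, and it deletes them rather than decoding them, so "5 &gt; 3" would become "5  3". If you want the characters back, a sketch that strips tags first and then decodes numeric references plus a small, assumed set of named ones might look like this. The NAMED table and the htmlToText name are hypothetical; a complete decoder would need the full HTML entity list.

```javascript
// Hypothetical, deliberately small table of named entities; extend as needed.
const NAMED = new Map([
  ["nbsp", "\u00A0"],
  ["zwnj", "\u200C"],
  ["raquo", "\u00BB"],
  ["laquo", "\u00AB"],
  ["gt", ">"],
  ["lt", "<"],
  ["amp", "&"],
  ["quot", "\""],
  ["apos", "'"],
]);

// Hypothetical helper: strip tags first, then decode entities,
// so an escaped "&lt;b&gt;" survives as the literal text "<b>".
function htmlToText(html) {
  return html
    .replace(/<[^>]*(>|$)/g, "") // same tag pattern as the answer above
    .replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, ref) => {
      if (ref[0] === "#") {
        // Numeric reference: &#xA9; (hex) or &#169; (decimal).
        const isHex = ref[1] === "x" || ref[1] === "X";
        const code = parseInt(ref.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range code points untouched instead of throwing.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Named references are case-sensitive; unknown names are left as-is.
      return NAMED.has(ref) ? NAMED.get(ref) : match;
    });
}

const text = '<p>test content </p><p><strong>test bold</strong>&nbsp;</p>';
console.log(htmlToText(text).trim()); // test content test bold
console.log(htmlToText("5 &gt; 3 &amp; &#169;")); // 5 > 3 & ©
```

Here &nbsp; becomes a real non-breaking space (U+00A0) instead of disappearing, which is why the first call trims the result.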

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM