Remove HTML tags in script

I found this piece of code on the internet. It takes a sentence and turns every word into a link to the dictionary page for that word. But it has a weakness: if the sentence contains HTML, the script doesn't remove it.

For example, it replaces '<b>asserted</b>' with 'http://www.merriam-webster.com/dictionary/<b>asserted</b>'.

Could you please tell me what to change in this code so that it turns '<b>asserted</b>' into 'http://www.merriam-webster.com/dictionary/asserted'?

var content = document.getElementById("sentence").innerHTML;

var punctuationless = content.replace(/[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g, "");
var mixedCase = punctuationless.replace(/\s{2,}/g);
var finalString = mixedCase.toLowerCase();

var words = (finalString).split(" ");

var punctuatedWords = (content).split(" ");

var processed = "";
for (var i = 0; i < words.length; i++) {
    processed += "<a href = \"http://www.merriam-webster.com/dictionary/" + words[i] + "\">";
    processed += punctuatedWords[i];
    processed += "</a> ";
}

document.getElementById("sentence").innerHTML = processed;

This regex, /<{1}[^<>]{1,}>{1}/g, matches any text in a string that sits between < and >, including the brackets themselves, so you can use it to replace every tag with a space. For example, this

var str = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>";
str = str.replace(/<{1}[^<>]{1,}>{1}/g, " ");
document.writeln(str);

will display "How are you I love cake" on the page (the browser collapses the runs of spaces left where the tags were).
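The pattern above can be written more compactly, since `{1}` is the default quantifier and `{1,}` means the same as `+`. A minimal sketch, applied to the example from the question (the variable names are only illustrative):

```javascript
// The pattern from the answer and an equivalent shorter spelling.
var verbose = /<{1}[^<>]{1,}>{1}/g;
var compact = /<[^<>]+>/g;

var sample = "<b>asserted</b>";

// Replace with an empty string so nothing is left where the tags were.
var a = sample.replace(verbose, "");
var b = sample.replace(compact, "");

console.log(a); // asserted
console.log(b); // asserted
```

Note that a pattern like this is not a full HTML parser: a `>` inside an attribute value, for instance, ends the match early.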

If you paste this

var stripHTML = mixedCase.replace(/<{1}[^<>]{1,}>{1}/g, "");

just below this

var mixedCase = punctuationless.replace(/\s{2,}/g);

and replace mixedCase with stripHTML in the line after it, it will probably work.
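Put together, the idea can be sketched as one function that works on a plain string. This is only an illustration under some assumptions: `linkWords` is a hypothetical name, the tags are stripped before anything else (so they also disappear from the visible link text, not just from the URL), and each word is cleaned individually so the URL and the link text cannot get out of step.

```javascript
// Hypothetical helper: turn a sentence (possibly containing HTML) into dictionary links.
function linkWords(html) {
  // Strip tags first so they never reach the URL or the link text.
  var text = html.replace(/<[^<>]+>/g, "");

  // Split on any whitespace and drop empty pieces.
  var displayWords = text.split(/\s+/).filter(Boolean);

  var processed = "";
  for (var i = 0; i < displayWords.length; i++) {
    // Remove punctuation and lower-case the word for use in the URL.
    var key = displayWords[i]
      .replace(/[.,\/#!$%؟^?&*;:{}=\-_`~()”“"]/g, "")
      .toLowerCase();
    if (!key) continue; // skip pieces that were punctuation only

    processed += '<a href="http://www.merriam-webster.com/dictionary/' + key + '">';
    processed += displayWords[i];
    processed += "</a> ";
  }
  return processed;
}

console.log(linkWords("He <b>asserted</b> it."));
```

In the page from the question, the result could then be assigned back with `document.getElementById("sentence").innerHTML = linkWords(content);`.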

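function stripAllHtml(str) {
  if (!str || !str.length) return '';

  // Drop <script> blocks, including ones that span several lines.
  str = str.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/gi, '');

  // Let the browser parse the markup in a detached element, then read back only the text.
  let tmp = document.createElement("DIV");
  tmp.innerHTML = str;

  return tmp.textContent || tmp.innerText || "";
}

stripAllHtml('<a>test</a>'); // returns "test"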

This function strips all the HTML and returns only the text.

Hopefully, this will work for you.

If you need to remove HTML tags and HTML entities, you can use

const text = '<p>test content </p><p><strong>test bold</strong>&nbsp;</p>';
const result = text.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, '');

The result will be "test content test bold".
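The pattern above deletes the listed entities outright. If the characters they stand for should be kept instead, one possible variation is to decode them after removing the tags. The sketch below is an assumption-laden illustration: `ENTITIES` and `stripTagsAndDecode` are hypothetical names, the table covers only five entities, and `&nbsp;` is mapped to an ordinary space for simplicity.

```javascript
// Hypothetical lookup table; extend it with whatever entities your text contains.
var ENTITIES = {
  "&nbsp;": " ", // simplification: a regular space instead of U+00A0
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"'
};

function stripTagsAndDecode(html) {
  return html
    // Remove tags, using the tag part of the pattern from the answer.
    .replace(/<[^>]*(>|$)/g, "")
    // Decode the known entities in a single pass.
    .replace(/&(nbsp|amp|lt|gt|quot);/g, function (match) {
      return ENTITIES[match];
    })
    .trim();
}

console.log(stripTagsAndDecode("<p>1 &lt; 2 &amp;&amp; x</p>")); // 1 < 2 && x
```

Decoding only after the tags are gone means a decoded `<` or `>` is never mistaken for markup.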
