How to remove all HTML tags from a string except the single "s" tag?

Question

I'm trying to remove all HTML tags except the <s></s> tags. Right now I have:

contents.replace(/(<([^>]+)>)/gi, '')

This removes all HTML tags.

So...

I've tried many other solutions, such as:

<\/?(?!s)\w*\b[^>]*>, <(?.s|/s)?*?>, .....

But these regexes end up treating every tag whose name starts with the letter "s" the same way as <s>.

For example, <strong>, <span>, etc.

I'd appreciate any help.
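To see what the lookahead in the first attempt actually does, the sketch below runs it next to a variant that checks for a word boundary after the s. The variant /<(?!\/?s\b)[^>]*>/g, the stripWith helper and the sample string are illustrative assumptions, not taken from any of the answers. (?!s) only looks at the first character of the tag name, so any name beginning with s is skipped; (?!\/?s\b) only skips a tag whose whole name is s.

```javascript
// The question's first attempt: (?!s) rejects any tag name starting with "s",
// so <span> and <strong> (and their closing tags) are left in place as well.
const questionRegex = /<\/?(?!s)\w*\b[^>]*>/g;

// Hypothetical variant: skip a tag only when its whole name is "s".
// The \b stops the lookahead from matching the start of "span" or "strong".
const sOnlyRegex = /<(?!\/?s\b)[^>]*>/g;

// Remove every match of the given regex.
function stripWith(regex, text) {
  return text.replace(regex, '');
}

const sample = '<p><span>a</span><s>b</s></p>';
console.log(stripWith(questionRegex, sample)); // -> "<span>a</span><s>b</s>"
console.log(stripWith(sOnlyRegex, sample));    // -> "a<s>b</s>"
```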

Answer 1

Whether this is possible depends on how accurate you need it to be: regular expressions cannot parse HTML with 100% accuracy.

But if you just want something quick and dirty:

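You can take advantage of the fact that String.prototype.replace lets you tell capture groups apart when you pass a function as the replacement: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#specifying_a_function_as_the_replacement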
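As a small illustration of that behavior (the regex, the 'a1b2' input and the calls array are made up for the example): when the replacement is a function, it is called once per match with the full match, then one argument per capture group, then the offset. A group that did not take part in the match is passed as undefined, which is what lets the callback tell the alternatives apart.

```javascript
// Record the arguments that String.prototype.replace passes to the callback.
const calls = [];
const result = 'a1b2'.replace(/([a-z])|(\d)/g, (match, letter, digit, offset) => {
  // Exactly one of letter / digit is defined; the other is undefined.
  calls.push({ match, letter, digit, offset });
  // Keep letters, drop digits.
  return letter !== undefined ? letter : '';
});

console.log(result); // -> "ab"
console.log(calls);  // one entry per match, with the unused group undefined
```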

So you can make two capture groups:

Group 1 (<s> or </s>): <\/?s>

Group 2 ("starts with <, ends with >, with no > in between"): (<[^>]*>)

Then, in the call to string.replace, return the match if group 1 matched; otherwise only group 2 matched, so return an empty string:

    function removeTags(text) {
      const regex = /(<\/?s>)|(<[^>]*>)/g; // Group 1 OR Group 2
      return text.replace(regex, (_, g1) => g1 || '');
    }
    const text = '<span>Span Text <s>S Text <strong>Strong Text</strong></s></span>';
    console.log(removeTags(text));

Note the flaw: if < and > appear as plain text, everything between them may be treated as a tag when it actually isn't one:

    function removeTags(text) {
      const regex = /(<\/?s>)|(<[^>]*>)/g; // Group 1 OR Group 2
      return text.replace(regex, (_, g1) => g1 || '');
    }
    const text = '<p> This is how you start a tag: `<` and this is how you end a tag: `>`</p>';
    console.log("But the regex fails:");
    console.log(removeTags(text));

 XML parsers can see that the brackets do not create a tag: <p> This is how you start a tag: `<` and this is how you end a tag: `>`</p>

If you want accurate parsing, use an XML parser.

Answer 2

You can try: /(<([^>s]+)>)|(<\/?(\w{2,})>)/gmi

The first part, (<([^>s]+)>), captures all HTML tags except those containing the letter s.

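The second part, (<\/?(\w{2,})>), captures all opening and closing HTML tags whose name is 2 or more characters long, such as <strong> or </span>.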

Demo: https://regex101.com/r/AFlXam/1
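A quick run of the pattern from this answer, as a sketch (the removeTags wrapper and the second sample string are illustrative; the first sample is the one from Answer 1). <s> and </s> survive because the s rules out the first alternative and the one-character name rules out the second.

```javascript
// Answer 2's pattern, unchanged.
const regex = /(<([^>s]+)>)|(<\/?(\w{2,})>)/gmi;

function removeTags(text) {
  // Every match is a tag to drop; <s> and </s> match neither alternative.
  return text.replace(regex, '');
}

console.log(removeTags('<span>Span Text <s>S Text <strong>Strong Text</strong></s></span>'));
// -> "Span Text <s>S Text Strong Text</s>"

// A one-letter tag with an "s" in its attributes also matches neither
// alternative, so it is left in place:
console.log(removeTags('<p class="note">hi</p>'));
// -> "<p class=\"note\">hi"
```

Because the first alternative stops at any s and the second allows no attributes, a tag such as <p class="note"> is kept while its closing </p> is removed.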

Answer 3

You can't reliably parse HTML with regular expressions; see "RegEx match open tags except XHTML self-contained tags".

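With some limitations, you can still use a regex to strip HTML tags other than the s tag. This builds on Chris Hamilton's answer, but avoids the bug (a <= 20 && a > 2) because it is aware of tags and attributes: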

    function removeTags(text) {
      const regex = /(<\/?s>)|<\/?[a-zA-Z][a-zA-Z0-9]*(?: .*?)?>/g;
      return text.replace(regex, (_, g1) => g1 || '');
    }
    const text = '<h1>Demo:</h1> <p>Paragraph with <s>S text</s>, <b>bold stuff.</b></p> <p style="color: gray">Condition: <tt>(a <= 20 && a > 2)</tt></p>';
    console.log(removeTags(text));

Output:

Demo: Paragraph with <s>S text</s>, bold stuff. Condition: (a <= 20 && a > 2)

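Regex explanation:

(<\/?s>) -- literal <s> or </s>
| -- logical OR
<\/?[a-zA-Z][a-zA-Z0-9]* -- start of a tag, e.g. <h1 or <p
(?: .*?)? -- optional non-capturing group: a space followed by a non-greedy scan (the attributes)
> -- literal >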
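The pieces in that list can be joined back together to check that they form the same regex. The sketch below does that with new RegExp (the pieces array, the strip helper and the variable names are illustrative) and runs the result next to Answer 1's regex on the text from Answer 1's second example. Because a tag must begin with a letter right after < or </, the lone brackets in that text are left alone.

```javascript
// Build Answer 3's regex from the pieces listed above.
const pieces = [
  String.raw`(<\/?s>)`,                 // literal <s> or </s> (group 1)
  '|',                                  // logical OR
  String.raw`<\/?[a-zA-Z][a-zA-Z0-9]*`, // tag start: "<" or "</" followed by a letter
  String.raw`(?: .*?)?`,                // optional space + non-greedy scan (attributes)
  '>',                                  // literal >
];
const tagAware = new RegExp(pieces.join(''), 'g');

// Answer 1's regex, for comparison: anything from "<" to the next ">".
const bracketOnly = /(<\/?s>)|(<[^>]*>)/g;

// Keep group 1 (<s> / </s>), drop every other match.
const strip = (regex, text) => text.replace(regex, (_, g1) => g1 || '');

const text = '<p> This is how you start a tag: `<` and this is how you end a tag: `>`</p>';
console.log(strip(bracketOnly, text)); // -> " This is how you start a tag: ``"
console.log(strip(tagAware, text));    // -> " This is how you start a tag: `<` and this is how you end a tag: `>`"
```

The trade-off is the one the answer describes: text such as "a <b and c> d" would still look like a tag, so this narrows the failure cases rather than removing them.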

Learn more about regular expressions: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

3 answers

Answer 1: score 1, accepted, 2023-01-27 06:27:38

Answer 2: score 0, 2023-01-27 09:48:42

Answer 3: score 0, 2023-01-27 20:57:31
