How to remove HTML elements from a string but exclude specific elements using regex

Question

I have a string 'TEST1 TEST2 <a href="#">TEST3</a>'

I need to remove HTML tags and leave the text

import re
# Matches any tag together with the whitespace around it
p = re.compile(r'\s*<[^>]+>\s*')
test = p.sub('', '<span>TEST1</span> <span>TEST2</span> <a href="#">TEST3</a>')
print(test)

OUTPUT: TEST1TEST2TEST3

But this removes every HTML element. How should I change the regex so that the output looks like this:

OUTPUT: TEST1 TEST2 <a href="#">TEST3</a>

Answer 1

You can use so-called "negative lookaheads".

In your case, you can exclude <a and </a> from the match:

(?!<a )(?!<\/a>)<[^>]+>

Note the space in <a and the closing angle bracket in </a>. They ensure that only the opening and closing tags of an <a> element are excluded from the match, and not other tags whose names merely begin with an a (such as <abbr>).
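Applied to the code from the question, this could look roughly like the sketch below. The function name strip_tags_except_a and the constant TAG_EXCEPT_A are made up for illustration. The pattern is also a slight variation on the one above, not the same expression: it folds both lookaheads into one and uses a \b word boundary instead of the space, on the assumption that a bare <a> without attributes should be kept as well. The \s* parts of the original pattern are dropped so that the spaces between the words survive.

```python
import re

# Hypothetical name; the pattern is a variant of the answer's regex.
# (?!</?a\b) refuses to match at "<a" or "</a" when the tag name is exactly "a",
# so <a ...>, <a> and </a> are kept while <abbr>, <span>, etc. are still removed.
TAG_EXCEPT_A = re.compile(r'(?!</?a\b)<[^>]+>')


def strip_tags_except_a(html: str) -> str:
    """Remove every tag except the opening and closing tags of <a> elements."""
    return TAG_EXCEPT_A.sub('', html)


print(strip_tags_except_a('<span>TEST1</span> <span>TEST2</span> <a href="#">TEST3</a>'))
# prints: TEST1 TEST2 <a href="#">TEST3</a>
```

Like any regex-based approach, this only holds for simple markup; a ">" inside an attribute value, for example, would end the match early.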
