Removing HTML tags using regular expressions

Question

I'm trying to get rid of the HTML tags. To an extent it works, but not all the tags are removed: the tags mentioned below are still there.

import re
import lxml.html

def remove_tags(content):
    # Parse the HTML, dropping comments, and return only the text content.
    parser = lxml.html.HTMLParser(remove_comments=True,
                                  remove_blank_text=True)
    document = lxml.html.document_fromstring(content, parser)
    return document.text_content()

# not_dealt_with_list holds the HTML strings that still contain markup.
print('NOT DEALT WITH:')
for body in not_dealt_with_list:
    #p = re.compile(r'<.*?[\\t\\n\\r\\s]*?.*?>')
    print(remove_tags(body))
    #print(p.sub('', body))
    #body = re.sub()

Answer 1

It looks like what you're trying to remove is embedded in an HTML comment (because it doesn't look like HTML there). HTML comments start with <!-- and end with -->, and that's what you have to search for.

Try this regex to match everything inside a comment, including comments that span multiple lines, so you can replace it afterwards:

<!--(.|\n)*?-->
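
As a minimal sketch of how that pattern could be applied in Python: the helper name strip_comments and the sample string are hypothetical, not taken from the question. The pattern below uses re.DOTALL so that "." also matches newlines, which matches the same text as the (.|\n) form above.

```python
import re

# Matches one HTML comment, including comments that span several lines.
# With re.DOTALL, "." also matches "\n", so this matches the same text
# as the pattern <!--(.|\n)*?--> from the answer.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)

def strip_comments(html):
    """Return html with every <!-- ... --> block removed."""
    return COMMENT_RE.sub('', html)

# Hypothetical sample input, made up for illustration.
sample = 'before<!-- [if gte mso 9]>\n<xml>junk</xml>\n<![endif] -->after'
print(strip_comments(sample))
```

The result could then be passed on to remove_tags from the question to strip whatever ordinary tags remain.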

Let me know how it works out!

Solution 1
1 accepted 2019-07-17 08:28:33
