How to remove links from HTML completely with Bleach?
You should use lxml. Bleach is simply for sanitizing data and ensuring the security/safety of the markup you store.
You can use lxml to parse structured data like HTML or XML.
Consider a simple HTML file, hello_world.html:
<html>
<body>
<p>Hello, World!</p>
</body>
</html>
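from lxml import html

# Parse the file and grab the root <html> element
root = html.parse("hello_world.html").getroot()
# encoding="unicode" returns a str rather than bytes, so print() shows plain markup
print(html.tostring(root, encoding="unicode"))
# <html><body><p>Hello, World!</p></body></html> (whitespace between tags omitted)

# Find the <p> element and remove it along with everything inside it
p = root.find("body/p")
p.drop_tree()
print(html.tostring(root, encoding="unicode"))
# <html><body></body></html> (whitespace between tags omitted)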
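Applied to the original question, the same approach can remove links. The following is a minimal sketch, not a definitive implementation: the helper name `remove_links`, its `keep_text` flag, and the sample markup are made up for illustration. It relies on two lxml.html element methods, `drop_tree()` (removes the element and its contents) and `drop_tag()` (removes only the tag and keeps its text and children).

```python
from lxml import html


def remove_links(markup, keep_text=False):
    """Return markup with every <a> element removed (hypothetical helper).

    Assumes the fragment's root element is not itself an <a>, since
    drop_tree()/drop_tag() need a parent element to work with.
    """
    root = html.fromstring(markup)
    # Build the list first because the tree is modified while looping.
    for link in list(root.iter("a")):
        if keep_text:
            link.drop_tag()   # remove only the <a> tag, keep the link text
        else:
            link.drop_tree()  # remove the <a> element and its contents
    return html.tostring(root, encoding="unicode")


# Illustrative input, not taken from the question
sample = '<div><p>Visit <a href="https://example.com">our site</a> today.</p></div>'

without_links = remove_links(sample)
text_only_links = remove_links(sample, keep_text=True)
print(without_links)
print(text_only_links)
```

In both cases the text that follows the link (its tail) stays in the document. If the HTML comes from untrusted users, you would still want to run it through Bleach before storing it.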
On a related note, if you want to look into more advanced parsing with lxml, one of my oldest questions on here was about getting Python to parse XML and write Python code from it: Writing a Python tool to convert XML to Python?