Parsing HTML partial templates using lxml
I am trying to parse HTML templates using lxml and add certain attributes to HTML elements. I can do so successfully, but when reading a document via
template = etree.parse(view, etree.HTMLParser(remove_comments=True))
and then saving the document, I noticed that the saved templates contain additional markup that makes them valid HTML documents. lxml wraps a template like
<div>
<span> A template </span>
</div>
with html and body tags, turning it into something like
<html>
<body>
<div>
<span> A template </span>
</div>
</body>
</html>
How do I read my "broken" HTML templates and prevent lxml from adding these additional tags?
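A minimal reproduction of the behaviour described above, assuming the template is held in a string rather than read from `view`. The attribute name `data-example` is a hypothetical placeholder for whatever attributes the real code adds.

```python
from lxml import etree

template_source = "<div><span> A template </span></div>"

# Parse the partial template with the HTML parser, as in the question.
parser = etree.HTMLParser(remove_comments=True)
root = etree.fromstring(template_source, parser)

# Add an attribute to every <span>; "data-example" is a hypothetical name.
for span in root.iter("span"):
    span.set("data-example", "1")

# Serializing shows the <html><body> wrapper that the HTML parser inserted.
serialized = etree.tostring(root, encoding="unicode")
print(serialized)
```

The root returned here is the implied `<html>` element, not the template's own `<div>`, which is why the wrapper ends up in the saved file.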
Simply don't use the HTML parser.
With the HTML parser:
>>> template = etree.fromstring('<div><span> A template </span></div>', etree.HTMLParser(remove_comments=True))
>>> etree.tostring(template)
'<html><body><div><span> A template </span></div></body></html>'
Without it:
>>> template = etree.fromstring('<div><span> A template </span></div>')
>>> etree.tostring(template)
'<div><span> A template </span></div>'