Parsing partial HTML templates with lxml

Question

I am trying to parse HTML templates with lxml and add certain attributes to HTML elements. I can do so successfully, but when reading a document via

 from lxml import etree
 template = etree.parse(view, etree.HTMLParser(remove_comments=True))

and then saving it, I notice that the saved template contains additional markup that turns it into a valid HTML document, wrapping a template like

 <div>
   <span> A template </span>
 </div>

with <html> and <body> tags, turning it into something like

 <html>
 <body>
 <div>
   <span> A template </span>
 </div>
 </body>
 </html>

How do I read my "broken" (partial) HTML templates and prevent lxml from adding these extra tags?
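A minimal reproduction of the workflow described above may make the behaviour concrete. It is a sketch under stated assumptions: an io.StringIO stands in for the question's view (etree.parse() accepts filenames and file-like objects), and the data-example attribute is a hypothetical name chosen for illustration.

```python
from io import StringIO

from lxml import etree

# A partial template: a bare <div> with no <html> or <body> around it.
TEMPLATE = "<div>\n  <span> A template </span>\n</div>"

# StringIO stands in for the question's `view` (an assumption for this sketch).
tree = etree.parse(StringIO(TEMPLATE), etree.HTMLParser(remove_comments=True))
root = tree.getroot()

# Add an attribute to every <span>; "data-example" is a hypothetical name.
for span in root.iter("span"):
    span.set("data-example", "1")

# Serialize the root element only; serializing the whole tree may also emit
# the DOCTYPE that the HTML parser adds by default.
output = etree.tostring(root, encoding="unicode")
print(output)  # the <div> comes back wrapped in <html><body>...</body></html>
```

The attribute is added correctly, but the lenient HTML parser has already inserted the implied <html> and <body> elements at parse time, so they are part of the tree that gets saved.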

Answer 1

Simply don't use the HTML parser.

With the HTML parser:

>>> from lxml import etree
>>> template = etree.fromstring('<div><span> A template </span></div>', etree.HTMLParser(remove_comments=True))
>>> etree.tostring(template)
b'<html><body><div><span> A template </span></div></body></html>'

Without it, etree.fromstring() uses lxml's default XML parser, which adds nothing:

>>> template = etree.fromstring('<div><span> A template </span></div>')
>>> etree.tostring(template)
b'<div><span> A template </span></div>'
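One caveat, not part of the original answer: the default XML parser only accepts well-formed XML, so ordinary HTML such as a void <br> or an &nbsp; entity makes etree.fromstring() raise XMLSyntaxError. A possible alternative is lxml.html.fragment_fromstring(), which uses the lenient HTML parser but returns only the fragment's single root element. The sketch below assumes the template has exactly one top-level element; lxml.html.fragments_fromstring() returns a list for templates with several. The data-example attribute is again hypothetical.

```python
from lxml import etree, html

# Valid HTML but not well-formed XML: <br> is never closed and &nbsp;
# is not one of XML's predefined entities.
TEMPLATE = "<div><span> A template </span><br>&nbsp;</div>"

# The default XML parser (used when no parser is passed) rejects it.
try:
    etree.fromstring(TEMPLATE)
    xml_ok = True
except etree.XMLSyntaxError:
    xml_ok = False

# fragment_fromstring parses with the HTML parser but hands back the single
# top-level element of the fragment, without any <html>/<body> wrapper.
fragment = html.fragment_fromstring(TEMPLATE)
fragment.set("data-example", "1")  # hypothetical attribute, as in the question

# method="html" is lxml.html.tostring's default, so <br> stays a void tag.
output = html.tostring(fragment, encoding="unicode")
print(xml_ok)  # False
print(output)
```

If the template starts with text or has several sibling elements, fragment_fromstring() raises an error unless create_parent=True is passed, in which case the pieces are wrapped in a new parent element that you would strip again before saving.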

Answer 1 (accepted), 2014-01-31 05:31:49
