Ignoring mismatched tags in Python's xml.etree.ElementTree.XMLParser

Question

Is there any way to ignore mismatched tags in Python's xml.etree.ElementTree.XMLParser?

Answer 1

If there are mismatched tags, then the input you are processing is by definition not XML, since it is not well-formed. There is no way to "ignore" mismatched tags with ElementTree.
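To make that concrete, here is a minimal sketch of how the standard-library parser reacts to input with a mismatched end tag. The helper name try_parse is made up for illustration; ET.fromstring and ET.ParseError are part of xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Input with a mismatched end tag: <foo> is closed by </bar>.
BADINPUT = """
<root>
  <foo>ABC</bar>
  <baz>DEF</baz>
</root>"""


def try_parse(text):
    """Hypothetical helper: return (root, None) on success or (None, error) on failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, err


root, err = try_parse(BADINPUT)
if err is not None:
    # err.position is a (line, column) tuple locating the problem in the input.
    print("not well-formed:", err, "at", err.position)
```

ElementTree offers no option to continue past this error, so the only choices are to catch ParseError and reject the input, or to repair the input before parsing it.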

The XMLParser class in the lxml library has a recover constructor argument (see http://lxml.de/api/lxml.etree.XMLParser-class.html ). When recover=True, lxml will try to fix ill-formed input. Example:

from lxml import etree

BADINPUT = """
<root> 
  <foo>ABC</bar> 
  <baz>DEF</baz> 
</root>"""

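# recover=True tells lxml to repair ill-formed input instead of raising an error
parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)
# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(root).decode())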

Output (the bad </bar> end tag has been changed to </foo>):

<root> 
  <foo>ABC</foo>
  <baz>DEF</baz> 
</root>
