
Ignore mismatched tags in xml.etree.ElementTree.XMLParser (Python)

Is there any way to ignore mismatched tags with Python's xml.etree.ElementTree.XMLParser?

If there are mismatched tags, the input you are processing is not XML by definition, because it is not well-formed. There is no way to "ignore" mismatched tags with ElementTree: its parser stops and raises xml.etree.ElementTree.ParseError instead.
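To make this concrete, here is a minimal sketch using only the standard library. It feeds ElementTree the same ill-formed input used in the lxml example and catches the `ParseError` it raises, so the problem is at least detected. The helper name `try_parse` is hypothetical, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Same ill-formed input as in the lxml example: <foo> is closed by </bar>.
BADINPUT = """
<root>
  <foo>ABC</bar>
  <baz>DEF</baz>
</root>"""


def try_parse(text):
    """Return the root element, or None if the input is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # err.code is an expat error code; err.position is a (line, column) tuple.
        print("not well-formed:", err, "at", err.position)
        return None


root = try_parse(BADINPUT)  # prints the "mismatched tag" error and returns None
```

ElementTree gives you no repair option here; detecting the error and then falling back to a recovering parser is the only choice within this approach.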


The XMLParser class in the lxml library has a recover constructor argument (see http://lxml.de/api/lxml.etree.XMLParser-class.html). When recover=True, lxml will try to fix ill-formed input. Example:

from lxml import etree

BADINPUT = """
<root> 
  <foo>ABC</bar> 
  <baz>DEF</baz> 
</root>"""

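# recover=True makes libxml2 try to repair ill-formed input instead of raising an error
parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)
# Python 3: print() is a function; encoding="unicode" returns str rather than bytes
print(etree.tostring(root, encoding="unicode"))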

Output (the bad </bar> end tag has been changed to </foo>):

<root> 
  <foo>ABC</foo>
  <baz>DEF</baz> 
</root>
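With recover=True the input is repaired silently, so it can be worth checking what lxml actually changed. The following is a minimal sketch, assuming lxml is installed. The parser's error_log still holds the messages libxml2 emitted during the last parse, even though they no longer raise XMLSyntaxError. The helper name parse_with_recovery is hypothetical. Also note that fromstring may return None when the input is too broken to recover any root element, which this sketch leaves to the caller to handle.

```python
from lxml import etree

# Same ill-formed input as above: <foo> is closed by </bar>.
BADINPUT = """
<root>
  <foo>ABC</bar>
  <baz>DEF</baz>
</root>"""


def parse_with_recovery(text):
    """Parse possibly ill-formed XML; return (root, list of problems found)."""
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(text, parser)
    # error_log records what libxml2 complained about during this parse,
    # even though recover=True stopped those errors from being raised.
    problems = [
        (entry.line, entry.column, entry.message) for entry in parser.error_log
    ]
    return root, problems


root, problems = parse_with_recovery(BADINPUT)
for line, column, message in problems:
    print(f"line {line}, column {column}: {message}")
print(etree.tostring(root, encoding="unicode"))
```

Logging the recovered errors lets you decide whether a repaired document is trustworthy enough to use, rather than accepting lxml's guess blindly.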
