Ignore mismatched tags in xml.etree.ElementTree.XMLParser (Python)
Is there any way to ignore mismatched tags in Python's xml.etree.ElementTree.XMLParser?
If there are mismatched tags, the input you are processing is not well-formed, so by definition it is not XML. ElementTree's parser stops with a ParseError at the first mismatched tag, and there is no option to "ignore" it.
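To see what "not well-formed" means in practice, here is a minimal sketch of what the standard library does with the same kind of input: `ET.fromstring` raises `xml.etree.ElementTree.ParseError` at the first mismatched tag. The `BADINPUT` string is the same one used in the lxml example; the exact column that expat reports is not relied on here.

```python
import xml.etree.ElementTree as ET

# Ill-formed document: </bar> tries to close the open <foo> element
BADINPUT = """
<root>
<foo>ABC</bar>
<baz>DEF</baz>
</root>"""

try:
    ET.fromstring(BADINPUT)
except ET.ParseError as err:
    # err.position is a (line, column) tuple reported by the expat parser
    line, column = err.position
    print(f"ParseError: {err}")
    print(f"first problem on line {line}")
```

There is no switch on `ET.XMLParser` that turns this error into a warning, which is why a different parser is needed.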
The XMLParser class in the lxml library has a recover constructor argument (see http://lxml.de/api/lxml.etree.XMLParser-class.html). When recover=True, lxml will try to fix ill-formed input. Example:
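from lxml import etree

# </bar> does not match the open <foo> tag, so this is not well-formed XML
BADINPUT = """
<root>
<foo>ABC</bar>
<baz>DEF</baz>
</root>"""

# recover=True asks the parser to repair ill-formed input instead of raising
parser = etree.XMLParser(recover=True)
root = etree.fromstring(BADINPUT, parser)
# tostring() returns bytes by default; encoding="unicode" returns a str
print(etree.tostring(root, encoding="unicode"))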
Output (the bad </bar> end tag has been changed to </foo>):
<root>
<foo>ABC</foo>
<baz>DEF</baz>
</root>
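If most of your input is well-formed, one option is to parse with the standard library first and only fall back to lxml's recover mode when that fails. This is a minimal sketch, not part of the original answer: the helper name `parse_lenient` is hypothetical, and it assumes lxml is installed whenever the fallback path runs.

```python
import xml.etree.ElementTree as ET


def parse_lenient(text):
    """Hypothetical helper: return (root, repaired).

    Tries the strict stdlib parser first; on a ParseError it retries
    with lxml's XMLParser(recover=True) (assumes lxml is installed).
    """
    try:
        return ET.fromstring(text), False
    except ET.ParseError:
        # Imported lazily so lxml is only needed for broken input
        from lxml import etree

        parser = etree.XMLParser(recover=True)
        root = etree.fromstring(text, parser)
        if root is None:
            # Recovery can fail to produce any root element at all
            raise ValueError("input could not be recovered")
        return root, True


root, repaired = parse_lenient("<a><b/></a>")
print(root.tag, repaired)
```

The `repaired` flag lets calling code log or flag documents that were silently fixed. Note that the two branches return different element types (`xml.etree.ElementTree.Element` versus lxml's element class); both support common operations such as `.tag`, `.text`, `.find()` and `.iter()`, but code that relies on lxml-only features should not assume them on the strict path.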