How to validate XML using Python without third-party libs?
I have some XML pieces like this:
<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
<player_birthday>1979-09-23</player_birthday>
<player_name>Orene Ai'i</player_name>
<player_team>Blues</player_team>
<player_id>453</player_id>
<player_height>170</player_height>
<player_position>F&W</player_position> <---- a '&' here.
<player_weight>75</player_weight>
</record>
Is there any way to check whether the XML pieces are well-formed? Is there any way to validate the XML against a DTD or XML Schema?
For various reasons I can't use any third-party packages.
E.g., the XML above is not correct since it has an unescaped '&' in it. Note that the DOCTYPE declaration refers to a DTD.
Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring); it will raise an error if the XML is not well-formed.
>>> a = """<record>
... <player_birthday>1979-09-23</player_birthday>
... <player_name>Orene Ai'i</player_name>
... <player_team>Blues</player_team>
... <player_id>453</player_id>
... <player_height>170</player_height>
... <player_position>F&W</player_position> <---- a '&' here.
... <player_weight>75</player_weight>
... </record>"""
>>>
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
parser.feed(text)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
self._raiseerror(v)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
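To use this check in a script rather than at the prompt, the parse call can be wrapped so the error is caught and reported. The following is only a minimal sketch: the helper name `check_well_formed` and the shortened sample strings are made up for illustration. It tests well-formedness only; the standard-library parser is built on Expat, which is non-validating, so the DTD named in a DOCTYPE is not checked.

```python
from xml.etree import ElementTree as ET


def check_well_formed(xml_text):
    """Hypothetical helper: return (True, None) if xml_text parses,
    otherwise (False, (message, line, column))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        return False, (str(err), line, column)
    return True, None


# Shortened versions of the record from the question (illustrative only).
bad = "<record>\n<player_position>F&W</player_position>\n</record>"
good = "<record>\n<player_position>F&amp;W</player_position>\n</record>"

print(check_well_formed(bad))
print(check_well_formed(good))
```

Writing the ampersand as `&amp;` is what makes the second string parse.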
You can use Python's xml.dom.minidom XML parser (which is in the standard library, but isn't as powerful as alternatives such as lxml).
Just do:
import xml.dom.minidom
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')
You will get an xml.parsers.expat.ExpatError if the XML is not well-formed.
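If you want to handle that error instead of letting it propagate, something along these lines should work. This is a sketch, not a definitive implementation: `parse_or_report` is a hypothetical helper name, and the second sample string is an assumed corrected version of the one above. As with ElementTree, this only detects malformed XML; it does not validate against a DTD.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError, ErrorString


def parse_or_report(xml_text):
    """Hypothetical helper: return (document, None) on success,
    or (None, description) if the text is not well-formed."""
    try:
        return xml.dom.minidom.parseString(xml_text), None
    except ExpatError as err:
        # ExpatError carries the Expat error code plus line and offset.
        message = "%s at line %d, column %d" % (
            ErrorString(err.code), err.lineno, err.offset)
        return None, message


# The string from the answer: the <My> and <XML> elements are never closed.
doc, problem = parse_or_report('<My><XML><String/><XML/><My/>')
print(problem)

# An assumed corrected version with properly nested closing tags.
doc, problem = parse_or_report('<My><XML><String/></XML></My>')
print(doc.documentElement.tagName)
```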