How to validate XML using Python without third-party libs?

I have some XML pieces like this:

<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_birthday>1979-09-23</player_birthday>
    <player_name>Orene Ai'i</player_name>
    <player_team>Blues</player_team>
    <player_id>453</player_id>
    <player_height>170</player_height>
    <player_position>F&W</player_position>   <---- a '&' here.
    <player_weight>75</player_weight>
</record>

Is there any way to check whether the XML pieces are well-formed? Is there any way to validate the XML against a DTD or XML Schema?

For various reasons I can't use any third-party packages.

E.g. the XML above is not correct since it has a bare '&' in it. Note that the DOCTYPE declaration refers to a DTD.

Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring); it will raise an error if the XML is not well formed.

>>> a = """<record>
...     <player_birthday>1979-09-23</player_birthday>
...     <player_name>Orene Ai'i</player_name>
...     <player_team>Blues</player_team>
...     <player_id>453</player_id>
...     <player_height>170</player_height>
...     <player_position>F&W</player_position>   <---- a '&' here.
...     <player_weight>75</player_weight>
... </record>"""
>>> 
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
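
The same check can be wrapped in a small helper that catches the ParseError shown above. This is only a sketch: the function name check_well_formed and the shortened sample strings are made up for illustration. It tests well-formedness only, since the standard-library parsers are non-validating and do not check the document against a DTD or XML Schema.

```python
from xml.etree import ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message).

    Hypothetical helper: it only checks well-formedness, not validity
    against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Shortened versions of the record from the question.
bad = "<record><player_position>F&W</player_position></record>"
good = "<record><player_position>F&amp;W</player_position></record>"

print(check_well_formed(bad))   # first item is False, second is the parser message
print(check_well_formed(good))  # (True, None)
```

Writing the ampersand as &amp; is what makes the second string parse.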

You can use Python's xml.dom.minidom XML parser (which is in the standard library, but isn't as powerful as alternatives such as lxml).

Just do:

import xml.dom.minidom
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')

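You will get an xml.parsers.expat.ExpatError if the XML is not well formed. (As written, the string above uses <XML/> and <My/> where the closing tags </XML> and </My> were presumably intended, so it triggers this error.)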
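
To report where the problem is, the ExpatError can be caught and inspected. The sketch below assumes a hypothetical helper name, find_xml_error, and relies on the lineno, offset and code attributes that ExpatError exposes. Like the call above, it detects only well-formedness errors, because minidom does not validate against a DTD.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError, ErrorString


def find_xml_error(xml_text):
    """Return None if xml_text is well formed, else (line, column, message).

    Hypothetical helper for illustration only.
    """
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError as err:
        # lineno is 1-based, offset is the 0-based column within that line.
        return err.lineno, err.offset, ErrorString(err.code)
    return None


print(find_xml_error("<My><XML><String/></XML></My>"))  # None
print(find_xml_error("<My><XML><String/><XML/><My/>"))  # a (line, column, message) tuple
```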
