
How to validate an XML file against an XSD schema using the Amara library in Python?

High bounty for the following Q:

Hello, here is what I tried on Ubuntu 9.10 using Python 2.6 and Amara2 (by the way, test.xsd was created using the xml2xsd tool):

g@spot:~$ cat test.xml; echo =====o=====; cat test.xsd; echo =====o=====; cat test.py; echo =====o=====; ./test.py; echo =====o=====
<?xml version="1.0" encoding="utf-8"?>
<test>abcde</test>
=====o===== 
<?xml version="1.0" encoding="UTF-8"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
elementFormDefault="qualified"> 
  <xs:element name="test" type="xs:NCName"/> 
</xs:schema> 
=====o===== 
#!/usr/bin/python2.6 
# I wish to validate an xml file against an external XSD schema. 
from amara import bindery, parse 
source = 'test.xml' 
schema = 'test.xsd' 
#help(bindery.parse) 
#doc = bindery.parse(source, uri=schema, validate=True) # These 2 seem to fail in the same way.
doc = parse(source, uri=schema, validate=True) # So, what is the difference anyway?
# 
=====o===== 
Traceback (most recent call last): 
  File "./test.py", line 14, in <module> 
    doc = parse(source, uri=schema, validate=True) 
  File "/usr/local/lib/python2.6/dist-packages/Amara-2.0a4-py2.6-linux-x86_64.egg/amara/tree.py", line 50, in parse
    return _parse(inputsource(obj, uri), flags, entity_factory=entity_factory)
amara.ReaderError: In file:///home/g/test.xml, line 2, column 0: Missing document type declaration
g@spot:~$ 
=====o===== 

So, why am I seeing this error? Is this functionality not supported? How can I validate an XML file against an XSD while keeping the flexibility to point to any XSD file? Thanks, and let me know if you have questions.

If you're open to using another library besides amara, try lxml. It supports what you're trying to do pretty easily:

from lxml import etree

source_file = 'test.xml'
schema_file = 'test.xsd'

with open(schema_file) as f_schema:

    schema_doc = etree.parse(f_schema)
    schema = etree.XMLSchema(schema_doc)
    parser = etree.XMLParser(schema = schema)

    with open(source_file) as f_source:
        try:
            doc = etree.parse(f_source, parser)
        except etree.XMLSyntaxError as e:
            # lxml raises this on a schema validation error (and on malformed XML)
            print(e)

I recommend using the noNamespaceSchemaLocation attribute to bind the XML file to the XSD schema. Your XML file test.xml would then be

<?xml version="1.0" encoding="utf-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="test.xsd">abcde</test>

where the file test.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
    <xs:element name="test" type="xs:NCName"/>
</xs:schema>

should be placed in the same directory as test.xml. Referencing the XML schema from the XML file is a general technique, and it should work in Python.

The advantage is that you don't need to know the schema file for every XML file. A parser that honours this hint can locate the schema while parsing the XML file.


 