No DTD validation and XInclude resolution when using Saxon C HE with Python

I have a question about the Saxon C HE version for Python. After a successful installation I tried some examples that run XSLT transformations. These all worked.

However, when I parse an XML file, no DTD validation is performed during parsing and the XIncludes are not resolved. I have tried many things but have not been able to solve this problem. I hope someone can point out and explain my error.

Attached is an example that should deliberately produce an error when DTD validation is performed, because the DTD declares no element named FOU. When I run the script, it creates a Result.xml file that still contains the erroneous FOU element and the unresolved XInclude.

I am aware that this is easy to do with lxml, but I would like to know how it works with the Saxon parser.

XML Master:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <FOU Id="A-1">
        <BAR Name="Test-Bar-1"/>
        <BAR Name="Test-Bar-2"/>
        <BAR Name="Test-Bar-3"/>
    </FOU>
    <TUTU Id="TU-1">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
    </TUTU>
</TEST>

XML Include:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>

DTD:

<!ELEMENT TEST  (FOO+ , TUTU+)>
<!ELEMENT FOO   (BAR+)>
<!ELEMENT BAR   ANY>
<!ELEMENT TUTU  (TITI+)>
<!ELEMENT TITI  ANY>
<!-- Attribute -->
<!ATTLIST TEST
>
<!ATTLIST FOO
    Id      ID    #REQUIRED
>
<!ATTLIST BAR
    Name        CDATA #IMPLIED
>
<!ATTLIST TUTU
    Id      ID    #REQUIRED
>
<!ATTLIST TITI 
    Name        CDATA #IMPLIED
>

Python Script:

import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    xdmAtomicval = proc.make_boolean_value(False)
    xsltproc = proc.new_xslt_processor()
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    
    xsltproc.set_source(xdm_node=document)
    xsltproc.set_output_file("Result.xml")
    xsltproc.compile_stylesheet(stylesheet_file="styl.xslt")
    xsltproc.transform_to_file(stylesheet_file="styl.xslt")
    
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)

You should be able to set the xi and dtd configuration properties to "on".

proc.set_configuration_property("xi", "on")
proc.set_configuration_property("dtd", "on")

However, the only way I could get it to work was to remove the xpointer from the xinclude. I didn't have time to research why it isn't working with the xpointer in place.

It also doesn't appear that parse_xml() does any validation or xinclude resolution, but both did happen on the transform (set dtd validation to "off" or to "recover" to get Result.xml).

Here's the modified version of your Python that I used to test...

import os
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    proc.set_cwd(os.getcwd())
    proc.set_configuration_property("xi", "on")
    proc.set_configuration_property("dtd", "on")

    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)

    xsltproc = proc.new_xslt30_processor()
    xsltproc.transform_to_file(source_file="Master.xml", stylesheet_file="styl.xslt", output_file="Result.xml")

    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)

The PyDocumentBuilder class, which is new in SaxonC 11, should enable you to do DTD validation. See: https://www.saxonica.com/saxon-c/doc11/html/saxonc.html#PyDocumentBuilder You should be able to use the method dtd_validation to set validation.

You can create a PyDocumentBuilder as follows:

proc.new_document_builder()
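Putting this together with the question's script, a minimal sketch might look like the following. It assumes SaxonC 11 and that the builder exposes a setter named set_dtd_validation and a parse_xml method accepting xml_file_name; the answer above only refers to "the method dtd_validation", so check the exact name against the linked documentation for your version. The helper names parse_with_dtd_validation and run_example are hypothetical and used only for illustration.

```python
def parse_with_dtd_validation(proc, xml_path):
    """Parse xml_path through a PyDocumentBuilder with DTD validation requested.

    proc is expected to be a saxonc.PySaxonProcessor.
    """
    builder = proc.new_document_builder()
    # Assumed setter name; the answer calls it "dtd_validation".
    builder.set_dtd_validation(True)
    return builder.parse_xml(xml_file_name=xml_path)


def run_example():
    """Hypothetical driver: run from the directory containing Master.xml and Test.dtd."""
    import os
    import saxonc  # SaxonC 11 Python module, as in the question

    with saxonc.PySaxonProcessor(license=False) as proc:
        proc.set_cwd(os.getcwd())
        document = parse_with_dtd_validation(proc, "Master.xml")
        print(document)
```

With the Master.xml from the question, the FOU element should be rejected if validation is really applied. How the failure is reported (an exception or an empty result) may differ between SaxonC versions and is not shown here. This sketch only covers DTD validation; XInclude resolution is still controlled by the "xi" configuration property.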
