
Using rewrite with lxml

I am generating an XML Schema and then generating data files in Python 3.

The generated schema includes a base schema, and I use a catalog to map the include URI to a local file. I set the environment variable 'XML_CATALOG_FILES' in Python, and this works well.
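For readers who have not set the variable from Python before, here is a minimal sketch of that step. The helper name `register_catalog` and the file name `catalog.xml` are made up for illustration; only the `XML_CATALOG_FILES` variable comes from the question. It assumes the variable is set before lxml does any parsing, which is the safe order since libxml2 may only read it when it first initializes its catalogs.

```python
import os
from pathlib import Path


def register_catalog(catalog_path):
    """Point libxml2 at a catalog file through XML_CATALOG_FILES.

    Hypothetical helper: resolves the path and stores it as a file:// URI.
    """
    uri = Path(catalog_path).resolve().as_uri()
    os.environ["XML_CATALOG_FILES"] = uri
    return uri


# Call this before any lxml parsing takes place.
catalog_uri = register_catalog("catalog.xml")
```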

However, when I try to use rewriteSystem so that the locally generated schema is used in place of the generic location referenced in the data files, the rewrite doesn't seem to work.

Here is the catalog.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- S3Model 3.0.0 RM Schema -->
  <uri name="https://www.s3model.com/ns/s3m/s3model_3_0_0.xsd" uri="s3model/s3model_3_0_0.xsd"/>


  <!-- S3Model DMs -->
  <rewriteSystem systemIdStartString="https://dmgen.s3model.com/dmlib/" rewritePrefix="file:///home/tim/DII/Kunteksto/output/"/>
</catalog>

This catalog file works fine in Oxygen when validating with either Xerces or Saxon.

An example reference in the XML file looks like this:

xsi:schemaLocation="https://www.s3model.com/ns/s3m/ https://dmgen.s3model.com/dmlib/dm-a42592f1-e8b3-4862-b6e2-ac0e48c138f4.xsd">
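To make the expected result of the rewrite concrete, the following sketch emulates the rewriteSystem rule in plain Python using only the standard library. It is not how libxml2 resolves catalogs; the function name `rewrite_system` is hypothetical, and the catalog string is a trimmed copy of the one above. It follows the OASIS rule that the longest matching systemIdStartString wins.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def rewrite_system(catalog_xml, system_id):
    """Apply the longest matching rewriteSystem rule, or return None."""
    root = ET.fromstring(catalog_xml)
    best = None
    for rule in root.iter("{%s}rewriteSystem" % CATALOG_NS):
        start = rule.get("systemIdStartString", "")
        if system_id.startswith(start):
            # Keep the rule with the longest matching prefix.
            if best is None or len(start) > len(best[0]):
                best = (start, rule.get("rewritePrefix", ""))
    if best is None:
        return None
    start, prefix = best
    return prefix + system_id[len(start):]


catalog_xml = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="https://dmgen.s3model.com/dmlib/"
                 rewritePrefix="file:///home/tim/DII/Kunteksto/output/"/>
</catalog>"""

local = rewrite_system(
    catalog_xml,
    "https://dmgen.s3model.com/dmlib/dm-a42592f1-e8b3-4862-b6e2-ac0e48c138f4.xsd",
)
print(local)
# file:///home/tim/DII/Kunteksto/output/dm-a42592f1-e8b3-4862-b6e2-ac0e48c138f4.xsd
```

Comparing this value with the path the validator actually tries to open can help narrow down whether the rule is being skipped or applied differently.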

Any ideas why lxml (libxml2) does not recognize this rewriteSystem?

Instead of creating a parser and referencing the schema in the data file, I took a different approach and created a schema object from the schema string in lxml.

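    from io import StringIO
    from lxml import etree
    schema_doc = etree.parse(StringIO(schema))  # schema is a str, so wrap it in a file-like object
    modelSchema = etree.XMLSchema(schema_doc)   # compile once, reuse for every document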

The variable schema holds the string representation of the XML schema.

Then, as each data document is created, it is validated against that schema using:

    try:
        tree = etree.parse(StringIO(xmlStr))
        modelSchema.assertValid(tree)
    except etree.DocumentInvalid:
        # Mark documents that fail validation by prefixing the file id.
        file_id = "Invalid_" + file_id
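If the reasons for a failure matter as well as the pass/fail flag, a variant along these lines may be useful. This is only a sketch: the tiny `XSD` schema and the `validate` helper are invented for illustration and are not the S3Model schema. It uses `XMLSchema.validate` and the schema's `error_log` instead of `assertValid`, and passes bytes rather than str.

```python
from io import BytesIO
from lxml import etree

# Stand-in schema for illustration only.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item" type="xs:integer"/>
</xs:schema>"""


def validate(schema, xml_bytes):
    """Return (is_valid, messages) for one document."""
    doc = etree.parse(BytesIO(xml_bytes))
    if schema.validate(doc):
        return True, []
    # error_log holds the entries from the most recent validation run.
    return False, [entry.message for entry in schema.error_log]


model_schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))
ok, _ = validate(model_schema, b"<item>42</item>")
bad, messages = validate(model_schema, b"<item>forty-two</item>")
```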

I had to remove the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

to get etree.parse to work correctly.
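One way to drop the declaration programmatically is sketched below, using only the standard library. The helper name `strip_xml_declaration` is hypothetical. As background, lxml's string parsing functions such as etree.fromstring raise a ValueError for a str that carries an encoding declaration, so an alternative worth trying is to keep the declaration and pass bytes instead, for example `xmlStr.encode("utf-8")` wrapped in `io.BytesIO`.

```python
import re

# Matches a leading XML declaration, but not other processing
# instructions such as <?xml-stylesheet ...?>.
_XML_DECL = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def strip_xml_declaration(text):
    """Return text without a leading <?xml ...?> declaration."""
    return _XML_DECL.sub("", text, count=1)


xml_str = '<?xml version="1.0" encoding="UTF-8"?>\n<root/>'
body = strip_xml_declaration(xml_str)
print(body)  # <root/>
```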


 