Preserving subelement namespace serialization with lxml

I have a few different XML documents that I'm trying to combine into one using lxml. The problem is that I need the result to preserve the namespace declarations on each sub-document's root node. lxml seems to want to push any namespace declaration used more than once up to the root of the new document, which breaks in my application (it is an acknowledged bug).

So, for example, I have document A:

<dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
   <title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</title>
</dc>

and document B:

<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
    <titleInfo>
        <nonSort>La</nonSort>
        <title>difesa della razza</title>
        <subTitle>scienza, documentazione, polemica</subTitle>
        <partNumber>anno 1:n. 1</partNumber>
    </titleInfo>
</mods>

I want to wrap them in an element that also uses an xsi:schemaLocation, but I need the namespace declaration (xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance") to appear on all three nodes, like this:

<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org" xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org">

    <dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
       <dc:title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</dc:title>
    </dc:dc>

    <mods:mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
    <mods:titleInfo>
        <mods:nonSort>La</mods:nonSort>
            <mods:title>difesa della razza</mods:title>
            <mods:subTitle>scienza, documentazione, polemica</mods:subTitle>
            <mods:partNumber>anno 1:n. 1</mods:partNumber>
    </mods:titleInfo>
    </mods:mods>
</wrap>

However, when I append these two documents using Python/lxml

wrap.append(dc)
wrap.append(mods)

the declaration gets pushed up to the highest-level node that uses it, which unfortunately is a problem for my application. The result looks like this:

<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org" xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org">

    <dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
       <dc:title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</dc:title>
    </dc:dc>

    <mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
    <mods:titleInfo>
        <mods:nonSort>La</mods:nonSort>
            <mods:title>difesa della razza</mods:title>
            <mods:subTitle>scienza, documentazione, polemica</mods:subTitle>
            <mods:partNumber>anno 1:n. 1</mods:partNumber>
    </mods:titleInfo>
    </mods:mods>
</wrap>
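For reference, the behavior above can be reproduced with a short self-contained sketch. It uses a trimmed version of document A and builds the wrapper with etree.Element; the variable names (before, after) and the shortened title are only for illustration, and the exact serialization may vary between lxml versions.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Parse a trimmed version of document A. The parser keeps the
# xmlns:xsi declaration on the <dc> root element.
dc = etree.fromstring(
    f'<dc xmlns:xsi="{XSI}" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<title>La difesa della razza</title></dc>"
)
before = etree.tostring(dc).decode()

# Build the wrapper by hand; it declares the same xsi namespace.
wrap = etree.Element("wrap", nsmap={"xsi": XSI})
wrap.set(f"{{{XSI}}}schemaLocation", "http://www.example.org")

# Moving <dc> into the wrapper's tree drops its now-redundant declaration.
wrap.append(dc)
after = etree.tostring(wrap, pretty_print=True).decode()

print(before)
print(after)
```

Comparing the two printed documents shows the xmlns:xsi declaration present on the standalone dc element but only on wrap after the append.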

Any ideas how I can force the behavior I want?

Thanks

You could try inserting XInclude elements first and then resolving them with the .xinclude() method (see the docs). That seems to preserve the namespace declarations: lxml keeps them when they originate from the parser, but not when you create elements yourself or move elements from one document to another.
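A minimal sketch of the XInclude approach follows, assuming the sub-documents live in files next to the wrapper. The file names (dc.xml, mods.xml, wrap.xml), the temporary directory, the helper name build_wrapped_document, and the shortened document bodies are all made up for illustration. Whether the redundant xmlns:xsi declarations survive depends on how libxml2 copies the included nodes, so check the printed output with your own lxml version.

```python
import os
import tempfile

from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XI = "http://www.w3.org/2001/XInclude"

# Trimmed stand-ins for the question's documents A and B, plus a wrapper
# that pulls them in through xi:include elements.
FILES = {
    "dc.xml": (
        f'<dc xmlns:xsi="{XSI}" '
        'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
        "<title>La difesa della razza</title></dc>"
    ),
    "mods.xml": (
        f'<mods xmlns:xsi="{XSI}" '
        'xsi:schemaLocation="http://www.loc.gov/mods/v3 '
        'http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">'
        "<titleInfo><title>difesa della razza</title></titleInfo></mods>"
    ),
    "wrap.xml": (
        f'<wrap xmlns:xsi="{XSI}" xmlns:xi="{XI}" '
        'xsi:schemaLocation="http://www.example.org">'
        '<xi:include href="dc.xml"/><xi:include href="mods.xml"/></wrap>'
    ),
}


def build_wrapped_document():
    """Write the sample files, parse the wrapper, and resolve its includes."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in FILES.items():
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
                fh.write(text)
        # Parsing from a path gives the tree a base URL, so the relative
        # href values resolve against the temporary directory.
        tree = etree.parse(os.path.join(tmp, "wrap.xml"))
        tree.xinclude()
    return tree


tree = build_wrapped_document()
print(etree.tostring(tree, pretty_print=True).decode())
```

The included elements are produced by the parser rather than moved between documents with append(), which is why this route can behave differently. libxml2 may also add xml:base attributes to the included elements.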

Note that in your case, you would still need to change the tag names of the elements: they will be included exactly as they are in the original documents, without any namespace, whereas your desired output uses namespaced element names.
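One way to do that renaming, sketched under a few assumptions: the helper name qualify is hypothetical, the namespace URI http://www.foo.org is the placeholder from the question's wrap element, and the wrapper is parsed from a string here so the example stands alone. The sketch rewrites each element's tag in Clark notation ({uri}localname) and leaves attributes untouched.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
DC_NS = "http://www.foo.org"  # placeholder URI from the question's <wrap>


def qualify(element, namespace):
    """Put element and all descendant elements into namespace (hypothetical helper)."""
    for node in element.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(node.tag, str):
            local_name = etree.QName(node).localname
            node.tag = f"{{{namespace}}}{local_name}"


# Parsed input, so the redundant xmlns:xsi on <dc> comes from the parser.
wrap = etree.fromstring(
    f'<wrap xmlns:xsi="{XSI}" xmlns:dc="{DC_NS}">'
    f'<dc xmlns:xsi="{XSI}" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<title>La difesa della razza</title></dc></wrap>"
)

qualify(wrap[0], DC_NS)
result = etree.tostring(wrap).decode()
print(result)
```

Because the dc prefix is already declared on wrap, lxml reuses it for the renamed elements instead of generating a new prefix.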

You might have to use a custom resolver, contrary to what the docs might seem to say about .xinclude() not supporting this: it does use the resolvers from the parser that was used to parse the containing document; it just doesn't support passing a specific resolver or parser to the XInclude processing.

The other option would probably be an XSLT-based solution.
