
How to split an XML file into multiple XML files based on the number of nodes

This question is very similar to this one, but with a small twist.

I am trying to split an object representing XML into multiple XML objects, based on the number of Tag elements allowed per object. I'm looking for the best possible approach to this. Any help would be great. Here is a sample of what I am trying to do.

XML source representation:

<?xml version="1.0" encoding="utf-8"?>
<DocType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pmlcore="urn:autoid:specification:interchange:xml:schema:1">
    <id>tbd</id>
    <Observation>
        <Command>c1</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
    <Observation>
        <Command>c2</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
</DocType>

Desired output, given that the number of 'Tag' elements allowed per document is 3:

xml 1:

<?xml version="1.0" encoding="utf-8"?>
<DocType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pmlcore="urn:autoid:specification:interchange:xml:schema:1">
    <id>tbd</id>
    <Observation>
        <Command>c1</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
</DocType>

xml 2:

<?xml version="1.0" encoding="utf-8"?>
<DocType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pmlcore="urn:autoid:specification:interchange:xml:schema:1">
    <id>tbd</id>
    <Observation>
        <Command>c1</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
    <Observation>
        <Command>c2</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
</DocType>

I believe by now you get the idea of the requirement, but I'll continue:

xml 3:

<?xml version="1.0" encoding="utf-8"?>
<DocType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pmlcore="urn:autoid:specification:interchange:xml:schema:1">
    <id>tbd</id>
    <Observation>
        <Command>c2</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
</DocType>

xml 4:

<?xml version="1.0" encoding="utf-8"?>
<DocType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:pmlcore="urn:autoid:specification:interchange:xml:schema:1">
    <id>tbd</id>
    <Observation>
        <Command>c2</Command>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Tag>
            <id>....</id>
            <Data>...</Data>
        </Tag>
        <Data>...</Data>
    </Observation>
</DocType>

Load the initial document, then remove the Observation elements from it. Loop over the removed Observation elements and, for each one, create a new document to which you add that element. docList then holds all the new documents. Note that this produces one document per Observation; it does not by itself limit the number of Tag elements per document.

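        // Assumes using System.Collections.Generic, System.Linq and System.Xml.Linq,
        // and that doc is the already loaded XDocument.

        // Collect the Observation elements, then detach them from the original document.
        var result = doc.Root.Elements("Observation").ToList();
        doc.Root.Elements("Observation").Remove();

        // Build one new document per Observation, each a copy of the stripped original.
        var docList = new List<XDocument>();
        foreach (var el in result)
        {
            XDocument d = new XDocument(doc);
            d.Root.Add(el);
            docList.Add(d);
        }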
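
The snippet above splits per Observation. To cap the number of Tag elements per document, as the question asks, one option is to copy the whole document once per chunk and delete the Tag elements that fall outside that chunk. The following is only a sketch under stated assumptions: `XmlSplitter` and `SplitByTagCount` are hypothetical names, the elements are assumed to be in no namespace (as in the sample), and Observation elements that end up without any Tag are dropped from that output document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlSplitter
{
    // Hypothetical helper: returns one document per chunk of at most
    // tagsPerDoc Tag elements, counted across all Observation elements.
    public static List<XDocument> SplitByTagCount(XDocument source, int tagsPerDoc)
    {
        if (tagsPerDoc < 1) throw new ArgumentOutOfRangeException(nameof(tagsPerDoc));

        // Total number of Tag elements that are direct children of an Observation.
        int total = source.Root.Elements("Observation").Elements("Tag").Count();

        var result = new List<XDocument>();
        for (int start = 0; start < total; start += tagsPerDoc)
        {
            // Deep copy, so the source document is never modified.
            var copy = new XDocument(source);

            // Same Tag elements, same document order, but in the copy.
            var copyTags = copy.Root.Elements("Observation").Elements("Tag").ToList();

            // Remove every Tag whose index lies outside [start, start + tagsPerDoc).
            for (int i = 0; i < copyTags.Count; i++)
            {
                if (i < start || i >= start + tagsPerDoc)
                {
                    copyTags[i].Remove();
                }
            }

            // Assumption: an Observation with no Tag left is not wanted in this document.
            copy.Root.Elements("Observation")
                .Where(o => !o.Elements("Tag").Any())
                .Remove();

            result.Add(copy);
        }
        return result;
    }
}
```

For the sample document in the question (4 + 8 Tag elements) and a limit of 3, this should produce four documents with the same layout as the desired output. Each copy walks the whole document, so for very large inputs a single-pass approach may be preferable.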

I think that your best option is to set up a model for the data you have.

public class Observation
{
    public string Command { get; set; }

    public List<Tag> Tags { get; set; }
}

[...] // Also define the Tag class

Then you can easily read the XML with LINQ to XML, process the models according to whatever criteria you want, and save the result back using LINQ to XML.
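
As a rough illustration of that idea, the sketch below reads the Observation elements into the model and regroups them into batches of at most a given number of tags. It is not a definitive implementation: the shape of the `Tag` class (an `Id` and a `Data` string) is an assumption based on the sample XML, `ObservationModel`, `Read` and `Batch` are hypothetical names, and the root `id` and the Observation-level `Data` element are left out of the model for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed shape, inferred from the sample XML.
public class Tag
{
    public string Id { get; set; }
    public string Data { get; set; }
}

public class Observation
{
    public string Command { get; set; }
    public List<Tag> Tags { get; set; } = new List<Tag>();
}

public static class ObservationModel
{
    // Read every Observation element of the document into the model.
    public static List<Observation> Read(XDocument doc) =>
        doc.Root.Elements("Observation")
           .Select(o => new Observation
           {
               Command = (string)o.Element("Command"),
               Tags = o.Elements("Tag")
                       .Select(t => new Tag
                       {
                           Id = (string)t.Element("id"),
                           Data = (string)t.Element("Data")
                       })
                       .ToList()
           })
           .ToList();

    // Regroup into batches holding at most tagsPerDoc tags each. An Observation
    // that straddles a batch boundary appears, with the same Command, in both batches.
    public static List<List<Observation>> Batch(IEnumerable<Observation> observations, int tagsPerDoc)
    {
        if (tagsPerDoc < 1) throw new ArgumentOutOfRangeException(nameof(tagsPerDoc));

        var batches = new List<List<Observation>>();
        var current = new List<Observation>();
        int used = 0; // tags already placed in the current batch

        foreach (var obs in observations)
        {
            int i = 0;
            while (i < obs.Tags.Count)
            {
                // Take as many tags as still fit into the current batch.
                int take = Math.Min(tagsPerDoc - used, obs.Tags.Count - i);
                current.Add(new Observation { Command = obs.Command, Tags = obs.Tags.GetRange(i, take) });
                i += take;
                used += take;

                if (used == tagsPerDoc)
                {
                    batches.Add(current);
                    current = new List<Observation>();
                    used = 0;
                }
            }
        }

        if (current.Count > 0) batches.Add(current);
        return batches;
    }
}
```

Each batch can then be written out as its own document by building the elements with `XElement` constructors, which is the reverse of `Read`.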

I really feel that teaching how to use LINQ to XML is outside the scope of this question, so I'm referring you to another question that deals with it: Parse xml using LINQ to XML to class objects

And please, try not to work directly on the data as raw rows and then save it again; any change you want to make after that will be a nightmare.

XSLT 2.0 (as supported by Saxon, https://www.nuget.org/packages/Saxon-HE/) allows you to transform one XML document into multiple result documents. Here is one approach to splitting your input into several files:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:param name="tags-per-doc" as="xs:integer" select="3"/>

    <xsl:strip-space elements="*"/>
    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <xsl:for-each-group select="//Tag" group-adjacent="(position() - 1) idiv $tags-per-doc">
            <xsl:result-document href="result{position()}.xml">
                <xsl:apply-templates select="/*"/>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Observation">
        <xsl:if test="current-group() intersect *">
            <xsl:copy>
                <xsl:apply-templates select="@*, node()[. intersect current-group() or not(self::Tag)]"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
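
The grouping key in the stylesheet, `(position() - 1) idiv $tags-per-doc`, assigns consecutive Tag elements to groups by integer division of their position. For readers more at home in C#, the same arithmetic can be sketched with LINQ; `GroupingKeyDemo` and `Chunk` are hypothetical names used only for illustration, and the index here is already 0-based, so no subtraction is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingKeyDemo
{
    // Items whose 0-based index shares the same quotient (index / size)
    // land in the same group, mirroring (position() - 1) idiv $tags-per-doc.
    public static List<List<T>> Chunk<T>(IEnumerable<T> items, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));

        return items
            .Select((item, index) => new { item, key = index / size })
            .GroupBy(x => x.key, x => x.item)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Because the key never decreases along the sequence, grouping by key here gives the same groups as grouping adjacent items, which is what `group-adjacent` does in the stylesheet.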
