
Edit a large XML file

I have a 3GB XML file and need to move some nodes so that they become child nodes of another node. Loading such a large file as an XmlDocument is not efficient. I see that XmlReader is another approach, but I am not sure exactly how it would work in my scenario, or which other classes I should use with it.

I need to move every alias node into its related customer>name node.

<customer>
<name><first>Robert</first></name>
<alias>Rob</alias>
</customer>

I don't really understand exactly what transformation you want to perform, but assuming that @dbc's guess is correct, you could do it with a streaming XSLT 3.0 processor like this:

<xsl:transform version="3.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:mode streamable="yes" on-no-match="shallow-copy"/>

<xsl:template match="customer">
  <xsl:apply-templates select="copy-of(.)" mode="local"/>
</xsl:template>

<xsl:mode name="local" streamable="no" on-no-match="shallow-copy"/>

<xsl:template match="name" mode="local">
  <name>
    <xsl:apply-templates mode="local"/>
    <xsl:copy-of select="../alias"/>
  </name>
</xsl:template>

<xsl:template match="alias" mode="local"/>

</xsl:transform>

What's happening here is that everything gets copied in pure streaming mode (tag for tag) until we hit a customer element. When we encounter a customer element, we make an in-memory copy of the element and transform it locally using a conventional non-streaming transformation. So the amount of memory needed is just enough to hold the largest customer element.

What you can do is take the basic logic of streaming an XmlReader to an XmlWriter from Mark Fussell's article "Combining the XmlReader and XmlWriter classes for simple streaming transformations" and use it to transform your 3GB file into a modified file in which the <alias> nodes have been relocated to the <name> nodes. An example of using such streaming transformations is given in this answer to "Automating replacing tables from external files".

Using that answer as a basis, grab the classes XmlReaderExtensions, XmlWriterExtensions, XmlStreamingEditorBase and XmlStreamingEditor from it, and subclass XmlStreamingEditor to create CustomerAliasXmlEditor as follows:

using System.Xml;
using System.Xml.Linq;

class CustomerAliasXmlEditor : XmlStreamingEditor
{
    // Confirm that the <customer> element is not in any namespace.
    static readonly XNamespace customerNamespace = ""; 

    public static void TransformFromTo(string fromFilePath, XmlReaderSettings readerSettings, string toFilePath, XmlWriterSettings writerSettings)
    {
        using (var xmlReader = XmlReader.Create(fromFilePath, readerSettings))
        using (var xmlWriter = XmlWriter.Create(toFilePath, writerSettings))
        {
            new CustomerAliasXmlEditor(xmlReader, xmlWriter).Process();
        }
    }

    public CustomerAliasXmlEditor(XmlReader reader, XmlWriter writer)
        : base(reader, writer, ShouldTransform, Transform)
    {
    }

    static bool ShouldTransform(XmlReader reader)
    {
        return reader.GetElementName() == customerNamespace + "customer";
    }

    static void Transform(XmlReader from, XmlWriter to)
    {
        var customer = XElement.Load(from);
        var alias = customer.Element(customerNamespace + "alias");
        if (alias != null)
        {
            var name = customer.Element(customerNamespace + "name");
            if (name == null)
            {
                name = new XElement(customerNamespace + "name");
                customer.Add(name);
            }
            alias.Remove();
            name.Add(alias);
        }
        customer.WriteTo(to);
    }
}

Then, if fromFileName is the name of your current 3GB XML file and toFileName is the name of the file to which the transformed XML should be written, you can do:

var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
var writerSettings = new XmlWriterSettings { Indent = false }; // Or true if you prefer.

CustomerAliasXmlEditor.TransformFromTo(fromFileName, readerSettings, toFileName, writerSettings);

Sample working .NET fiddle showing that the XML

<Root>
<Item>
<SubItem>
<customer>
<name><first>Robert</first></name>
<alias>Rob</alias>
</customer>
</SubItem>
</Item>
<Item/>
</Root>

is transformed to:

<Root>
  <Item>
    <SubItem>
      <customer>
        <name>
          <first>Robert</first>
          <alias>Rob</alias>
        </name>
      </customer>
    </SubItem>
  </Item>
  <Item />
</Root>
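
For readers who would rather not copy the helper classes from the linked answer, the same reader-to-writer streaming idea can be sketched in a self-contained way. This is only a minimal sketch under stated assumptions, not the code from that answer: the class name CustomerAliasStreamer and its methods are hypothetical, the <customer>, <name> and <alias> elements are assumed to be in no namespace, every <alias> child is moved (not just the first), and DOCTYPE and entity-reference nodes are not handled. Only one <customer> element is held in memory at a time.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the linked answer.
public static class CustomerAliasStreamer
{
    public static void Transform(XmlReader reader, XmlWriter writer)
    {
        reader.Read();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.LocalName == "customer" && reader.NamespaceURI == "")
            {
                // Loads only this element; the reader ends up on the node after </customer>,
                // so no extra Read() call is made in this branch.
                var customer = (XElement)XNode.ReadFrom(reader);
                MoveAliasesIntoName(customer);
                customer.WriteTo(writer);
            }
            else
            {
                CopyShallow(reader, writer);
                reader.Read();
            }
        }
    }

    static void MoveAliasesIntoName(XElement customer)
    {
        var aliases = customer.Elements("alias").ToList();
        if (aliases.Count == 0) return;
        var name = customer.Element("name");
        if (name == null)
        {
            // Mirrors the answer above: create <name> when it is missing.
            name = new XElement("name");
            customer.Add(name);
        }
        foreach (var alias in aliases)
        {
            alias.Remove();
            name.Add(alias);
        }
    }

    // Copies the current node only (not its children) from reader to writer.
    static void CopyShallow(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            // The XML declaration is skipped on purpose; the writer emits its own
            // according to its settings. Other node types are ignored in this sketch.
        }
    }
}
```

To use it on files, create the reader with XmlReader.Create(fromFileName, readerSettings) and the writer with XmlWriter.Create(toFileName, writerSettings), both inside using blocks, and pass them to CustomerAliasStreamer.Transform. Keeping the per-customer step on XElement means the transformation logic stays as simple as in the in-memory version, while everything outside <customer> is copied node by node.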
