Replace part of a large XML file

I have a large XML file, and I need to replace every element with a given name (together with all of its inner elements) with another element. For example, if the element is e:

<a>
<b></b>
<e>
   <b></b>
   <c></c>
</e>
</a>

After replacing e with elem:

<a>
<b></b>
<elem></elem>
</a>

Update: I tried using XDocument, but the XML is larger than 2 GB and I get a System.OutOfMemoryException.

Update 2: my code, which does not transform the XML:

XmlReader reader = XmlReader.Create("xml_file.xml");
XmlWriter wr = XmlWriter.Create(Console.Out);
while (reader.Read())
   {
       if (reader.NodeType == XmlNodeType.Element && reader.Name == "e")
       {
           wr.WriteElementString("elem", "val1");
           reader.ReadSubtree();
       }
            wr.WriteNode(reader, false);
   }
wr.Close();

Update 3:

<a>
<b></b>
<e>
   <b></b>
   <c></c>
</e>
<i>
  <e>
    <b></b>
    <c></c>
  </e>
</i> 
</a>

Taking inspiration from this blog post, you can basically stream the contents of the XmlReader straight to the XmlWriter, similarly to your example code, but handling every node type yourself. Using WriteNode, as in your example code, writes the current node and all of its child nodes in one go, so you never get the chance to handle each descendant in your source XML.

In addition, you need to make sure you read to the end of the element you want to skip. ReadSubtree creates an XmlReader for this, but it doesn't actually do any reading; you need to ensure that reader is read to the end.
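To make the ReadSubtree behaviour concrete, here is a minimal sketch (the class name ReadSubtreeDemo and the sample XML are made up for illustration). It records where the outer reader is positioned before the subtree is drained, after it is drained and closed, and after one more Read():

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadSubtreeDemo
{
    // Hypothetical helper: reports the outer reader's position at three points.
    public static string Run()
    {
        const string xml = "<a><e><b/><c/></e><d/></a>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("e");                 // outer reader is on <e>
            string before = reader.NodeType + ":" + reader.Name;

            using (var subtree = reader.ReadSubtree())
            {
                // Creating the subtree reader consumed nothing; drain it explicitly.
                while (subtree.Read()) { }
            }

            // Position of the outer reader once the subtree reader is closed.
            string after = reader.NodeType + ":" + reader.Name;

            reader.Read();                               // move past the skipped element
            string next = reader.NodeType + ":" + reader.Name;

            return before + " " + after + " " + next;
        }
    }
}
```

Calling Console.WriteLine(ReadSubtreeDemo.Run()) shows the three positions. Per the documented ReadSubtree contract, the outer reader should end up on the end tag of the skipped element once the subtree reader is closed, which is why a plain Read() in the main loop can then carry on with the next sibling.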

The resulting code might look like this: 结果代码如下所示:

using (var reader = XmlReader.Create(new StringReader(xml), rs))
using (var writer = XmlWriter.Create(Console.Out, ws))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
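                if (HandleElement(reader, writer))
                {
                    // Skip the replaced element. The outer reader should not be used
                    // while a subtree reader is open, so create one only when needed
                    // and close it before reading on.
                    using (var subTreeReader = reader.ReadSubtree())
                    {
                        ReadToEnd(subTreeReader);
                    }
                }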
                else
                {
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement)
                    {
                        writer.WriteEndElement();
                    }
                }
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.EntityReference:
                writer.WriteEntityRef(reader.Name);
                break;
            case XmlNodeType.XmlDeclaration:
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            case XmlNodeType.DocumentType:
                writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
        }
    }    
}

private static void ReadToEnd(XmlReader reader)
{
    while (!reader.EOF)
    {
        reader.Read();
    }
}

Put whatever logic you need inside HandleElement, returning true if the element has been handled (so that its original content is skipped). The implementation for the logic in your example code would be:

private static bool HandleElement(XmlReader reader, XmlWriter writer)
{
    if (reader.Name == "e")
    {
        writer.WriteElementString("elem", "val1");
        return true;
    }

    return false;
}

Here is a working demo: https://dotnetfiddle.net/FFIBU4
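The snippet above reads from a string and leaves rs and ws undefined. For a multi-gigabyte file you would presumably open the reader and writer directly on files, so that only the current node is held in memory. A rough sketch of that setup follows; the LargeFileSetup class, its method names and the chosen settings are my own assumptions, not part of the original demo:

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class LargeFileSetup
{
    // Hypothetical helper: forward-only reader over a file on disk.
    public static XmlReader OpenReader(string inputPath)
    {
        var rs = new XmlReaderSettings
        {
            IgnoreWhitespace = false, // keep whitespace nodes so the copy loop can echo them
            CloseInput = true         // dispose the file stream together with the reader
        };
        return XmlReader.Create(File.OpenRead(inputPath), rs);
    }

    // Hypothetical helper: writer that streams straight to the output file.
    public static XmlWriter OpenWriter(string outputPath)
    {
        var ws = new XmlWriterSettings
        {
            Indent = false,                      // whitespace is copied from the source instead
            Encoding = new UTF8Encoding(false),  // UTF-8 without a byte order mark
            CloseOutput = true                   // dispose the file stream together with the writer
        };
        return XmlWriter.Create(File.Create(outputPath), ws);
    }
}
```

The two using lines of the loop would then become something like using (var reader = LargeFileSetup.OpenReader("xml_file.xml")) and using (var writer = LargeFileSetup.OpenWriter("out.xml")), with the rest of the loop unchanged.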

Try this (I saw the C# tag :D):

        XElement elem = new XElement("elem");
        IEnumerable<XElement> listElementsToBeReplaced = xDocument.Descendants("e");
        foreach (XElement replaceElement in listElementsToBeReplaced)
        {
            replaceElement.AddAfterSelf(elem);
        }
        listElementsToBeReplaced.Remove();

I would do the replacement with a regular expression that matches each e element with all of its content up to the closing tag, and replaces the match with the new elem element. This way you can do it in any editor whose search/replace supports regular expressions, and programmatically in any language.

string xml = @"<a>
<b></b>
<e>
<b></b>
<c></c>
</e>
</a>";
string pattern = @"<e[^>]*>[\s\S]*?(((?'Open'<e[^>]*>)[\s\S]*?)+((?'-Open'</e>)[\s\S]*?)+)*(?(Open)(?!))</e>";
Console.WriteLine(Regex.Replace(xml, pattern, "<elem></elem>"));

Use a regex; you can also use LINQ to XML.

// example data:
XDocument xmldoc = XDocument.Parse(
@"
<a>
<b></b>
<e>
   <b></b>
   <c></c>
</e>
<c />
<e>
   <b></b>
   <c></c>
   <c></c>
</e>
</a>
");
            // you can use xpath, then you need to add:
            // using System.Xml.XPath;
            List<XElement> elementsToReplace = xmldoc.XPathSelectElements("a/e").ToList();

            // or pure LINQ to XML:
            // elementsToReplace = xmldoc.Elements("a").Elements("e").ToList();

            foreach (XElement elem in elementsToReplace)
            {
                // setting Value of XElement to an empty string causes the resulting xml to look like this:
                // <elem></elem>
                // and not like this:
                // <elem />
                elem.ReplaceWith(new XElement("elem", ""));
                // if you don't mind self closing tags, then:
                // elem.ReplaceWith(new XElement("elem"));
            }

I didn't measure the performance, but rumour has it that the difference between the XPath and LINQ versions is not very significant.

XPath syntax, if you need it: http://www.w3schools.com/xpath/xpath_syntax.asp
