Most efficient way to replace text in xml stream

I have a huge chunk of XML data that I need to "clean". The XML looks something like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
        <w:p>
            <w:t>F_ck</w:t>
            <!-- -->
            <w:t>F_ck</w:t>
            <!-- -->
            <w:t>F_ck</w:t>
        </w:p>
    </w:body>
</w:document>

I would like to identify the <w:t> elements with the value "F_ck" and replace that value with something else. The elements I need to clean will be scattered throughout the document.
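A forward-only way to find those elements is sketched below in C#, the language of the question's code. It matches on LocalName plus NamespaceURI rather than on Name == "w:t", so it keeps working if a document binds the WordprocessingML namespace to a different prefix. The class WtMatcher and its methods are hypothetical names, and the sketch assumes each <w:t> holds a single text node.

```csharp
using System.IO;
using System.Xml;

static class WtMatcher
{
    // Namespace URI that the sample document binds to the "w" prefix.
    public const string WNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True when the reader sits on a <t> start tag in the WordprocessingML
    // namespace, whatever prefix the document uses for it.
    public static bool IsTextElement(XmlReader reader) =>
        reader.NodeType == XmlNodeType.Element &&
        reader.LocalName == "t" &&
        reader.NamespaceURI == WNamespace;

    // Forward-only pass that counts <w:t> elements whose text is exactly `value`.
    public static int CountMatches(TextReader input, string value)
    {
        int count = 0;
        bool insideT = false; // true while the current node is directly inside <w:t>

        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    insideT = IsTextElement(reader) && !reader.IsEmptyElement;
                else if (reader.NodeType == XmlNodeType.EndElement)
                    insideT = false;
                else if (reader.NodeType == XmlNodeType.Text && insideT && reader.Value == value)
                    count++;
            }
        }
        return count;
    }
}
```

Tracking a flag, instead of calling ReadElementContentAsString inside the Read() loop, avoids skipping a <w:t> that directly follows another one.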

I need the code to run as fast as possible and with as small a memory footprint as possible, so I am reluctant to use the XDocument (DOM) approaches I have found here and elsewhere.

The data is given to me as a stream containing the XML data, and my gut feeling tells me that I need the XmlTextReader and the XmlTextWriter.

My original idea was to do a SAX-style, forward-only pass over the XML data and "pipe" it to the XmlTextWriter, but I cannot find an intelligent way to do so.
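For reference, a plain pipe with no modification can be a single XmlWriter.WriteNode call: while the reader is still in its initial state, WriteNode copies the whole document and leaves the reader at end of file. This sketch uses the XmlReader.Create and XmlWriter.Create factory methods, which Microsoft's documentation recommends over constructing XmlTextReader/XmlTextWriter directly; the class name XmlPipe is hypothetical. It changes nothing, so it only illustrates the stream-to-stream plumbing without building an in-memory tree.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class XmlPipe
{
    // Copies the whole document from `input` to `output` in one forward-only pass.
    // Nothing is changed; this only shows the reader-to-writer plumbing.
    public static void CopyAll(Stream input, Stream output)
    {
        var readerSettings = new XmlReaderSettings { CloseInput = false };
        var writerSettings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte-order mark
            CloseOutput = false                 // leave `output` open for the caller
        };

        using (XmlReader reader = XmlReader.Create(input, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            // With the reader still in its initial state, WriteNode copies every
            // node until the end of the input.
            writer.WriteNode(reader, true);
        }
    }
}
```

Because WriteNode copies an element together with its whole subtree, it cannot by itself change text deep inside the document; manipulation needs a node-by-node loop instead.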

I wrote this code:

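using System.IO;
using System.Text;
using System.Xml;

// `content` is assumed to hold the XML text as a string.
var reader = new StringReader(content);
var xmltextReader = new XmlTextReader(reader);
var memStream = new MemoryStream();
var xmlWriter = new XmlTextWriter(memStream, Encoding.UTF8);

while (xmltextReader.Read())
{
    if (xmltextReader.Name == "w:t")
    {
        // Intended spot for the replacement. Note that Name is "w:t" for both
        // the start tag and the end tag, while the text node has an empty Name.
        //xmlWriter.WriteRaw("blah");
    }
    else
    {
        // Writes only the node's Value: element tags and attributes are lost,
        // and comments lose their <!-- --> delimiters.
        xmlWriter.WriteRaw(xmltextReader.Value);
    }
}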

The code above only writes each node's Value (the text of text nodes, the body of the XML declaration, and so on), so no angle brackets or any other markup end up in the output. I realize that I could write code that calls .WriteStartElement(), .WriteEndElement(), etc. depending on the NodeType, but I fear that will quickly become a mess.
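The reason nothing but text comes out is that Value is empty for element and end-element nodes; the tag itself is described by NodeType, Name and the attributes. A small hypothetical helper, NodeDump.Describe, that lists NodeType, Name and Value for each node makes this visible.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class NodeDump
{
    // Returns one "NodeType | Name | Value" line per node the reader visits.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // For Element and EndElement nodes Value is empty: the markup
                // is described by NodeType and Name, not by Value.
                lines.Add($"{reader.NodeType} | {reader.Name} | {reader.Value}");
            }
        }
        return lines;
    }
}
```

For a fragment like the sample, the element lines end with an empty Value while the Text line carries F_ck, so writing only Value drops every tag.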

So the question is:

How do I, in a nice way, pipe the XML data read from the XmlTextReader to the XmlTextWriter while still being able to manipulate the data along the way?

Try this (note that it uses LINQ to XML, so the whole document is loaded into memory):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string xml =
                "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" +
                "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
                "<w:body>" +
                "<w:p>" +
                "<w:t>F_ck</w:t>" +
                "<!-- -->" +
                "<w:t>F_ck</w:t>" +
                "<!-- -->" +
                "<w:t>F_ck</w:t>" +
                "</w:p>" +
                "</w:body>" +
                "</w:document>";

            // Parse the whole document into an in-memory tree.
            XDocument doc = XDocument.Parse(xml);

            // Look up the namespace bound to the "w" prefix on the root element.
            XNamespace ns_w = doc.Root.GetNamespaceOfPrefix("w");

            // Collect only the <w:t> elements whose text is "F_ck"; ToList()
            // materializes the matches before the tree is modified.
            List<XElement> ts = doc.Descendants(ns_w + "t")
                                   .Where(e => e.Value == "F_ck")
                                   .ToList();
            foreach (XElement t in ts)
            {
                t.Value = "abc";
            }

            // Print the result (XDocument.ToString() omits the XML declaration).
            Console.WriteLine(doc);
        }
    }
}
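If loading the whole document is not acceptable, the per-NodeType copy mentioned in the question stays fairly compact. Below is a sketch of a common shallow-copy loop: every node is read and written back one at a time, and only text directly inside a <w:t> element (in the WordprocessingML namespace) whose value is exactly the target string is replaced. WordTextScrubber and Scrub are hypothetical names; the sketch assumes there is no DTD (the default XmlReaderSettings reject one) and that each <w:t> holds a single text node.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class WordTextScrubber
{
    // Namespace URI that the sample document binds to the "w" prefix.
    const string WNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Streams `input` to `output` node by node, replacing the text of every
    // <w:t> element whose content is exactly `target`.
    public static void Scrub(Stream input, Stream output, string target, string replacement)
    {
        var readerSettings = new XmlReaderSettings { CloseInput = false };
        var writerSettings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte-order mark
            CloseOutput = false
        };

        using (XmlReader reader = XmlReader.Create(input, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            bool insideT = false; // true while the current node is directly inside <w:t>

            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isT = reader.LocalName == "t" && reader.NamespaceURI == WNamespace;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true); // also copies xmlns declarations
                        if (reader.IsEmptyElement)
                        {
                            writer.WriteEndElement(); // <x/> has no separate EndElement node
                            insideT = false;
                        }
                        else
                        {
                            insideT = isT;
                        }
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        insideT = false;
                        break;
                    case XmlNodeType.Text:
                        // The actual manipulation: swap the value only where it matches.
                        writer.WriteString(insideT && reader.Value == target ? replacement : reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                }
            }
        }
    }
}
```

Memory use stays roughly constant, since only the current node and one flag are held; the trade-off against the XDocument answer is that the matching has to be expressed as this small state machine rather than as a LINQ query.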

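Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.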

© 2020-2024 STACKOOM.COM