Most efficient way to replace text in an XML stream
I have a huge chunk of XML data that I need to "clean". The XML looks something like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p>
<w:t>F_ck</w:t>
<!-- -->
<w:t>F_ck</w:t>
<!-- -->
<w:t>F_ck</w:t>
</w:p>
</w:body>
</w:document>
I would like to identify the <w:t> elements with the value "F_ck" and replace that value with something else. The elements I need to clean will be scattered throughout the document.
I need the code to run as fast as possible and with as small a memory footprint as possible, so I am reluctant to use the XDocument (DOM) approaches I have found here and elsewhere.
The data is given to me as a stream containing the XML data, and my gut feeling tells me that I need XmlTextReader and XmlTextWriter.
My original idea was to do a SAX-style, forward-only pass through the XML data and "pipe" it over to the XmlTextWriter, but I cannot find an intelligent way to do so.
I wrote this code:
var reader = new StringReader(content);
var xmltextReader = new XmlTextReader(reader);
var memStream = new MemoryStream();
var xmlWriter = new XmlTextWriter(memStream, Encoding.UTF8);
while (xmltextReader.Read())
{
    if (xmltextReader.Name == "w:t")
    {
        //xmlWriter.WriteRaw("blah");
    }
    else
    {
        xmlWriter.WriteRaw(xmltextReader.Value);
    }
}
The code above only writes the Value of each node (the text of the declaration and so on), so no angle brackets or element names come through. I realize that I could write code that specifically calls .WriteStartElement(), .WriteEndElement() etc. depending on the NodeType, but I fear that will quickly become a mess.
So the question is:
How do I, in a nice way, pipe the XML data read from the XmlTextReader to the XmlTextWriter while still being able to manipulate the data while piping?
Try this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string xml =
                "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" +
                "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
                "<w:body>" +
                "<w:p>" +
                "<w:t>F_ck</w:t>" +
                "<!-- -->" +
                "<w:t>F_ck</w:t>" +
                "<!-- -->" +
                "<w:t>F_ck</w:t>" +
                "</w:p>" +
                "</w:body>" +
                "</w:document>";

            // Load the whole document into memory (DOM-style).
            XDocument doc = XDocument.Parse(xml);
            XElement document = (XElement)doc.FirstNode;

            // Resolve the namespace bound to the "w" prefix on the root element.
            XNamespace ns_w = document.GetNamespaceOfPrefix("w");

            // Find every <w:t> element and overwrite its value.
            List<XElement> ts = doc.Descendants(ns_w + "t").ToList();
            foreach (XElement t in ts)
            {
                t.Value = "abc";
            }
        }
    }
}
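Note that XDocument.Parse loads the entire document into memory, which is what the question wanted to avoid. If the forward-only approach matters, one possible sketch is to copy each node from an XmlReader to an XmlWriter one at a time and substitute the text while copying. The class name XmlStreamCleaner, the method Clean, and the parameters badValue and replacement are made up for illustration, and the sketch uses XmlReader.Create and XmlWriter.Create instead of the XmlTextReader and XmlTextWriter mentioned in the question. It assumes each <w:t> holds a single text node and that the document has no DTD or entity references; I have not measured its speed or memory use.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlStreamCleaner
{
    // Namespace bound to the "w" prefix in the sample document.
    private const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: copies input to output node by node and replaces
    // the text of <w:t> elements whose value equals badValue.
    public static void Clean(Stream input, Stream output,
                             string badValue, string replacement)
    {
        // UTF-8 without a byte order mark, so the output starts with the declaration.
        var writerSettings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };

        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            bool insideT = false; // true while positioned inside a <w:t> element

            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        insideT = reader.LocalName == "t"
                                  && reader.NamespaceURI == WordNs
                                  && !reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        // Copies attributes (including xmlns declarations) and
                        // leaves the reader positioned on the element.
                        writer.WriteAttributes(reader, false);
                        if (reader.IsEmptyElement)
                        {
                            writer.WriteEndElement();
                        }
                        break;
                    case XmlNodeType.Text:
                        bool replace = insideT && reader.Value == badValue;
                        writer.WriteString(replace ? replacement : reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        insideT = false;
                        writer.WriteFullEndElement();
                        break;
                }
            }
        }
    }
}
```

Calling XmlStreamCleaner.Clean(inputStream, outputStream, "F_ck", "abc") would then write the cleaned XML to outputStream without building a tree. The loop deliberately avoids XmlWriter.WriteNode, because that method advances the reader itself and the following Read() call would then skip a node.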