How to replace text in XML document with special chars?

See the end of this post for an addition covering the same problem with text boxes!

With this method I want to open a document, replace some text and then leave it alone. It works, and that's something to be proud of. :D

public static void replaceInOpenXMLDocument(string pfad, string zuErsetzen, string neuerString)
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(pfad, true))
            {
                var res = from bm in doc.MainDocumentPart.Document.Body.Descendants()
                          where bm.InnerText != string.Empty && bm.InnerText.Contains(zuErsetzen) && bm.HasChildren == false
                          select bm;

                foreach (var item in res)
                {
                    item.InsertAfterSelf(new Text(item.InnerText.Replace(zuErsetzen, neuerString)));
                    item.Remove();
                }
                doc.Close();
            }
        }

But it only works when the search text contains no special characters. For example:

OS will be replaced with Windows over 9000

[OS] will be left as it is.

CASE 1:

In the document:

You use os for whatever purpose you've got.

replaceInOpenXMLDocument(@"C:\NSA\suspects.docx", "os", "Win 2000");

This results in:

You use Win 2000 for whatever purpose you've got.

CASE 2:

With special chars ...

You use [os] for whatever purpose you've got.

replaceInOpenXMLDocument(@"C:\NSA\suspects.docx", "[os]", "Win 2000");

... it just ignores me:

You use [os] for whatever purpose you've got.

I tried several special characters, such as ( ) [ ] { }, but the text is never replaced.

Is there something I forgot to do? Or is this method simply unable to replace text that contains special characters? If so, I just need a simple workaround.

Can anybody help me out of my desperation? :)

SOLUTION / ADDITION 1:

Thanks to Flowerking for that! This is the code I'm using right now:

public static void replaceInOpenXMLDocument(string pfad, string zuErsetzen, string neuerString)
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(pfad, true))
            {
                SimplifyMarkupSettings settings = new SimplifyMarkupSettings
                {
                    NormalizeXml = true, // Merges runs in a paragraph with similar formatting
                };
                MarkupSimplifier.SimplifyMarkup(doc, settings);

                //zuErsetzen = new XElement("Name", zuErsetzen).Value;
                var res = from bm in doc.MainDocumentPart.Document.Body.Descendants()
                          where bm.InnerText != string.Empty && bm.InnerText.Contains(zuErsetzen) && bm.HasChildren == false
                          select bm;
                // bm.InnerText.Contains(zuErsetzen)

                foreach (var item in res)
                {
                    item.InsertAfterSelf(new Text(item.InnerText.Replace(zuErsetzen, neuerString)));
                    item.Remove();
                }

                doc.Close();
            }
        }

(This code works for normal documents with normal text in them!)

SOLUTION / ADDITION 2: To replace text in text boxes, I had to use a small workaround. Text boxes are declared as pictures, so the code above won't touch them.

I found an additional class (link) that also searches through text boxes. The ZIP download includes an example program that is easy to understand.

This happens because the Open XML that Word creates for text containing special characters often looks like this:

<w:p>
  <w:r w:rsidRPr="00316587">
    <w:rPr>
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:eastAsia="Times New Roman" w:cs="Consolas" />
      <w:color w:val="823125" />
      <w:sz w:val="20" />
      <w:szCs w:val="20" />
      <w:lang w:eastAsia="en-GB" />
    </w:rPr>
    <w:t>[</w:t>
  </w:r>
  <w:proofErr w:type="gramStart" />
  <w:r w:rsidRPr="00316587">
    <w:rPr>
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:eastAsia="Times New Roman" w:cs="Consolas" />
      <w:color w:val="823125" />
      <w:sz w:val="20" />
      <w:szCs w:val="20" />
      <w:lang w:eastAsia="en-GB" />
    </w:rPr>
    <w:t>text-to-replace</w:t>
  </w:r>
  <w:proofErr w:type="gramEnd" />
  <w:r w:rsidRPr="00316587">
    <w:rPr>
      <w:rFonts w:ascii="Consolas" w:hAnsi="Consolas" w:eastAsia="Times New Roman" w:cs="Consolas" />
      <w:color w:val="823125" />
      <w:sz w:val="20" />
      <w:szCs w:val="20" />
      <w:lang w:eastAsia="en-GB" />
    </w:rPr>
    <w:t>]</w:t>
  </w:r>
</w:p>

The above shows the Open XML created for the text [text-to-replace]. (Please note that this might not always be the case; it may depend on the client you are using.)

Judging by doc.MainDocumentPart.Document.Body.Descendants() in your code, you are taking every descendant element (OpenXmlElement) of the document body and trying to replace the text by iterating over them one by one. The actual text sits in one element and the special characters sit in two separate elements, so no single element contains the whole search string. Hence the code fails to achieve the required result.
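To make the failure concrete, here is a minimal sketch in plain C# with no Open XML dependency. The `fragments` array is a hypothetical stand-in for the three `<w:t>` elements shown above, and the class name is made up for illustration. It only demonstrates why a per-element `Contains` check misses a string that is split across runs, while a check on the joined text finds it.

```csharp
using System;
using System.Linq;

class SplitRunDemo
{
    static void Main()
    {
        // Hypothetical fragments, mirroring three separate <w:t> elements.
        string[] fragments = { "[", "os", "]" };
        string search = "[os]";

        // Per-fragment check, like testing InnerText of each leaf element.
        bool foundInAnyFragment = fragments.Any(f => f.Contains(search));

        // Check against the concatenated text of the whole paragraph.
        bool foundInJoinedText = string.Concat(fragments).Contains(search);

        Console.WriteLine(foundInAnyFragment); // False
        Console.WriteLine(foundInJoinedText);  // True
    }
}
```

This is also why searching for plain os succeeds: that string fits entirely inside one fragment.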

There are different ways to work around this.

Solution:

A nice (my preferred) solution is to normalize the XML using MarkupSimplifier from Open XML PowerTools. It normalizes the Open XML markup and concatenates the text in a paragraph, which makes the document easier to work with programmatically.

Example code:

using (WordprocessingDocument doc =
           WordprocessingDocument.Open("Test.docx", true))
{
    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
    {
        NormalizeXml = true, // Merges runs in a paragraph with similar formatting
    };
    MarkupSimplifier.SimplifyMarkup(doc, settings);
}

Please refer to my answer here for more info on using MarkupSimplifier.
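If adding PowerTools is not an option, one of the other workarounds could look roughly like the sketch below. It is not from the original answer: `ParagraphReplacer` and `ReplaceAcrossRuns` are hypothetical names, and the approach assumes simple paragraphs with no nested content such as text boxes. It joins the text of all runs in a paragraph, does the replacement on the joined string, and writes the result into the first text element. The paragraph therefore takes on the formatting of its first run, which may or may not be acceptable.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphReplacer
{
    // Hypothetical helper; not part of the Open XML SDK or PowerTools.
    public static void ReplaceAcrossRuns(string path, string search, string replacement)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Materialize the list so edits do not disturb the enumeration.
            foreach (Paragraph paragraph in body.Descendants<Paragraph>().ToList())
            {
                var texts = paragraph.Descendants<Text>().ToList();
                if (texts.Count == 0) continue;

                // Rebuild the paragraph text as the reader sees it.
                string joined = string.Concat(texts.Select(t => t.Text));
                if (!joined.Contains(search)) continue;

                // Put the replaced string into the first text element...
                texts[0].Text = joined.Replace(search, replacement);
                texts[0].Space = SpaceProcessingModeValues.Preserve;

                // ...and blank the rest, which drops their separate formatting.
                foreach (Text rest in texts.Skip(1))
                {
                    rest.Text = string.Empty;
                }
            }

            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

A call such as `ParagraphReplacer.ReplaceAcrossRuns("Test.docx", "[os]", "Win 2000");` would then handle the bracketed placeholder from the question, under the assumptions stated above.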

Hope this helps :)
