C#: create Word document with OpenXML: XML parsing error (when replacement string contains spaces)

I am trying to create a Word document from a Word template in my C# application using OpenXML. Here is my code so far:

DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));

DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));

string ype = "test Merge"; //if ype string contains spaces then I get this error
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";

// Create a copy of the template file and open the copy 
File.Copy(sourceFile, destinationFile, true);

// create key value pair, key represents words to be replace and 
//values represent values in document in place of keys.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);                
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);

And the SearchAndReplace function:

public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;

        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        foreach (KeyValuePair<string, string> item in dict)
        {
            Regex regexText = new Regex(item.Key);
            docText = regexText.Replace(docText, item.Value);
        }

        using (StreamWriter sw = new StreamWriter(
                  wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}

But when I try to open the exported file I get this error:

XML parsing error

Location: Part: /word/document.xml, line: 2, Column: 2142

The first lines of document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>


<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">

<w:body>

<w:tbl>

<w:tblPr>

<w:tblW w:w="10348" w:ttest Merge="dxa"/>

<w:tblInd w:w="108" w:ttest Merge="dxa"/>

<w:tblBorders>

Edit: I found out that the problem occurred because I was using merge fields in the Word template. If I use plain text it works. But in that case it will be slow, because it has to check every single word in the template and replace it if it matches. Is it possible to do this another way?

Disclaimer: You seem to be using the OpenXML SDK, because your code looks virtually identical to that found here: https://msdn.microsoft.com/en-us/library/bb508261(v=office.12).aspx - I've never in my life used this SDK and I'm basing this answer on an educated guess at what's happening.

It seems that the operation you're carrying out on this Word document is affecting parts of the document that you didn't intend to change.

I believe that calling wordDoc.MainDocumentPart.GetStream() just gives you more or less raw, direct access to the XML of the document, and you're then treating it as a plain XML file, manipulating it as text, and carrying out a list of straight text replacements. I think that's likely the cause of the problem: you intend to edit document text, but you're accidentally damaging the XML node structure in the process.

By way of an example, here is a simple HTML document:

<html>
 <head><title>Damage report</title></head>
 <body>
  <p>The soldier was shot once in the body and twice in the head</p>
 </body>
</html>

You decide to run a find/replace to make the places where the soldier was shot a bit more specific:

var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html", html); // write the modified text back

The only problem is that your document is now ruined:

<html>
 <forehead><title>Damage report</title></forehead>
 <chest>
  <p>The soldier was shot once in the chest and twice in the forehead</p>
 </chest>
</html>

A browser can no longer make sense of it (well, it's still parseable I suppose, but it's meaningless) because the replacement operation broke the markup as well as the text.

You're replacing "ype" with "test Merge", but this is also clobbering every occurrence of the word "type" (something that is very likely to appear in XML attribute or element names) and turning it into "ttest Merge". That is exactly what the w:ttest Merge="dxa" attributes in your document.xml show.
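To make that concrete, here is a minimal sketch using only System.Xml.Linq. The fragment is a hypothetical one modeled on the w:tblW line from the question (with w:type restored), and the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class RawReplaceDemo
{
    // Hypothetical fragment modeled on the <w:tblW> line in the question.
    public const string Fragment =
        "<w:tblPr xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:tblW w:w=\"10348\" w:type=\"dxa\"/>" +
        "</w:tblPr>";

    // Plain text replacement: it cannot tell markup from document text.
    public static string NaiveReplace(string xml, string key, string value)
    {
        return xml.Replace(key, value);
    }

    // Returns true if the string still parses as XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string damaged = NaiveReplace(Fragment, "ype", "test Merge");
        Console.WriteLine(damaged);                 // the attribute name now contains a space
        Console.WriteLine(IsWellFormed(Fragment));  // original fragment
        Console.WriteLine(IsWellFormed(damaged));   // fragment after the replacement
    }
}
```

This also suggests why the error only shows up when the replacement contains a space: without a space, the attribute would be renamed to something like w:ttestMerge, which is still well-formed XML even though it is no longer the attribute the markup originally used.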

To correctly change the text content of an XML document's nodes, the document should be parsed from text into an XML document object model, the nodes iterated, the texts altered, and the whole thing re-serialized back to XML text. The Office SDK does seem to provide ways to do this, because you can treat a document like a collection of class object instances and write things like this code snippet (also from MSDN):

// Create a Wordprocessing document. 
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document)) 
{ 
   // Add a new main document part. 
   MainDocumentPart mainPart = myDoc.AddMainDocumentPart(); 
   //Create DOM tree for simple document. 
   mainPart.Document = new Document(); 
   Body body = new Body(); 
   Paragraph p = new Paragraph(); 
   Run r = new Run(); 
   Text t = new Text("Hello World!"); 
   //Append elements appropriately. 
   r.Append(t); 
   p.Append(r); 
   body.Append(p); 
   mainPart.Document.Append(body); 
   // Save changes to the main document part. 
   mainPart.Document.Save(); 
}
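The parse / iterate / alter / re-serialize cycle described above can be sketched with plain LINQ to XML, without the SDK. This is only an illustration under assumptions: the fragment is hypothetical, and DomReplaceDemo and ReplaceInTextNodes are made-up names. It touches only the content of w:t (text) elements, so attribute names such as w:type are left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DomReplaceDemo
{
    // The WordprocessingML main namespace, as declared in the question's document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ReplaceInTextNodes(string xml, IDictionary<string, string> replacements)
    {
        // 1. Parse the text into a document object model.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // 2. Iterate only the <w:t> elements; ToList() avoids mutating while enumerating.
        foreach (XElement t in doc.Descendants(W + "t").ToList())
        {
            foreach (KeyValuePair<string, string> item in replacements)
            {
                // 3. Alter the text; special characters are escaped on serialization.
                t.Value = t.Value.Replace(item.Key, item.Value);
            }
        }

        // 4. Re-serialize back to XML text.
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Hypothetical fragment: one table cell whose text is the placeholder "ype".
        string xml =
            "<w:tbl xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:tblPr><w:tblW w:w=\"10348\" w:type=\"dxa\"/></w:tblPr>" +
            "<w:tr><w:tc><w:p><w:r><w:t>ype</w:t></w:r></w:p></w:tc></w:tr>" +
            "</w:tbl>";

        var replacements = new Dictionary<string, string> { { "ype", "test Merge" } };
        Console.WriteLine(ReplaceInTextNodes(xml, replacements));
    }
}
```

Because the replacement value is assigned through the object model rather than spliced into the raw string, a value containing characters such as & or < is escaped when the document is written out instead of breaking the markup.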

You should be looking for another way to access the document elements, one that doesn't use streams or direct low-level XML access. Something like these:

https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/ 
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp

Or possibly start with a related SO question such as "Search And Replace Text in OPENXML (Added file)" (though the answer you need may be in something linked inside that question).
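As a rough idea of what a DOM-based SearchAndReplace might look like with the Open XML SDK itself, here is a minimal sketch. It assumes the DocumentFormat.OpenXml package is referenced; TemplateFiller is a hypothetical class name, and the method signature simply mirrors the one in the question. It has not been tested against the asker's template.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TemplateFiller
{
    public static void SearchAndReplace(string document, IDictionary<string, string> dict)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
        {
            Body body = wordDoc.MainDocumentPart.Document.Body;

            // Visit only the text elements (w:t); element and attribute names
            // such as w:type are never touched.
            foreach (Text text in body.Descendants<Text>().ToList())
            {
                foreach (KeyValuePair<string, string> item in dict)
                {
                    if (text.Text.Contains(item.Key))
                    {
                        text.Text = text.Text.Replace(item.Key, item.Value);
                    }
                }
            }

            // Write the modified object model back to the main document part.
            wordDoc.MainDocumentPart.Document.Save();
        }
    }
}
```

Two caveats with this sketch: Word can split a single visible word across several runs, in which case a placeholder will not be found inside any one Text element, and headers and footers live in separate parts that this code does not visit. It also does nothing special for merge fields; it only rewrites whatever text the w:t elements contain.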
