
C# create Word document with Open XML: XML parsing error (when replacement string contains spaces)

I am trying to create a Word document from a Word template in my C# application using Open XML. Here is my code so far:

DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));

DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));

string ype = "test Merge"; //if ype string contains spaces then I get this error
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";

// Create a copy of the template file and open the copy 
File.Copy(sourceFile, destinationFile, true);

// Create key/value pairs: each key is the text to be replaced, and
// each value is the text that takes its place in the document.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);                
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);

And the SearchAndReplace function:

public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;

        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        foreach (KeyValuePair<string, string> item in dict)
        {
            Regex regexText = new Regex(item.Key);
            docText = regexText.Replace(docText, item.Value);
        }

        using (StreamWriter sw = new StreamWriter(
                  wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}

However, when I try to open the exported file, I get this error:

XML parsing error

Location: Part: /word/document.xml, Line: 2, Column: 2142

First lines of document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>


<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">

<w:body>

<w:tbl>

<w:tblPr>

<w:tblW w:w="10348" w:ttest Merge="dxa"/>

<w:tblInd w:w="108" w:ttest Merge="dxa"/>

<w:tblBorders>

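Edit: I found that the problem occurs because I used merge fields in the Word template. If I use plain text instead, it works. But in that case it will be slow, because it has to check every word in the template and replace it if it matches. Is it possible to do this some other way?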

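Disclaimer: you appear to be using the OpenXML SDK, because your code is virtually identical to the code here: https://msdn.microsoft.com/zh-cn/library/bb508261(v=office.12).aspx. I have never used this SDK in my life, and my answer is based on an educated guess about what is happening.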

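The operation you are performing on this Word document appears to be affecting parts of the document that you did not intend to change.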

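I believe that calling document.MainDocumentPart.GetStream() just gives you more or less raw, direct access to the document's XML, which you are then treating as a plain text file and running a list of straight text replacements on. I think this is most likely the cause of the problem: you intend to edit the document text, but in the process you accidentally damage the XML node structure.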

As an example, here is a simple HTML document:

<html>
 <head><title>Damage report</title></head>
 <body>
  <p>The soldier was shot once in the body and twice in the head</p>
 </body>
</html>

You decide to run a find/replace to make the places where the soldier was shot more specific:

var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html", html);

The only thing is, your document is now broken:

<html>
 <forehead><title>Damage report</title></forehead>
 <chest>
  <p>The soldier was shot once in the chest and twice in the forehead</p>
 </chest>
</html>

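The browser can no longer make sense of it (well, I suppose it is still valid markup, but it is meaningless), because the replace operation also rewrote the tag names.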

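You are replacing "ype" with "test Merge", but this also clobbers every occurrence of the word "type", which is very likely to appear in XML attribute or element names, and turns it into "ttest Merge". An XML attribute name cannot contain a space, so w:ttest Merge="dxa" is no longer well-formed, which would explain why the error only appears when the replacement string contains a space.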
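To make the failure concrete, here is a minimal sketch that reproduces it without Word or the SDK. The XML string is a hypothetical, trimmed-down stand-in for /word/document.xml, and the class and method names (NaiveReplaceDemo, IsWellFormed) are made up for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class NaiveReplaceDemo
{
    // Hypothetical, trimmed-down stand-in for /word/document.xml.
    public const string Xml =
        "<w:tbl xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:tblW w:w=\"10348\" w:type=\"dxa\"/>" +
        "<w:t>ype</w:t>" +
        "</w:tbl>";

    // Returns true when the text can be parsed as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Plain text replacement cannot tell markup from content.
        string broken = Xml.Replace("ype", "test Merge");

        Console.WriteLine(IsWellFormed(Xml));     // True
        Console.WriteLine(broken.Contains("w:ttest Merge=\"dxa\"")); // True
        Console.WriteLine(IsWellFormed(broken));  // False
    }
}
```

The placeholder in the text node is replaced as intended, but so is the "ype" inside the w:type attribute name, and the parser rejects the result.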

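To change the text content of an XML document's nodes properly, you should parse the text into an XML document object model representation, iterate over the nodes, change their text, and then re-serialize the whole thing back to XML text. The Office SDK appears to provide a way of doing this, because you can treat the document as a collection of class object instances, as in this code snippet (also from MSDN):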

// Create a Wordprocessing document. 
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document)) 
{ 
   // Add a new main document part. 
   MainDocumentPart mainPart = myDoc.AddMainDocumentPart(); 
   //Create DOM tree for simple document. 
   mainPart.Document = new Document(); 
   Body body = new Body(); 
   Paragraph p = new Paragraph(); 
   Run r = new Run(); 
   Text t = new Text("Hello World!"); 
   //Append elements appropriately. 
   r.Append(t); 
   p.Append(r); 
   body.Append(p); 
   mainPart.Document.Append(body); 
   // Save changes to the main document part. 
   mainPart.Document.Save(); 
}
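The parse, edit, re-serialize idea described above can also be sketched with the framework's own System.Xml.Linq, independent of the SDK. This is only an illustration on a hypothetical in-memory fragment: the names DomReplaceDemo and ReplaceInTextNodes are invented here, and it assumes the placeholder sits entirely inside a single w:t element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DomReplaceDemo
{
    // WordprocessingML main namespace (the xmlns:w value shown in the question).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces `key` with `value` only inside <w:t> text elements,
    // leaving element and attribute names untouched.
    public static string ReplaceInTextNodes(string xml, string key, string value)
    {
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        // ToList() takes a snapshot so the tree is not edited while it is enumerated.
        foreach (XElement t in doc.Descendants(W + "t").ToList())
        {
            t.Value = t.Value.Replace(key, value);
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Hypothetical fragment standing in for document.xml.
        string xml =
            "<w:tbl xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:tblW w:w=\"10348\" w:type=\"dxa\"/>" +
            "<w:t>ype</w:t>" +
            "</w:tbl>";

        Console.WriteLine(ReplaceInTextNodes(xml, "ype", "test Merge"));
    }
}
```

Because the replacement is applied to node values rather than to the serialized text, the w:type attribute survives and the output is still well-formed.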

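You should look for another way of accessing the document elements, rather than using streams and direct low-level XML access. Something like these: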

https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/ 
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp

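Or perhaps start from a related SO question such as this one: "Search and replace text in OPENXML (added file)" (though the answer you need may be in a link inside that question).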
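For what it's worth, a rewrite of SearchAndReplace along those lines might look like the sketch below. It is untested against the original template and rests on some assumptions: it uses the SDK's typed Text elements (DocumentFormat.OpenXml.Wordprocessing.Text) instead of the raw stream, it assumes each placeholder lies within a single Text element (Word can split text across several runs), and it only touches the main document part, not headers or footers. The class name TypedSearchAndReplace is made up here.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedSearchAndReplace
{
    public static void SearchAndReplace(string document, Dictionary<string, string> dict)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
        {
            Document doc = wordDoc.MainDocumentPart.Document;

            // Only visible text (w:t) is visited, so attribute names such as
            // w:type are never seen by the replacement.
            foreach (Text text in doc.Descendants<Text>().ToList())
            {
                foreach (KeyValuePair<string, string> item in dict)
                {
                    // Plain string replacement: the key is not treated as a regex pattern.
                    text.Text = text.Text.Replace(item.Key, item.Value);
                }
            }

            // Write the modified DOM back to the main document part.
            doc.Save();
        }
    }
}
```

It can be called exactly like the original, for example SearchAndReplace(destinationFile, keyValues). Since it visits only the text elements rather than scanning the whole XML as one string, it may also ease the speed concern raised in the question's edit, though that has not been measured.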
