[英]c# create word document with openXML : XML Parsing Error (when replacement string contains spaces)
我正在尝试使用openXML
在C#应用程序中使用单词模板创建单词文档。 到目前为止,这是我的代码:
DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));
DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));
string ype = "test Merge"; //if ype string contains spaces then I get this error
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";
// Create a copy of the template file and open the copy
File.Copy(sourceFile, destinationFile, true);
// create key value pair, key represents words to be replace and
//values represent values in document in place of keys.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);
以及SearchAndReplace
功能:
public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
foreach (KeyValuePair<string, string> item in dict)
{
Regex regexText = new Regex(item.Key);
docText = regexText.Replace(docText, item.Value);
}
using (StreamWriter sw = new StreamWriter(
wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
但是,当我尝试打开导出的文件时,出现此错误:
XML解析错误
位置:零件:/word/document.xml,第2行,第2142列
Document.xml第一行:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">
<w:body>
<w:tbl>
<w:tblPr>
<w:tblW w:w="10348" w:ttest Merge="dxa"/>
<w:tblInd w:w="108" w:ttest Merge="dxa"/>
<w:tblBorders>
编辑我发现出现问题是因为我在Word模板中使用了mergefields。 如果我使用纯文本,它将起作用。 但是在这种情况下,它会很慢,因为它必须检查模板中的每个单词,如果匹配则替换它。 有可能以其他方式做到吗?
免责声明:您似乎正在使用OpenXML SDK,因为您的代码实际上与此处的代码相同: https : //msdn.microsoft.com/zh-cn/library/bb508261(v= office.12).aspx-我我一生中从未使用过此SDK,而我的答案是根据对发生的事情的有根据的猜测
您在此Word文档上执行的操作似乎正在影响文档中您不希望使用的部分。
我相信调用document.MainDocumentPart.GetStream()只是给您或多或少的原始直接访问文档XML的权限,然后您将其视为纯XML文件,将其作为文本处理,并执行一个列表直文字替换? 我认为这很可能是问题的原因,因为您打算编辑文档文本,但是在此过程中意外损坏了xml节点结构
作为示例,这是一个简单的HTML文档:
<html>
<head><title>Damage report</title></head>
<body>
<p>The soldier was shot once in the body and twice in the head</p>
</body>
</html>
您决定进行查找/替换以使士兵被枪杀的地点更为具体:
var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html");
唯一的事情是,您的文档现在被破坏了:
<html>
<forehead><title>Damage report</title></forehead>
<chest>
<p>The soldier was shot once in the chest and twice in the forehead</p>
</chest>
</html>
浏览器无法再解析它了(嗯,我想它仍然是有效的,但是毫无意义),因为替换操作破坏了某些东西。
您将"ype"
替换为"test Merge"
但这似乎掩盖了"type"
一词的出现-看起来很可能会出现在XML属性或元素名称中的东西-并将其变为"ttest Merge"
。
为了正确地更改XML文档的节点文本的内容,应该将其从文本解析为XML文档对象模型表示形式,对节点进行迭代,对文本进行更改,然后将整个内容重新序列化为xml文本。 Office SDK似乎提供了执行此操作的方法,因为您可以将文档视为类对象实例的集合,并说出以下代码片段(同样来自MSDN):
// Create a Wordprocessing document.
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
{
// Add a new main document part.
MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
//Create DOM tree for simple document.
mainPart.Document = new Document();
Body body = new Body();
Paragraph p = new Paragraph();
Run r = new Run();
Text t = new Text("Hello World!");
//Append elements appropriately.
r.Append(t);
p.Append(r);
body.Append(p);
mainPart.Document.Append(body);
// Save changes to the main document part.
mainPart.Document.Save();
}
您应该寻找另一种方式来访问文档元素,而不是使用流/直接的低级xml访问。 像这样的东西:
https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp
或者可能是从这样一个相关的SO问题开始的: 在OPENXML(添加的文件)中搜索和替换文本 (尽管您需要的答案可能在该问题内部的链接中)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.