
Use OpenXML to replace text in DOCX file - strange content

I'm trying to use the OpenXML SDK and the samples on Microsoft's pages to replace placeholders with real content in Word documents.

It used to work as described here, but after I edited the template file in Word to add headers and footers, it stopped working. I wondered why, and some debugging showed me this:

[Image: debugger view of the texts list]

That is the content of texts in this piece of code:

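using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(DocumentFile, true))
{
    // Collect every text (<w:t>) element found in the document body
    var texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>().ToList();
}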

So what I see here is that the body of the document is "fragmented", even though in Word the content looks like this:

[Image: the same content as displayed in Word]

Can somebody tell me how I can get around this?


I have been asked what I'm trying to achieve. Basically, I want to replace user-defined "placeholders" with real content. I want to treat the Word document like a template. The placeholders can be anything. In my example above they look like {var:Template1}, but that's just something I'm playing with. It could basically be any word.

So, for example, if the document contains the following paragraph:

Do not use the name USER_NAME

The user should be able to replace the USER_NAME placeholder with the word admin, for example, keeping the formatting intact. The result should be

Do not use the name admin

The problem I see with working at paragraph level (concatenating the content and then replacing the content of the paragraph) is that I fear I would lose the formatting, which should be kept, as in

Do not use the name admin

Various things can fragment text runs. The most frequent causes are proofing markup (apparently the case here, where there are "squigglies"), rsid attributes (used to compare documents and track who edited what, and when), and the "Go back" bookmark that Word sets in the background. These become readily apparent if you view the underlying WordOpenXML in the document.xml "part" (using the Open XML SDK Productivity Tool, for example).
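
If you would rather check this from code than from the Productivity Tool, you can list what sits between the runs of a paragraph. This is only a minimal sketch, assuming the DocumentFormat.OpenXml NuGet package is referenced. ParagraphInspector and DescribeChildren are hypothetical names invented for this illustration; they are not part of the SDK.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class ParagraphInspector
{
    // Lists the direct children of a paragraph in document order.
    // Runs are shown with their text; every other element is shown by its
    // XML local name (for example proofErr or bookmarkStart).
    public static string DescribeChildren(Paragraph paragraph)
    {
        return string.Join(", ", paragraph.ChildElements.Select(child =>
            child is Run run ? "r[" + run.InnerText + "]" : child.LocalName));
    }
}
```

Calling DescribeChildren for each paragraph returned by Body.Descendants<Paragraph>() and printing the result should make split placeholders visible as several r[...] entries. Runs that were split only because their rsid attributes differ will show up as adjacent runs with nothing in between, since this sketch does not print attributes.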

It usually helps to go one element level "higher". In this case, get the list of Paragraph descendants, and from each paragraph get all the Text descendants and concatenate their InnerText.
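
A rough sketch of that paragraph-level lookup is shown here, again assuming the DocumentFormat.OpenXml package. PlaceholderFinder and FindParagraphsContaining are made-up names for this example. It only locates the paragraphs that contain a placeholder once the fragments are joined; it does not replace anything, so the formatting is untouched.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class PlaceholderFinder
{
    // Returns the paragraphs whose concatenated <w:t> text contains the
    // placeholder, even if Word has split it across several runs.
    public static List<Paragraph> FindParagraphsContaining(Body body, string placeholder)
    {
        return body.Descendants<Paragraph>()
            .Where(p => string.Concat(p.Descendants<Text>().Select(t => t.Text))
                              .Contains(placeholder))
            .ToList();
    }
}
```

With the code from the question, this could be called as PlaceholderFinder.FindParagraphsContaining(wordDoc.MainDocumentPart.Document.Body, "{var:Template1}"). Headers and footers live in separate parts, so they would need the same treatment on their own root elements.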

OpenXML is indeed fragmenting your text.

I created a library that does exactly this: render a Word template with the values from a JSON.

From the documentation of docxtemplater:

Why you should use a library for this

Docx is a zipped format that contains some XML. If you want to build a simple "replace {tag} by value" system, it can already become complicated, because the {tag} is internally separated into <w:t>{</w:t><w:t>tag</w:t><w:t>}</w:t>. If you want to embed loops to iterate over an array, it becomes a real hassle.

The library basically will do the following to keep formatting:

If the text is:

<w:t>Hello</w:t>
<w:t>{name</w:t>
<w:t>} !</w:t>
<w:t>How are you ?</w:t>

The result would be:

<w:t>Hello</w:t>
<w:t>John !</w:t>
<w:t>How are you ?</w:t>

You also have to replace the <w:t> tag with <w:t xml:space="preserve"> so that spaces in your variables, if there are any, are not stripped out.
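
For the C# side of the question, the same idea can be sketched with the Open XML SDK. This is not docxtemplater's code, just one way to approximate the behaviour described above, assuming the DocumentFormat.OpenXml package. FragmentedReplace, ReplaceInFragments and ReplaceInParagraph are hypothetical names. The value is written into the text element where the placeholder starts, and the rest of the placeholder is cut out of the following elements, so every run keeps its own formatting. Unlike the docxtemplater result shown above, text after the placeholder stays in its original element instead of being merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class FragmentedReplace
{
    // Pure string logic: replaces every occurrence of the placeholder,
    // even when it spans several fragments. The list is modified in place.
    public static void ReplaceInFragments(IList<string> parts, string placeholder, string value)
    {
        if (string.IsNullOrEmpty(placeholder))
            throw new ArgumentException("placeholder must not be empty", nameof(placeholder));
        int searchFrom = 0;
        while (true)
        {
            string joined = string.Concat(parts);
            int start = joined.IndexOf(placeholder, searchFrom, StringComparison.Ordinal);
            if (start < 0) return;
            int end = start + placeholder.Length; // exclusive end offset
            int offset = 0;
            bool inserted = false;
            for (int i = 0; i < parts.Count; i++)
            {
                string part = parts[i];
                int partStart = offset;
                int partEnd = offset + part.Length;
                offset = partEnd;
                if (partEnd <= start || partStart >= end) continue; // no overlap
                string before = partStart < start ? part.Substring(0, start - partStart) : "";
                string after = partEnd > end ? part.Substring(end - partStart) : "";
                // The value goes into the first overlapping fragment only.
                parts[i] = before + (inserted ? "" : value) + after;
                inserted = true;
            }
            searchFrom = start + value.Length; // continue after the inserted value
        }
    }

    // Applies the same logic to the <w:t> elements of one paragraph.
    public static void ReplaceInParagraph(Paragraph paragraph, string placeholder, string value)
    {
        List<Text> texts = paragraph.Descendants<Text>().ToList();
        List<string> parts = texts.Select(t => t.Text ?? string.Empty).ToList();
        ReplaceInFragments(parts, placeholder, value);
        for (int i = 0; i < texts.Count; i++)
        {
            if (texts[i].Text == parts[i]) continue; // untouched element
            texts[i].Text = parts[i];
            // Equivalent of xml:space="preserve": keep spaces at the edges.
            texts[i].Space = SpaceProcessingModeValues.Preserve;
        }
    }
}
```

Applied to the fragments "Hello", "{name", "} !", "How are you ?" with the placeholder {name} and the value John, ReplaceInFragments leaves "Hello", "John", " !", "How are you ?". A run whose text becomes empty is kept as an empty element rather than removed, which keeps the sketch simple.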
