Saving a Word document in XML format

I am trying to save a Word file in XML format and then perform some operations on that XML file after parsing it.

The data in my Word document ends up split across different tags.

Example:

If I have $date in my Word document, it is split into $ and date in two separate tags. Likewise, tlyadd is split into two tags, tly and add, whereas tlyabcd stays in a single tag.

In another document these values are not split into different tags.

I don't understand on what basis these values are put into different tags.

I couldn't find anything about the Word XML format on MSDN.

Can someone explain why this is done, and on what basis?

Here is the document containing these values

Let me know if this is unclear and needs more explanation.

You shouldn't make any assumptions about whether text is in one run or several. There are no rules restricting the circumstances in which text may be split.

That said, there are various things which will force your text to be split across runs:

  • spelling/grammar checking (probably what is happening with $date), which you can turn off
  • formatting, for example, if half the word is bold
  • revisions (different people changing the document at different times, recorded in rsid attributes)
  • change tracking, etc.
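To make the splitting concrete, here is a minimal Python sketch. It assumes the file uses the WordprocessingML namespace found in .docx and Flat OPC files (the older Word 2003 XML format uses a different namespace), and the sample paragraph is hypothetical, written by hand to mimic a spell-check split of $date. The helper names run_texts and paragraph_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (assumed; Word 2003 XML differs).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: "$date" split into two runs by proofing marks.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t>$</w:t></w:r>'
    '<w:proofErr w:type="spellStart"/>'
    '<w:r><w:t>date</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/>'
    '</w:p>'
)

def run_texts(paragraph):
    """Return the text of each w:r in the paragraph, in document order."""
    return [
        "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        for run in paragraph.iter(f"{{{W}}}r")
    ]

def paragraph_text(paragraph):
    """Concatenate all runs, so run boundaries no longer matter."""
    return "".join(run_texts(paragraph))

p = ET.fromstring(SAMPLE)
print(run_texts(p))       # ['$', 'date']
print(paragraph_text(p))  # $date
```

If you only need to find a placeholder, searching the concatenated paragraph text avoids depending on how Word happened to split the runs.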

You can (and should) preprocess your document to join up your runs. See, for example, docx4j's VariablePrepare.java.
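As a rough illustration of that kind of preprocessing, the Python sketch below drops proofing marks and merges adjacent runs whose run properties (w:rPr) are identical. It is not a port of VariablePrepare.java and makes no claim about how docx4j does it. It assumes the WordprocessingML namespace, only looks at runs that are direct children of a paragraph, and leaves runs containing anything other than w:rPr and w:t untouched. All function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def q(tag):
    """Qualify a local tag name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"

def fingerprint(elem):
    """Comparable summary of an element: tag, attributes, children."""
    if elem is None:
        return None
    return (elem.tag, tuple(sorted(elem.attrib.items())),
            tuple(fingerprint(c) for c in elem))

def run_text(run):
    return "".join(t.text or "" for t in run.findall(q("t")))

def set_run_text(run, text):
    """Replace all w:t children of the run with a single w:t."""
    for t in run.findall(q("t")):
        run.remove(t)
    t = ET.SubElement(run, q("t"))
    t.text = text
    t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces

def is_plain_text_run(run):
    return all(child.tag in (q("rPr"), q("t")) for child in run)

def join_runs(paragraph):
    """Merge adjacent plain-text runs that share the same formatting."""
    # Proofing marks carry no content, so drop them first.
    for mark in paragraph.findall(q("proofErr")):
        paragraph.remove(mark)
    prev = None
    for child in list(paragraph):
        if child.tag != q("r") or not is_plain_text_run(child):
            prev = None  # anything else breaks the sequence
            continue
        same_format = prev is not None and (
            fingerprint(prev.find(q("rPr"))) == fingerprint(child.find(q("rPr")))
        )
        if same_format:
            # rsid attributes on the run itself are ignored on purpose.
            set_run_text(prev, run_text(prev) + run_text(child))
            paragraph.remove(child)
        else:
            prev = child

# Hypothetical paragraph: "tlyadd" split by rsid and proofing marks.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    '<w:r w:rsidR="00A12345"><w:t>tly</w:t></w:r>'
    '<w:proofErr w:type="spellStart"/>'
    '<w:r w:rsidR="00B67890"><w:t>add</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>'
    '</w:p>'
)

p = ET.fromstring(SAMPLE)
join_runs(p)
print([run_text(r) for r in p.findall(q("r"))])  # ['tlyadd', 'bold']
```

The bold run stays separate because its formatting differs. Runs nested inside tracked-change or hyperlink elements are not handled here and would need extra care.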
