Saving a Word document in XML format

I was trying to save a Word file in XML format, parse it, and then perform some operations on the resulting XML.

The data in my Word document was broken up across different tags.

Example:

If I have $date in my Word document, it is split into $ and date in two separate tags. Similarly, tlyadd is split into two tags, tly and add, whereas tlyabcd stays in a single tag.

In another document, these values are not split across tags.

I don't understand on what basis these values are put in different tags.

I couldn't find anything about the Word XML format on MSDN.

Can someone explain why, and on what basis, this is done?

Here is the document containing these values

Let me know if anything is unclear and needs more explanation.

You shouldn't make any assumptions about whether text is in one run or several. There are no rules restricting the circumstances in which text may be split.

That said, there are various things which will force your text to be split across runs:

  • spelling/grammar checking (probably happening with $date), which you can turn off
  • formatting, for example, if half the word was bold
  • revisions (different people changing the document at different times; Word records the editing session in rsid attributes)
  • change tracking etc
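
To make the splitting concrete, here is a minimal Python sketch. The XML fragment is hand-written for illustration (it is not taken from the asker's document) and assumes the docx-style WordprocessingML namespace; an older Word 2003 XML file would use a different namespace. It shows $date split into two runs by spelling markers, and a hypothetical helper, paragraph_text, that recovers the full text by joining every w:t in the paragraph instead of relying on run boundaries.

```python
import xml.etree.ElementTree as ET

# Assumption: docx-style WordprocessingML namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written fragment for illustration only.
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t>$</w:t></w:r>
  <w:proofErr w:type="spellStart"/>
  <w:r><w:t>date</w:t></w:r>
  <w:proofErr w:type="spellEnd"/>
</w:p>
"""

def paragraph_text(p):
    """Concatenate the text of every w:t in the paragraph, in document order."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))

p = ET.fromstring(SAMPLE)
runs = [r.find("w:t", NS).text for r in p.findall("w:r", NS)]
print(runs)               # ['$', 'date']
print(paragraph_text(p))  # $date
```

Searching for a placeholder in the joined paragraph text works however Word happened to split the runs, although writing a replacement back still means dealing with the individual runs.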

You can, and probably should, preprocess your document to join up adjacent runs. See, for example, docx4j's VariablePrepare.java.
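
As a rough illustration of that kind of preprocessing, the Python sketch below merges adjacent text-only runs whose run properties (w:rPr) are identical, after dropping spelling/grammar markers. It is not docx4j's algorithm; join_runs and the other helper names are hypothetical, the namespace is assumed to be the docx one, and runs containing anything other than an optional w:rPr followed by a single w:t (tabs, breaks, fields, tracked changes) are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

# Assumption: docx-style WordprocessingML namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R, T, RPR, PROOF = (f"{{{W}}}{name}" for name in ("r", "t", "rPr", "proofErr"))
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def _key(el):
    """Structural fingerprint of an element (tag, attributes, children)."""
    if el is None:
        return None
    return (el.tag, tuple(sorted(el.attrib.items())), tuple(_key(c) for c in el))

def _is_plain_text_run(run):
    """True if the run holds only an optional w:rPr followed by one w:t."""
    return [c.tag for c in run] in ([T], [RPR, T])

def join_runs(p):
    """Merge neighbouring plain-text runs of paragraph p with equal formatting."""
    for proof in p.findall(PROOF):      # spelling/grammar markers carry no text
        p.remove(proof)
    prev = None
    for child in list(p):               # iterate over a copy while removing
        if child.tag == R and _is_plain_text_run(child):
            if prev is not None and _key(prev.find(RPR)) == _key(child.find(RPR)):
                prev_t = prev.find(T)
                prev_t.text = (prev_t.text or "") + (child.find(T).text or "")
                prev_t.set(XML_SPACE, "preserve")   # keep spaces in merged text
                p.remove(child)
                continue
            prev = child
        else:
            prev = None                 # anything else breaks the sequence
    return p

# Hand-written fragment: "tly" and "add" differ only by rsid; "!" is bold.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    '<w:r w:rsidR="00A1"><w:t>tly</w:t></w:r>'
    '<w:proofErr w:type="spellStart"/>'
    '<w:r w:rsidR="00B2"><w:t>add</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>!</w:t></w:r>'
    '</w:p>'
)
p = join_runs(ET.fromstring(SAMPLE))
merged = [r.find(T).text for r in p.findall(R)]
print(merged)  # ['tlyadd', '!']
```

The rsid attributes on w:r are ignored on purpose, since they record editing history rather than formatting; the bold run stays separate because its w:rPr differs.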
