
How do I identify and NOT read in field codes in Docx4j?

To get text from an object, I am currently using:

String someText = TextUtils.extractText(obj, stringWriter);

Here obj is usually a Run, but it can really be anything. My problem is that I am also reading in field codes such as:

 " PAGE   \* MERGEFORMAT "

when I really want to ignore them. Is there a way to detect when a Text in a Run is a field code and ignore it?

Thanks

You could pre-process the fields before you run TextUtils.extractText.

One can imagine a small utility that you configure by specifying, for each field type, whether to remove the field entirely or keep just its result (possibly updating it first).

docx4j doesn't include this right now, so below I sketch out what is involved.

Note that there are two types of fields, simple and complex; for more detail see http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/XML.html

There is code in docx4j for converting from simple to complex; see https://github.com/plutext/docx4j/blob/master/docx4j-core/src/main/java/org/docx4j/model/fields/FieldsPreprocessor.java
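To illustrate the simple form: a simple field keeps its field code in the w:instr attribute of a w:fldSimple element, and only the result appears as run text inside it. The following is a minimal sketch, assuming you work on the raw XML with the JDK's DOM parser rather than docx4j's object model. The class name SimpleFieldText and its methods are hypothetical, not part of docx4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not a docx4j class): collects w:t text from a
// WordprocessingML fragment, optionally dropping simple-field results.
class SimpleFieldText {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True if the node has a w:fldSimple ancestor.
    static boolean insideSimpleField(Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (W_NS.equals(n.getNamespaceURI()) && "fldSimple".equals(n.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    // keepResults = true keeps the field result; false drops the whole field.
    static String extractText(String xml, boolean keepResults) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                Node t = texts.item(i);
                if (keepResults || !insideSimpleField(t)) {
                    out.append(t.getTextContent());
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WordprocessingML fragment", e);
        }
    }
}
```

Because the instruction lives in an attribute, a simple field never leaks its field code into the extracted text; the only decision is whether to keep its result.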

Once your fields are in the "complex" form, for example:

<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>

<w:r>
  <w:instrText xml:space="preserve"> DATE </w:instrText>
</w:r>

<w:r>
  <w:fldChar w:fldCharType="separate"/>
</w:r>

<w:r>
  <w:t>12/31/2005</w:t>
</w:r>

<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>

You can remove them, keeping just the result (i.e. the runs between "separate" and "end") if you want it.
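The begin/separate/end pattern above can be handled with a small state machine. What follows is only a sketch under stated assumptions: it reads raw WordprocessingML with the JDK's DOM parser instead of docx4j's object model, and the class name ComplexFieldText and its method are hypothetical. Text in w:instrText is never emitted. Text in w:t is emitted when it lies outside every field, or, if requested, when it lies in the result part of every enclosing field.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not a docx4j class): extracts w:t text while
// skipping complex-field instructions and, optionally, field results.
class ComplexFieldText {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extractText(String xml, boolean keepResults) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // All WordprocessingML elements, in document order.
            NodeList elements = doc.getElementsByTagNameNS(W_NS, "*");
            // One entry per open field: FALSE = instruction part, TRUE = result part.
            Deque<Boolean> openFields = new ArrayDeque<>();
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < elements.getLength(); i++) {
                Element e = (Element) elements.item(i);
                String name = e.getLocalName();
                if ("fldChar".equals(name)) {
                    String type = e.getAttributeNS(W_NS, "fldCharType");
                    if ("begin".equals(type)) {
                        openFields.push(Boolean.FALSE);
                    } else if ("separate".equals(type) && !openFields.isEmpty()) {
                        openFields.pop();
                        openFields.push(Boolean.TRUE);
                    } else if ("end".equals(type) && !openFields.isEmpty()) {
                        openFields.pop();
                    }
                } else if ("t".equals(name)) {
                    boolean outside = openFields.isEmpty();
                    // A nested field's result inside an outer instruction is still hidden.
                    boolean inResult = !outside && !openFields.contains(Boolean.FALSE);
                    if (outside || (keepResults && inResult)) {
                        out.append(e.getTextContent());
                    }
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WordprocessingML fragment", e);
        }
    }
}
```

The stack is there because fields can nest; for documents without nested fields a single boolean flag would do.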

The representation docx4j creates is actually a bit easier to work with than the example above; see https://github.com/plutext/docx4j/blob/master/docx4j-core/src/main/java/org/docx4j/model/fields/FieldRef.java

Note that there are quite a few different fields; see http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/file_2.html

You'll want to know which ones are in your documents and how you want to handle them. For example, you might wish to remove a PAGE field entirely, but for a MERGEFIELD you may want to keep the result. If you need to update it first, see https://github.com/plutext/docx4j/blob/master/docx4j-samples-docx4j/src/main/java/org/docx4j/samples/FieldsMailMerge.java
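A per-field-type policy like this can be sketched as a lookup keyed on the first token of the field instruction. This is an illustration only: the class FieldPolicy, its Action enum, and the example mapping are hypothetical and not part of docx4j. It also assumes the full instruction string has already been assembled, since an instruction may be split across several w:instrText runs.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical policy helper (not a docx4j class): decides per field type
// whether a field is removed entirely or reduced to its result.
class FieldPolicy {
    enum Action { REMOVE, KEEP_RESULT }

    // The field type is the first whitespace-delimited token of the
    // instruction, e.g. " PAGE   \* MERGEFORMAT " gives "PAGE".
    // Upper-casing is an assumption made here so lookups ignore case.
    static String fieldType(String instruction) {
        String trimmed = instruction.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return trimmed.split("\\s+")[0].toUpperCase(Locale.ROOT);
    }

    // Example mapping taken from the text: drop PAGE, keep MERGEFIELD results.
    static Map<String, Action> examplePolicy() {
        return Map.of(
                "PAGE", Action.REMOVE,
                "MERGEFIELD", Action.KEEP_RESULT);
    }

    // Looks up the action for an instruction; unknown types get the fallback.
    static Action actionFor(String instruction, Map<String, Action> policy, Action fallback) {
        return policy.getOrDefault(fieldType(instruction), fallback);
    }
}
```

Which fallback to use for field types you have not listed is a design choice; keeping the result is the more conservative one if the goal is not to lose visible text.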

Here is how just the result is kept in the mail merge case: https://github.com/plutext/docx4j/blob/master/docx4j-core/src/main/java/org/docx4j/model/fields/merge/MailMerger.java#L590

It's that easy because at that point the XML is in a known, predictable pattern.

For DOCPROPERTY and DOCVARIABLE field processing examples, see https://github.com/plutext/docx4j/blob/master/docx4j-samples-docx4j/src/main/java/org/docx4j/samples/FieldUpdaterExample.java
