
Problem with processing a Word document in Java

I need to replace some fields in a Word document file in Java. I am using the Apache POI library, and I am using this code to replace the words.

for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            // Text of the first text element of this run; may be null
            String text = r.getText(0);
            if (text != null) {
                System.out.println(text);
                if (text.contains("[Title]")) {
                    text = text.replace("[Title]", wordBody.getTitle()); // your content
                    r.setText(text, 0);
                }
                if (text.contains("[Ref_no]")) {
                    text = text.replace("[Ref_no]", wordBody.getRefNumber());
                    r.setText(text, 0);
                }
                if (text.contains("[In_date]")) {
                    text = text.replace("[In_date]", wordBody.getDate());
                    r.setText(text, 0);
                }
                if (text.contains("[FirstName]")) {
                    text = text.replace("[FirstName]", wordBody.getFirstName());
                    r.setText(text, 0);
                }
                if (text.contains("[MiddleName]")) {
                    text = text.replace("[MiddleName]", wordBody.getMiddleName());
                    r.setText(text, 0);
                }
                if (text.contains("[Vehicle_Type]")) {
                    text = text.replace("[Vehicle_Type]", wordBody.getVehicleType());
                    r.setText(text, 0);
                }
                if (text.contains("[Reg_No]")) {
                    text = text.replace("[Reg_No]", wordBody.getRegNumber());
                    r.setText(text, 0);
                }
                if (text.contains("[Location]")) {
                    text = text.replace("[Location]", wordBody.getLocation());
                    r.setText(text, 0);
                }
                if (text.contains("[Issuer_Name]")) {
                    // As posted, this reuses getLocation(); the issuer's name was probably intended
                    text = text.replace("[Issuer_Name]", wordBody.getLocation());
                    r.setText(text, 0);
                }
            }
        }
    }
}

I noticed that not all words were replaced, and I did not know how to fix it. Then I printed out all the text I get, and I got something like this:

This is to certify that [Title] [FirstName] [
MiddleName
] [Surname] has purchased [
Vehicle_Type
] 
having registration [
Reg_No
] from our [Location] Showroom.
Issued By,
[
Issuer

So I need to replace the fields in [] brackets. Some of them, such as [Surname], are printed fine, but others, such as [MiddleName], are split across lines, and I think that is why it is not working.

This is my Word text:

[Image: screenshot of the Word document text]

I am parsing a .docx file. Thank you.

If you have a look at your screenshot, you will see the red wavy line under MiddleName, Vehicle_Type and Reg_No. That means that Word has detected a possible spelling problem there. This is also stored in the file, and that is why the texts [MiddleName], [Vehicle_Type] and [Reg_No] are not together with their surrounding brackets in one text run. The brackets are in their own text runs, and so are the texts marked with the possible spelling problem.
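To see why a per-run check misses these placeholders, here is a minimal sketch that models the runs of a paragraph as plain strings. That model is an assumption made so the example runs without Apache POI; in the real document each string would come from `XWPFRun.getText(0)`. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only: each list element stands in for the text of one run.
final class RunSplitDemo {

    // Mirrors the check in the question: look for the placeholder inside each run separately.
    static boolean anyRunContains(List<String> runTexts, String placeholder) {
        for (String text : runTexts) {
            if (text != null && text.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    // Looks for the placeholder in the concatenated text of all runs.
    static boolean joinedContains(List<String> runTexts, String placeholder) {
        StringBuilder joined = new StringBuilder();
        for (String text : runTexts) {
            if (text != null) {
                joined.append(text);
            }
        }
        return joined.indexOf(placeholder) >= 0;
    }

    public static void main(String[] args) {
        // Run layout taken from the printed output in the question.
        List<String> runs = Arrays.asList(
                "This is to certify that [Title] [FirstName] [",
                "MiddleName",
                "] [Surname] has purchased [");
        System.out.println(anyRunContains(runs, "[Title]"));      // true
        System.out.println(anyRunContains(runs, "[MiddleName]")); // false
        System.out.println(joinedContains(runs, "[MiddleName]")); // true
    }
}
```

The placeholder is present in the paragraph as a whole, but no single run holds all of it, so a `contains` test on individual runs never matches.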

This is a well-known problem, and some libraries already try to solve it by detecting the text variables in a more complex way than only searching for them in single text runs. templ4docx is one example.
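One way such a "more complex" detection could work is sketched below. This is not how templ4docx is implemented; it is only a hedged illustration of the general idea. It again models runs as a mutable list of non-null strings (an assumption, so that it runs without Apache POI), finds the placeholder in the concatenated text, writes the replacement into the first run involved and empties the rest of the match. `CrossRunReplace`, `replaceAcrossRuns` and `runIndexAt` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: each list element stands in for the text of one run.
final class CrossRunReplace {

    // Replaces every occurrence of placeholder, even if it spans several runs.
    static void replaceAcrossRuns(List<String> runs, String placeholder, String replacement) {
        if (placeholder.isEmpty()) {
            return;
        }
        int from = 0;
        while (true) {
            // Concatenate all runs and remember where each run starts in the joined text.
            StringBuilder joined = new StringBuilder();
            int[] starts = new int[runs.size()];
            for (int i = 0; i < runs.size(); i++) {
                starts[i] = joined.length();
                joined.append(runs.get(i));
            }
            int start = joined.indexOf(placeholder, from);
            if (start < 0) {
                return;
            }
            int end = start + placeholder.length(); // exclusive

            int first = runIndexAt(starts, start);
            int last = runIndexAt(starts, end - 1);
            String prefix = runs.get(first).substring(0, start - starts[first]);
            String suffix = runs.get(last).substring(end - starts[last]);

            if (first == last) {
                runs.set(first, prefix + replacement + suffix);
            } else {
                // The replacement lands in the first run; runs in between are emptied.
                runs.set(first, prefix + replacement);
                for (int i = first + 1; i < last; i++) {
                    runs.set(i, "");
                }
                runs.set(last, suffix);
            }
            // Continue after the inserted text so a replacement is never re-scanned.
            from = start + replacement.length();
        }
    }

    // Index of the last run whose start offset is not greater than pos.
    static int runIndexAt(int[] starts, int pos) {
        int index = 0;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= pos) {
                index = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("Hello [", "MiddleName", "] and [Title]!"));
        replaceAcrossRuns(runs, "[MiddleName]", "Lyn");
        replaceAcrossRuns(runs, "[Title]", "Mrs.");
        System.out.println(String.join("", runs)); // Hello Lyn and Mrs.!
    }
}
```

With Apache POI the same idea would read each string via `XWPFRun.getText(0)` and write it back via `XWPFRun.setText(text, 0)`. A consequence of this design is that the replacement text takes on the formatting of the first run of the match.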

But my preferred way is a different one. Word has long provided text form fields. See Working with Form Fields. Note that the legacy form fields are meant, not the ActiveX ones.

See Replace text templates inside .docx (Apache POI, Docx4j or other) for an example.

Modified example for your case:

WordTemplate.docx:

[Image: WordTemplate.docx with gray text form fields]

All gray fields are legacy text form fields inserted from the Developer tab. In their Text Form Field Options, the Bookmark: names are Text1, Text2, ... and default texts are set as needed.

Code:

import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.SimpleValue;
import javax.xml.namespace.QName;

public class WordReplaceTextInFormFields {

 private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
  boolean foundformfield = false;
  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {
    XmlCursor cursor = run.getCTR().newCursor();
    cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/@w:fldCharType");
    while(cursor.hasNextSelection()) {
     cursor.toNextSelection();
     XmlObject obj = cursor.getObject();
     if ("begin".equals(((SimpleValue)obj).getStringValue())) {
      cursor.toParent();
      obj = cursor.getObject();
      obj = obj.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val")[0];
      if (ffname.equals(((SimpleValue)obj).getStringValue())) {
       foundformfield = true;
      } else {
       foundformfield = false;
      }
     } else if ("end".equals(((SimpleValue)obj).getStringValue())) {
      if (foundformfield) return;
      foundformfield = false;
     }
    }
    if (foundformfield && run.getCTR().getTList().size() > 0) {
     run.getCTR().getTList().get(0).setStringValue(text);
     foundformfield = false;
//System.out.println(run.getCTR());
    }
   }
  }
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));

  replaceFormFieldText(document, "Text1", "Mrs.");
  replaceFormFieldText(document, "Text2", "Janis");
  replaceFormFieldText(document, "Text3", "Lyn");
  replaceFormFieldText(document, "Text4", "Joplin");
  replaceFormFieldText(document, "Text5", "Mercedes Benz");
  replaceFormFieldText(document, "Text6", "1234-56-789");
  replaceFormFieldText(document, "Text7", "Stuttgart");

  FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx");
  document.write(out);
  out.close();
  document.close();
 }
}

This code was tested using Apache POI 4.1.0 and needs the full jar of all of the schemas, ooxml-schemas-1.4.jar, as mentioned in FAQ-N10025.

Result:

[Image: resulting document with the form fields filled in]

Note that the gray background of the text fields is only visible in the GUI. It will not be printed by default.

Advantages:

The form field content can only be formatted as a whole, so form field content will never be torn apart.

The document can be protected so that only filling in the form fields is possible. Then the template is usable as a form in the Word GUI too.

I like the accepted answer above, but if you are looking for a super-quick fix to prevent the text from being split into multiple runs, so that it can be recognized/read by the Java program, do the following:

  1. Copy the text you want in a single run.
  2. Paste it into Notepad (this removes all the docx formatting).
  3. Copy that text from Notepad and paste it into the Word document, then hit Save.

It will now be inserted as one run.
