
Getting text style from docx using Apache POI

I'm trying to get the style information from an MS docx file. I have no problem writing file content with added styles such as bold, italic, font size, etc., but reading the file content and getting the style information is not so clear. I've tried using XWPFDocument, but this API does not seem to be able to read the styles. I'm now trying XWPFWordExtractor, which seems a bit more promising, but I'm still stuck getting the style information for the text.

The type of content I'm reading looks similar to the following.

"Hello, this is bold text and this is italic text and this is bold-italic text"

Any pointers to an example would be great.

Okay, so based on the comments from Gagravarr, the solution is below, exactly as I wanted. Basically Gagravarr answered the question, but I'm not sure how to give him credit other than by saying it here.

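// docx is the XWPFDocument loaded from the .docx file
for (XWPFParagraph paragraph : docx.getParagraphs()) {
    int pos = 0; // running character position within the paragraph
    for (XWPFRun run : paragraph.getRuns()) {
        // a run is a stretch of text that shares the same formatting
        System.out.println("Current run IsBold : " + run.isBold());
        System.out.println("Current run IsItalic : " + run.isItalic());
        for (char c : run.text().toCharArray()) {
            System.out.print(c);
            pos++;
        }
        System.out.println();
    }
}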


Output below:

Current run IsBold : false
Current run IsItalic : false
"Hello, this is
Current run IsBold : true
Current run IsItalic : false
bold text
Current run IsBold : false
Current run IsItalic : false
and this is
Current run IsBold : false
Current run IsItalic : true
italic text
Current run IsBold : false
Current run IsItalic : false
a
Current run IsBold : false
Current run IsItalic : false
n
Current run IsBold : false
Current run IsItalic : false
d this is
Current run IsBold : true
Current run IsItalic : true
bold-italic text
Current run IsBold : false
Current run IsItalic : false
"

Here is a simple trick for getting the bold property.

run.getCTR().xmlText().contains("<w:b w:val=\"1\"/>") returns true if the run is bold, otherwise false.
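A plain string match like this only catches one spelling of the bold element. In the XML dump further down this page, bold shows up as a bare <w:b/> with no w:val attribute, which such a match would miss. Below is a minimal sketch of a more tolerant check that parses the run XML with the JDK's own DOM parser instead of POI. The class name RunBoldCheck and the method isBold are made up for illustration, and the sketch assumes the XML string declares the w namespace. It only looks at formatting set directly on the run, not at bold inherited from a character style.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache POI.
class RunBoldCheck {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns true if the run XML has a w:b element that is not switched off. */
    static boolean isBold(String runXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on the w namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(runXml)));

            // Matches local name "b" only, so w:bCs is not picked up.
            NodeList boldNodes = doc.getElementsByTagNameNS(W_NS, "b");
            if (boldNodes.getLength() == 0) {
                return false; // no direct bold setting on this run
            }
            Element b = (Element) boldNodes.item(0);
            // getAttributeNS returns "" when w:val is absent; a bare <w:b/> means "on".
            String val = b.getAttributeNS(W_NS, "val");
            return !(val.equals("0") || val.equals("false") || val.equals("off"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse run XML", e);
        }
    }
}
```

With POI, the input would presumably be the string returned by run.getCTR().xmlText(), provided that string carries the namespace declarations.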

I gave up trying to use Apache POI and found another library called docx4j, which seems to do what I need: the properties I want to look at are now available. Once the docx file is loaded, you can view the content of the file in an XML format like the one below.


<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:ns27="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" mc:Ignorable="w14 wp14">
   <w:body>
      <w:p w:rsidR="009A66AB" w:rsidRDefault="000F4AD1">
         <w:r>
            <w:rPr>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>&quot;Hello, this is</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="apple-converted-space"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t> </w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="Strong"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>bold text</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="apple-converted-space"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t> </w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>and this is</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="apple-converted-space"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t> </w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="Emphasis"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>italic text</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="apple-converted-space"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t> </w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>an</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>d this is</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="apple-converted-space"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t> </w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rStyle w:val="Emphasis"/>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:b/>
               <w:bCs/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>bold-italic text</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
               <w:color w:val="222222"/>
               <w:sz w:val="23"/>
               <w:szCs w:val="23"/>
               <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
            </w:rPr>
            <w:t>&quot;</w:t>
         </w:r>
      </w:p>
      <w:sectPr w:rsidR="009A66AB">
         <w:pgSz w:w="11906" w:h="16838"/>
         <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
         <w:cols w:space="708"/>
         <w:docGrid w:linePitch="360"/>
      </w:sectPr>
   </w:body>
</w:document>
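To make the dump above easier to read, here is a rough sketch that walks such an XML string and lists each run's text, its character style (w:rStyle) and its font size. It uses only the JDK's DOM parser rather than docx4j or POI. The class name RunStyleLister and its methods are hypothetical. The w:sz value is given in half-points, so the 23 above corresponds to 11.5pt.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for inspecting a WordprocessingML document string.
class RunStyleLister {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one description line per w:r element found in the XML. */
    static List<String> describeRuns(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            List<String> result = new ArrayList<>();
            NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
            for (int i = 0; i < runs.getLength(); i++) {
                Element run = (Element) runs.item(i);
                String style = valOf(run, "rStyle");   // e.g. Strong, Emphasis
                String halfPoints = valOf(run, "sz");  // size in half-points
                String size = halfPoints.isEmpty()
                        ? ""
                        : (Integer.parseInt(halfPoints) / 2.0) + "pt";
                result.add("text=[" + textOf(run) + "] style=" + style + " size=" + size);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    /** Reads the w:val attribute of the first matching child element, or "". */
    static String valOf(Element run, String localName) {
        NodeList nodes = run.getElementsByTagNameNS(W_NS, localName);
        return nodes.getLength() == 0
                ? ""
                : ((Element) nodes.item(0)).getAttributeNS(W_NS, "val");
    }

    /** Concatenates the content of all w:t elements inside the run. */
    static String textOf(Element run) {
        StringBuilder sb = new StringBuilder();
        NodeList nodes = run.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < nodes.getLength(); i++) {
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }
}
```

Note that in the dump the "bold text" run gets its formatting through the Strong character style rather than a direct <w:b/>, so a check that only looks at direct run properties may not be enough for documents like this one.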


You can use paragraph.getCTP().getPPr().getRPr().isSetB()

I found a very nice way to copy styles from one document to another. It is not as direct as I would have hoped, but it works.

  1. Rename the source Word document so it has a .zip extension
  2. Extract the contents
  3. Copy styles.xml into a string constant or read the file
  4. Copy the styles into your output document with the following code

    public void copyStylesXml(String stylesXmlString) {
        try {
            CTStyles ctStyle = CTStyles.Factory.parse(stylesXmlString);
            XWPFStyles styles = getDoc().createStyles();
            styles.setStyles(ctStyle);
        } catch (Exception e) {
            log.warn(e, e);
        }
    }

The same approach works for copying list formats.

Here is a very good way to copy styles from another document. A little background: a docx file is really a zip file containing a number of XML files, including styles.xml. In the following code sample I read styles.xml, parse it into a CTStyles object, then set it in the current document. Here is most of the code. You can use the same approach to copy numbering.xml for your Word numbering.

// copy an existing styles.xml document into this document to get styles
public void copyStylesFromDocument(String documentFileName) {
    log.debug("fileName " + documentFileName);
    try {
        InputStream is = CertificationReportHelper.getInputStreamFromZipFile(documentFileName, FILE_NAME_STYLES);
        CTStyles ctStyle = CTStyles.Factory.parse(is);
        XWPFStyles styles = getDoc().createStyles();
        styles.setStyles(ctStyle);
        log.info("Styles copied from file " + FILE_NAME_STYLES + " in document " + documentFileName);
    } catch (Exception e) {
        String msg = "Error copying styles from file " + FILE_NAME_STYLES + " in document " + documentFileName;
        addErrorMessage(msg, e);
        log.debug(e, e);
    }
}

@SuppressWarnings("resource") // closing stream causes input stream to close and operation fails
public static InputStream getInputStreamFromZipFile(String zipFileName, String containedFile) {
    InputStream is = null;
    ZipFile zfile = null;
    try {
        zfile = new ZipFile(zipFileName);
        ZipEntry entry = zfile.getEntry(containedFile);
        log.trace(entry);
        if (entry != null) {
            is = zfile.getInputStream(entry);
            log.trace("created input stream for file " + containedFile + " from zip file " + zipFileName);
        } else {
            String msg = "Error getting input stream for file " + containedFile + " from zip file " + zipFileName;
            // closing stream causes input stream to close and operation fails
            throw new ApplicationRuntimeException(msg);
        }
    } catch (Exception e) {
        String msg = "Error getting input stream for file " + containedFile + " from zip file " + zipFileName + "  Message:"
                + e.getMessage();
        log.warn("*** Throwing exception " + msg);
        throw new ApplicationRuntimeException(msg, e);
    } finally {
        // closing stream causes input stream to close and operation fails
        // try {
        // zfile.close();
        // } catch (IOException e) {
        // log.warn("Catching exception "+e+" closing zip file "+zipFileName);
        // }
    }
    return is;
}
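
The helper above has to leave the ZipFile open because the returned stream depends on it. One possible way around that, sketched here with only JDK classes, is to read the whole entry into memory before closing the zip. The class name DocxEntryReader and the method readEntry are made up for this illustration. The entry name word/styles.xml is where a docx normally keeps its styles, and InputStream.readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, independent of the logging and exception classes used above.
class DocxEntryReader {

    /** Reads one entry (for example "word/styles.xml") fully into a byte array. */
    static byte[] readEntry(String zipFileName, String containedFile) {
        // try-with-resources closes both the stream and the zip file;
        // this is safe because the bytes are copied out first.
        try (ZipFile zip = new ZipFile(zipFileName)) {
            ZipEntry entry = zip.getEntry(containedFile);
            if (entry == null) {
                throw new IllegalArgumentException(
                        "No entry " + containedFile + " in " + zipFileName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(
                    "Error reading " + containedFile + " from " + zipFileName, e);
        }
    }
}
```

The resulting bytes can be wrapped in a java.io.ByteArrayInputStream and passed to whatever parser expects a stream, so nothing stays open after the call.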
