
How to extract font family from OOXML using Apache POI?

I am trying to extract the font style that is applied to a specific paragraph with Apache POI. The method getStyle() returns null on my XWPFParagraph object.

Calling the method getCTR().getRPr().getRStyle() on the first XWPFRun object also returns null.

Calling the method getStyle().getDocDefaults().getRPrDefault() on my XWPFDocument object returns this:

    <w:rPr>
      <w:rFonts w:asciiTheme="minorHAnsi"/>
      <w:sz w:val="22"/>
      <w:szCs w:val="22"/>
      <w:lang w:val="en-GB" w:eastAsia="en-US" w:bidi="ar-SA"/>
    </w:rPr>

There is no w:ascii attribute in the w:rFonts tag. There is, however, a w:asciiTheme attribute. How can I extract the font information behind that theme with Apache POI?

The font in this example is defined by the theme value minorHAnsi, and the theme itself is stored in the theme1.xml file. How can I, for example, read the typeface attribute of the a:latin tag using Apache POI? Here is a sample of what the theme1.xml file looks like:

<a:minorFont>
   <a:latin typeface="Calibri"/>
   <a:ea typeface=""/>
   <a:cs typeface=""/>
   <a:font script="Jpan" typeface="MS 明朝"/>
   <a:font script="Hang" typeface="맑은 고딕"/>
   <a:font script="Hans" typeface="宋体"/>
                   ...
   <a:font script="Viet" typeface="Arial"/>
   <a:font script="Uigh" typeface="Microsoft Uighur"/>
   <a:font script="Geor" typeface="Sylfaen"/>
</a:minorFont>

If the question is how to get /word/theme/theme1.xml out of the *.docx package, parse it, and then read <a:minorFont><a:latin... from it, it can be solved as follows:

First, use the methods of OPCPackage to get the package part /word/theme/theme1.xml.

...
  XWPFDocument document = new XWPFDocument(new FileInputStream("./WordExample.docx"));
  OPCPackage oPCPackage = document.getPackage();
  PackagePartName partName = PackagingURIHelper.createPartName("/word/theme/theme1.xml");
  PackagePart themePart = oPCPackage.getPart(partName);
...

Then, once we have that PackagePart, parse it into an org.openxmlformats.schemas.drawingml.x2006.main.ThemeDocument and use the methods of that class to walk down to its child elements.

...
  ThemeDocument themeDocument = ThemeDocument.Factory.parse(themePart.getInputStream());
  CTOfficeStyleSheet theme = themeDocument.getTheme();
  CTBaseStyles themeElements = theme.getThemeElements();
  CTFontScheme fontScheme = themeElements.getFontScheme();
  CTFontCollection minorFont = fontScheme.getMinorFont();
  CTTextFont latin = minorFont.getLatin();
...

Unfortunately, there is no publicly available API documentation for org.openxmlformats.schemas.*. To get one, we need to download the sources of ooxml-schemas (for example from https://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.4/ ) and run javadoc on them to generate the API documentation.
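Because the generated schema classes are hard to explore without documentation, it can help to cross-check the result with the JDK's own XML parser. The following is only a sketch that swaps the ThemeDocument classes for plain namespace-aware DOM parsing; it is not the POI approach described above. The class name ThemeFontReader, the method latinTypeface and the SAMPLE_THEME string are made up for illustration. In real use the input stream would presumably be the one returned by themePart.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads a theme XML stream with the JDK DOM parser only.
class ThemeFontReader {

    // Namespace bound to the "a:" prefix in theme1.xml.
    static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Made-up, heavily shortened theme used only for the demo in main().
    static final String SAMPLE_THEME =
            "<a:theme xmlns:a=\"" + DRAWINGML_NS + "\" name=\"Sample\">"
            + "<a:themeElements><a:fontScheme name=\"Sample\">"
            + "<a:majorFont><a:latin typeface=\"Cambria\"/></a:majorFont>"
            + "<a:minorFont><a:latin typeface=\"Calibri\"/>"
            + "<a:ea typeface=\"\"/><a:cs typeface=\"\"/></a:minorFont>"
            + "</a:fontScheme></a:themeElements></a:theme>";

    // Returns the typeface of <a:latin> inside the given font collection
    // ("minorFont" or "majorFont"), or null if either element is missing.
    static String latinTypeface(InputStream themeXml, String collection) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder().parse(themeXml);

        NodeList collections = doc.getElementsByTagNameNS(DRAWINGML_NS, collection);
        if (collections.getLength() == 0) {
            return null;
        }
        Element fontCollection = (Element) collections.item(0);
        NodeList latin = fontCollection.getElementsByTagNameNS(DRAWINGML_NS, "latin");
        if (latin.getLength() == 0) {
            return null;
        }
        return ((Element) latin.item(0)).getAttribute("typeface");
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                SAMPLE_THEME.getBytes(StandardCharsets.UTF_8));
        System.out.println(latinTypeface(in, "minorFont")); // prints "Calibri"
    }
}
```

If this sketch and the ThemeDocument-based code disagree for the same file, that usually points to a wrong part name or a different theme part being read.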

Complete example:

import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.openxml4j.opc.*;
import org.openxmlformats.schemas.drawingml.x2006.main.*;

public class WordGetThemeDocument {

 public static void main(String[] args) throws Exception {

  // Open the document and get the underlying OPC package (the *.docx ZIP container).
  XWPFDocument document = new XWPFDocument(new FileInputStream("./WordExample.docx"));
  OPCPackage oPCPackage = document.getPackage();

  // Look up the theme part by its name inside the package.
  PackagePartName partName = PackagingURIHelper.createPartName("/word/theme/theme1.xml");
  PackagePart themePart = oPCPackage.getPart(partName);
  System.out.println(themePart);

  // Parse the part and walk down to <a:minorFont><a:latin>.
  ThemeDocument themeDocument = ThemeDocument.Factory.parse(themePart.getInputStream());
  CTOfficeStyleSheet theme = themeDocument.getTheme();
  CTBaseStyles themeElements = theme.getThemeElements();
  CTFontScheme fontScheme = themeElements.getFontScheme();
  CTFontCollection minorFont = fontScheme.getMinorFont();
  CTTextFont latin = minorFont.getLatin();
  System.out.println(latin);

  // The typeface attribute holds the font family name.
  String typeFace = latin.getTypeface();
  System.out.println(typeFace);

  document.close();
 }

}
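
The complete example hard-codes the minor font collection and the a:latin element, which matches the minorHAnsi value from the question. For other theme values, a small helper along the following lines could decide which collection and which child element to read. This is a sketch under assumptions: it takes the string form of the w:asciiTheme (or related) attribute and assumes the naming pattern of the theme values in the OOXML schema (major/minor prefix, then Ascii, HAnsi, EastAsia or Bidi). The class ThemeFontSlot and its method names are hypothetical, and the script-specific a:font fallbacks that apply when a typeface is empty are deliberately ignored.

```java
// Hypothetical helper: maps a theme font value such as "minorHAnsi"
// to the font collection and child element to read from the theme part.
class ThemeFontSlot {

    // "major..." values point at <a:majorFont>, "minor..." at <a:minorFont>.
    static String collectionFor(String themeValue) {
        if (themeValue.startsWith("major")) {
            return "majorFont";
        }
        if (themeValue.startsWith("minor")) {
            return "minorFont";
        }
        throw new IllegalArgumentException("Unknown theme value: " + themeValue);
    }

    // Assumed mapping: Ascii and HAnsi use <a:latin>, EastAsia uses <a:ea>,
    // Bidi uses <a:cs>.
    static String elementFor(String themeValue) {
        if (themeValue.endsWith("EastAsia")) {
            return "ea";
        }
        if (themeValue.endsWith("Bidi")) {
            return "cs";
        }
        if (themeValue.endsWith("Ascii") || themeValue.endsWith("HAnsi")) {
            return "latin";
        }
        throw new IllegalArgumentException("Unknown theme value: " + themeValue);
    }

    public static void main(String[] args) {
        String value = "minorHAnsi"; // the value from the question's w:rFonts tag
        System.out.println(collectionFor(value) + "/" + elementFor(value));
        // prints "minorFont/latin"
    }
}
```

With the result in hand, the code above would call getMinorFont() or getMajorFont(), and then getLatin() or the corresponding accessor for the other element.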

