I am trying to extract the font style that is applied to a specific paragraph with Apache POI. The method getStyle() returns null on my XWPFParagraph object.
Calling the method getCTR().getRPr().getRStyle() on the first XWPFRun object also returns null.
Calling the method getStyle().getDocDefaults().getRPrDefault() on my XWPFDocument object returns this:
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi"/>
<w:sz w:val="22"/>
<w:szCs w:val="22"/>
<w:lang w:val="en-GB" w:eastAsia="en-US" w:bidi="ar-SA"/>
</w:rPr>
There is no w:ascii attribute in the w:rFonts tag. There is, however, a w:asciiTheme attribute declared in the tag. How can I extract the information for the given theme with Apache POI?
The font for this example is defined by the theme value minorHAnsi, and the theme can be found in the theme1.xml file. But how can I, for example, extract the typeface attribute of the a:latin tag using Apache POI? Here is a sample of what it looks like in the theme1.xml file:
<a:minorFont>
<a:latin typeface="Calibri"/>
<a:ea typeface=""/>
<a:cs typeface=""/>
<a:font script="Jpan" typeface="MS 明朝"/>
<a:font script="Hang" typeface="맑은 고딕"/>
<a:font script="Hans" typeface="宋体"/>
...
<a:font script="Viet" typeface="Arial"/>
<a:font script="Uigh" typeface="Microsoft Uighur"/>
<a:font script="Geor" typeface="Sylfaen"/>
</a:minorFont>
If the question is how to get /word/theme/theme1.xml out of the *.docx package, then parse it and get <a:minorFont><a:latin... out of it, this can be solved as follows.
First, use the methods of OPCPackage to get the package part /word/theme/theme1.xml.
...
XWPFDocument document = new XWPFDocument(new FileInputStream("./WordExample.docx"));
OPCPackage oPCPackage = document.getPackage();
PackagePartName partName = PackagingURIHelper.createPartName("/word/theme/theme1.xml");
PackagePart themePart = oPCPackage.getPart(partName);
...
Then, once we have that PackagePart, parse it into an org.openxmlformats.schemas.drawingml.x2006.main.ThemeDocument. After that, use the methods of ThemeDocument to get its child elements.
...
ThemeDocument themeDocument = ThemeDocument.Factory.parse(themePart.getInputStream());
CTOfficeStyleSheet theme = themeDocument.getTheme();
CTBaseStyles themeElements = theme.getThemeElements();
CTFontScheme fontScheme = themeElements.getFontScheme();
CTFontCollection minorFont = fontScheme.getMinorFont();
CTTextFont latin = minorFont.getLatin();
...
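The snippet above reads a:latin from a:minorFont because the run properties reference minorHAnsi. For other w:asciiTheme values a different element would be needed. The following is a minimal sketch of that lookup using plain Java only. The class name ThemeFontSlot and the method resolve are hypothetical, and the mapping is an assumption based on my reading of the WordprocessingML theme font values (major/minor combined with Ascii, HAnsi, EastAsia or Bidi), so it should be checked against the specification before relying on it.

```java
final class ThemeFontSlot {

    /**
     * Maps a theme font value such as "minorHAnsi" to the theme element
     * that is assumed to hold its typeface, e.g. "minorFont/latin".
     * Throws IllegalArgumentException for values it does not recognise.
     */
    static String resolve(String themeValue) {
        String collection;
        String rest;
        if (themeValue.startsWith("major")) {
            collection = "majorFont";
            rest = themeValue.substring("major".length());
        } else if (themeValue.startsWith("minor")) {
            collection = "minorFont";
            rest = themeValue.substring("minor".length());
        } else {
            throw new IllegalArgumentException("Unknown theme font value: " + themeValue);
        }

        String script;
        switch (rest) {
            case "Ascii":
            case "HAnsi":
                script = "latin"; // assumed: a:latin
                break;
            case "EastAsia":
                script = "ea";    // assumed: a:ea
                break;
            case "Bidi":
                script = "cs";    // assumed: a:cs
                break;
            default:
                throw new IllegalArgumentException("Unknown theme font value: " + themeValue);
        }
        return collection + "/" + script;
    }

    public static void main(String[] args) {
        System.out.println(resolve("minorHAnsi")); // prints minorFont/latin
        System.out.println(resolve("majorBidi"));  // prints majorFont/cs
    }
}
```

With the ThemeDocument classes, "minorFont/latin" corresponds to fontScheme.getMinorFont().getLatin() as shown above. The other combinations would use getMajorFont() together with getEa() or getCs() in the same way.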
Unfortunately, there is no publicly available API documentation for org.openxmlformats.schemas.*. To get one, we need to download the sources of ooxml-schemas (for example from https://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.4/ ) and then use javadoc to create API documentation from the sources.
Complete example:
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.openxml4j.opc.*;
import org.openxmlformats.schemas.drawingml.x2006.main.*;

public class WordGetThemeDocument {

    public static void main(String[] args) throws Exception {

        XWPFDocument document = new XWPFDocument(new FileInputStream("./WordExample.docx"));

        // Get the package part /word/theme/theme1.xml from the *.docx package.
        OPCPackage oPCPackage = document.getPackage();
        PackagePartName partName = PackagingURIHelper.createPartName("/word/theme/theme1.xml");
        PackagePart themePart = oPCPackage.getPart(partName);
        System.out.println(themePart);

        // Parse the part into a ThemeDocument and walk down to <a:minorFont><a:latin>.
        ThemeDocument themeDocument = ThemeDocument.Factory.parse(themePart.getInputStream());
        CTOfficeStyleSheet theme = themeDocument.getTheme();
        CTBaseStyles themeElements = theme.getThemeElements();
        CTFontScheme fontScheme = themeElements.getFontScheme();
        CTFontCollection minorFont = fontScheme.getMinorFont();
        CTTextFont latin = minorFont.getLatin();
        System.out.println(latin);

        // Read the typeface attribute of <a:latin>.
        String typeFace = latin.getTypeface();
        System.out.println(typeFace);

        document.close();
    }
}
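If pulling in the ooxml-schemas classes is not wanted, the same value can also be read with the JDK alone, since a *.docx file is a ZIP archive and the theme part is plain XML. This is only a sketch under stated assumptions, not part of Apache POI: the class ThemeFontReader and its methods are hypothetical names, and the entry name word/theme/theme1.xml is assumed to match the part used in the example above. The main method uses a small in-memory theme so that it runs without a document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ThemeFontReader {

    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Tiny stand-in for theme1.xml, used only by main().
    static final String SAMPLE_THEME =
        "<a:theme xmlns:a=\"" + DRAWINGML_NS + "\"><a:themeElements><a:fontScheme name=\"Office\">"
        + "<a:majorFont><a:latin typeface=\"Cambria\"/></a:majorFont>"
        + "<a:minorFont><a:latin typeface=\"Calibri\"/><a:ea typeface=\"\"/></a:minorFont>"
        + "</a:fontScheme></a:themeElements></a:theme>";

    /**
     * Returns the typeface attribute of a:latin inside the given font
     * element ("majorFont" or "minorFont"), or null if it is not present.
     */
    static String latinTypeface(InputStream themeXml, String fontElement) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder().parse(themeXml);

        NodeList fonts = doc.getElementsByTagNameNS(DRAWINGML_NS, fontElement);
        if (fonts.getLength() == 0) {
            return null;
        }
        NodeList latin = ((Element) fonts.item(0)).getElementsByTagNameNS(DRAWINGML_NS, "latin");
        if (latin.getLength() == 0) {
            return null;
        }
        return ((Element) latin.item(0)).getAttribute("typeface");
    }

    /** Opens the *.docx as a ZIP archive and reads the assumed entry word/theme/theme1.xml. */
    static String latinTypefaceFromDocx(String docxPath, String fontElement) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/theme/theme1.xml");
            if (entry == null) {
                return null; // no theme part under the assumed name
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return latinTypeface(in, fontElement);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(SAMPLE_THEME.getBytes(StandardCharsets.UTF_8));
        System.out.println(latinTypeface(in, "minorFont")); // prints Calibri
    }
}
```

For a real document, latinTypefaceFromDocx("./WordExample.docx", "minorFont") would take the place of the in-memory call. The POI approach above remains the better fit when the document is already open as an XWPFDocument.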