How to get pptx slide notes text using Apache POI?
So far I only have working code for retrieving text from the slide notes of a ppt file:
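import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.model.TextRun;
import org.apache.poi.hslf.usermodel.SlideShow;

// Legacy HSLF API from older Apache POI 3.x releases, for binary .ppt files.
try (FileInputStream is = new FileInputStream("C:\\sample\\test.ppt")) {
    SlideShow ppt = new SlideShow(is);
    Slide[] slide = ppt.getSlides();
    for (int i = 0; i < slide.length; i++) {
        System.out.println(i);
        // A slide without a notes page has no notes sheet to read.
        if (slide[i].getNotesSheet() == null) {
            System.out.println("null");
            continue;
        }
        TextRun[] runs = slide[i].getNotesSheet().getTextRuns();
        if (runs.length < 1) {
            System.out.println("null");
        } else {
            for (TextRun run : runs) {
                System.out.println(" > " + run.getText());
            }
        }
    }
} catch (IOException ioe) {
    ioe.printStackTrace(); // report I/O failures instead of swallowing them
}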
But how do I retrieve the text from the slide notes of a pptx file?
After a lot of trial and error, I found a solution.
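import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xslf.usermodel.*;

// Older POI 3.x API, where getSlides() returns an array (newer releases return a List).
try (FileInputStream fis = new FileInputStream("C:\\sample\\sample.pptx")) {
    XMLSlideShow pptxshow = new XMLSlideShow(fis);
    XSLFSlide[] slide2 = pptxshow.getSlides();
    for (int i = 0; i < slide2.length; i++) {
        System.out.println(i);
        XSLFNotes mynotes = slide2[i].getNotes();
        if (mynotes == null) {
            continue; // this slide has no notes page
        }
        for (XSLFShape shape : mynotes) {
            // Only text shapes can carry notes text.
            if (shape instanceof XSLFTextShape) {
                XSLFTextShape txShape = (XSLFTextShape) shape;
                for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
                    System.out.println(xslfParagraph.getText());
                }
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace(); // report I/O failures instead of swallowing them
}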
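An update to the accepted answer. It works well, but if other parts of the notes master are enabled, such as the header or the page number, you will get extra paragraphs in the output that you probably did not expect. You can restrict the output to the actual notes with the following code: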
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xslf.usermodel.*;

// Older POI 3.x API, where getSlides() returns an array (newer releases return a List).
try (FileInputStream fis = new FileInputStream("C:\\sample\\sample.pptx")) {
    XMLSlideShow pptxshow = new XMLSlideShow(fis);
    XSLFSlide[] slide2 = pptxshow.getSlides();
    for (int i = 0; i < slide2.length; i++) {
        System.out.println(i);
        XSLFNotes mynotes = slide2[i].getNotes();
        if (mynotes == null) {
            continue; // this slide has no notes page
        }
        for (XSLFShape shape : mynotes) {
            if (shape instanceof XSLFTextShape) {
                XSLFTextShape txShape = (XSLFTextShape) shape;
                // Look for the actual notes only ...
                if (!txShape.getShapeName().contains("Notes Placeholder")) {
                    continue;
                }
                for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
                    System.out.println(xslfParagraph.getText());
                }
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace(); // report I/O failures instead of swallowing them
}
Here is a better solution.
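import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
// Placeholder is in org.apache.poi.sl.usermodel in recent POI releases
// (older 3.x releases had it in org.apache.poi.xslf.usermodel).
import org.apache.poi.sl.usermodel.Placeholder;
import org.apache.poi.xslf.usermodel.*;

try (FileInputStream fis = new FileInputStream("C:\\sample\\sample.pptx")) {
    XMLSlideShow ppt = new XMLSlideShow(fis);
    List<XSLFSlide> slides = ppt.getSlides();
    for (XSLFSlide slide : slides) {
        XSLFNotes mynotes = slide.getNotes();
        if (mynotes == null) {
            continue; // this slide has no notes page
        }
        for (XSLFShape shape : mynotes) {
            // Keep only the body placeholder, which holds the notes text itself.
            if (shape instanceof XSLFTextShape && Placeholder.BODY == ((XSLFTextShape) shape).getTextType()) {
                XSLFTextShape txShape = (XSLFTextShape) shape;
                System.out.println(txShape.getText());
                break;
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}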
Unlike the other answers, this code checks Placeholder.BODY == ((XSLFTextShape) shape).getTextType(), so you get only the notes text.
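To make the difference between the name-based filter and the type-based filter concrete, here is a minimal, library-free sketch. NoteShape, Kind, byName, byType and sampleShapes are hypothetical stand-ins invented for illustration, not Apache POI types; they only mimic a shape name, a placeholder type and the shape text. The sample data assumes the notes body placeholder carries a non-default name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in model, NOT Apache POI: illustrates name-based vs type-based filtering.
final class NotesFilterSketch {

    // Hypothetical stand-in for the placeholder type of a notes shape.
    enum Kind { BODY, HEADER, SLIDE_NUMBER, SLIDE_IMAGE }

    // Hypothetical stand-in for a text shape on a notes page.
    static final class NoteShape {
        final String name;
        final Kind kind;
        final String text;

        NoteShape(String name, Kind kind, String text) {
            this.name = name;
            this.kind = kind;
            this.text = text;
        }
    }

    // Mirrors getShapeName().contains("Notes Placeholder").
    static List<String> byName(List<NoteShape> shapes) {
        List<String> out = new ArrayList<>();
        for (NoteShape s : shapes) {
            if (s.name.contains("Notes Placeholder")) {
                out.add(s.text);
            }
        }
        return out;
    }

    // Mirrors Placeholder.BODY == getTextType().
    static List<String> byType(List<NoteShape> shapes) {
        List<String> out = new ArrayList<>();
        for (NoteShape s : shapes) {
            if (s.kind == Kind.BODY) {
                out.add(s.text);
            }
        }
        return out;
    }

    // Assumed sample: the body placeholder has been given a custom name.
    static List<NoteShape> sampleShapes() {
        return Arrays.asList(
            new NoteShape("Slide Image Placeholder 1", Kind.SLIDE_IMAGE, ""),
            new NoteShape("Speaker text", Kind.BODY, "Remember to mention the demo."),
            new NoteShape("Header Placeholder 3", Kind.HEADER, "Quarterly review"),
            new NoteShape("Slide Number Placeholder 4", Kind.SLIDE_NUMBER, "7"));
    }

    public static void main(String[] args) {
        List<NoteShape> shapes = sampleShapes();
        System.out.println("by name: " + byName(shapes));
        System.out.println("by type: " + byType(shapes));
    }
}
```

With this sample data the name filter returns nothing, while the type filter returns the single notes text. That is the case where matching on the placeholder type is likely the more robust choice.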