
How to extract text from a .ppt or .pptx file, excluding the footer and slide number, using Apache POI?

I know how to extract text from a .ppt file with Apache POI, like this:

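        // Old (POI 3.x) HSLF API
        import java.io.FileInputStream;
        import java.io.InputStream;
        import org.apache.poi.hslf.HSLFSlideShow;
        import org.apache.poi.hslf.model.Slide;
        import org.apache.poi.hslf.model.TextRun;
        import org.apache.poi.hslf.usermodel.SlideShow;

        // Collect the text of every text run on every slide; the stream is closed afterwards
        StringBuilder builder = new StringBuilder();
        try (InputStream fis = new FileInputStream("abcd.ppt")) {
            SlideShow ss = new SlideShow(new HSLFSlideShow(fis));
            for (Slide slide : ss.getSlides()) {
                for (TextRun run : slide.getTextRuns()) {
                    if (run != null) {
                        builder.append(run.getText());
                    }
                }
            }
        }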

But this also extracts the footer and slide number, which I don't want.

So how can I extract the text without the footer and slide number?

Thanks in advance

I would recommend looking at JPresentation. One of its examples shows how to extract all images and text from all slides: http://www.independentsoft.de/jpresentation/tutorial/exportallslides.html

The API seems to be very easy to use.
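If you would rather stay with Apache POI, as the question asks, one possible approach is to skip text shapes whose placeholder type is footer or slide number. The following is a minimal sketch, assuming POI 4.x or 5.x with `poi-ooxml` (for .pptx) and `poi-scratchpad` (for .ppt) on the classpath; the class name `SlideTextWithoutFooter` and its helpers `isExcluded` and `collectText` are hypothetical. It uses the format-independent `org.apache.poi.sl.usermodel` interfaces (`SlideShowFactory`, `TextShape`, `SimpleShape.getPlaceholder()`), so the same loop should work for both formats. It only reads shapes stored on the slide itself, and it is not guaranteed that every .ppt file tags its footer text as a placeholder, so check the result against your own files.

```java
import java.io.File;
import java.io.IOException;
import java.util.EnumSet;
import java.util.Set;

import org.apache.poi.sl.usermodel.GroupShape;
import org.apache.poi.sl.usermodel.Placeholder;
import org.apache.poi.sl.usermodel.Slide;
import org.apache.poi.sl.usermodel.SlideShow;
import org.apache.poi.sl.usermodel.SlideShowFactory;
import org.apache.poi.sl.usermodel.TextShape;

// Hypothetical helper class: prints slide text, skipping footer and slide number placeholders.
public class SlideTextWithoutFooter {

    // Placeholder types to skip; add Placeholder.DATETIME or Placeholder.HEADER if needed.
    private static final Set<Placeholder> EXCLUDED =
            EnumSet.of(Placeholder.FOOTER, Placeholder.SLIDE_NUMBER);

    /** Returns true if a shape with this placeholder type should be ignored. */
    static boolean isExcluded(Placeholder placeholder) {
        // Ordinary text boxes are not placeholders, so getPlaceholder() returns null for them.
        return placeholder != null && EXCLUDED.contains(placeholder);
    }

    /** Appends the text of every non-excluded text shape, descending into grouped shapes. */
    static void collectText(Iterable<?> shapes, StringBuilder out) {
        for (Object shape : shapes) {
            if (shape instanceof GroupShape) {
                collectText(((GroupShape<?, ?>) shape).getShapes(), out);
            } else if (shape instanceof TextShape) {
                TextShape<?, ?> textShape = (TextShape<?, ?>) shape;
                if (!isExcluded(textShape.getPlaceholder())) {
                    out.append(textShape.getText()).append('\n');
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // SlideShowFactory opens either a .ppt (HSLF) or a .pptx (XSLF) file.
        try (SlideShow<?, ?> show = SlideShowFactory.create(new File(args[0]))) {
            StringBuilder out = new StringBuilder();
            for (Slide<?, ?> slide : show.getSlides()) {
                collectText(slide.getShapes(), out);
            }
            System.out.print(out);
        }
    }
}
```

Run it as `java SlideTextWithoutFooter abcd.ppt`. Filtering by placeholder type, rather than by what the text looks like, avoids accidentally dropping body text that happens to be a short number.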

