Reading a specific page from a PDF document using PDFBox

Question

How can I read a specific page (given a page number) from a PDF document using PDFBox?

Answer 1

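This should work:

// PDFBox 1.x: the page list is reached through the document catalog
PDPage firstPage = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);

This is shown in the bookmarks section of the tutorial.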

Update 2015, version 2.0.0 SNAPSHOT

It seems this was removed and then put back (?). getPage is in the 2.0.0 javadoc. To use it:

PDDocument document = PDDocument.load(new File(filename));
PDPage page = document.getPage(0); // the index is zero-based

The getAllPages method has been renamed to getPages:

PDPage firstPage = document.getPages().get(0);

Answer 2

// Uses the PDFBox library, available from http://pdfbox.apache.org/
// Writes a specific page of a PDF document to a new PDF file (PDFBox 1.x API)

// Read in the source PDF document
PDDocument pdDoc = PDDocument.load(file);

// The new PDF document
PDDocument document = null;

// Add page "i" (a zero-based index into the page list), then save the new PDF document
try {
    document = new PDDocument();
    document.addPage((PDPage) pdDoc.getDocumentCatalog().getAllPages().get(i));
    document.save("file path" + "new document title" + ".pdf");
} catch (Exception e) {
    // Report the failure instead of silently swallowing it
    e.printStackTrace();
} finally {
    // Close the new document first, then the source it borrows the page from
    if (document != null) document.close();
    pdDoc.close();
}

Answer 3

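I thought I would add my answer here, as I found the answers above useful but not exactly what I needed.

In my scenario, I wanted to scan each page individually, look for a keyword, and if that keyword appeared, do something with that page (i.e. copy it or ignore it).

I have tried to simply replace the common variables etc. in my answer: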

// PDFBox 1.x imports
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFTextStripper;

public void extractImages() throws Exception {
    String destinationDir = "OUTPUT DIR GOES HERE";
    String inputPdf = "INPUT PDF DIR GOES HERE";
    // Output file name
    String fileName = "output.pdf";
    PDDocument document = null;
    PDDocument newDocument = null;
    try {
        // Load the PDF
        document = PDDocument.load(inputPdf);
        List<PDPage> list = document.getDocumentCatalog().getAllPages();
        // Create the output document
        newDocument = new PDDocument();
        // PDFTextStripper extracts the text that is searched on each page
        PDFTextStripper textStripper = new PDFTextStripper();
        // Loop through each page and search for "SEARCH STRING". If the string is absent,
        // i.e. this is an image page, copy the page into the new output.pdf.
        // PDFTextStripper page numbers are 1-based, while list indexes are 0-based.
        for (int pageNumber = 1; pageNumber <= list.size(); pageNumber++) {
            // Restrict the stripper to one page at a time
            textStripper.setStartPage(pageNumber);
            textStripper.setEndPage(pageNumber);
            // Fetch the text of this page
            String pageText = textStripper.getText(document);
            boolean found = pageText.contains("SEARCH STRING");
            // If nothing is found, copy the page across to the new output PDF file
            if (!found) {
                PDPage returnPage = list.get(pageNumber - 1);
                System.out.println("page returned is: " + returnPage);
                System.out.println("Copy page");
                newDocument.importPage(returnPage);
            }
        }
        newDocument.save(destinationDir + fileName);
        System.out.println(fileName + " saved");
    } catch (Exception e) {
        e.printStackTrace();
        System.out.println("catch extract image");
    } finally {
        // Release both documents
        if (newDocument != null) newDocument.close();
        if (document != null) document.close();
    }
}
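
The page-selection step described in this answer (keep only the pages whose text lacks a keyword) can be looked at separately from PDFBox. The following is a minimal sketch under stated assumptions, not the answer's actual code: PageFilter and pagesWithout are hypothetical names, and the per-page text is assumed to have been extracted already, for example with PDFTextStripper one page at a time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of PDFBox): picks the pages that lack a keyword. */
final class PageFilter {
    private PageFilter() {}

    /**
     * Returns the 1-based numbers of the pages whose text does not contain the keyword.
     * Assumes pageTexts.get(0) holds the text of page 1, and so on.
     */
    static List<Integer> pagesWithout(List<String> pageTexts, String keyword) {
        List<Integer> pageNumbers = new ArrayList<>();
        for (int index = 0; index < pageTexts.size(); index++) {
            if (!pageTexts.get(index).contains(keyword)) {
                pageNumbers.add(index + 1); // convert the 0-based index to a page number
            }
        }
        return pageNumbers;
    }
}
```

Returning page numbers rather than page objects keeps the decision testable without a PDF file; the caller can then fetch and copy each selected page with whichever PDFBox version is in use.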

Answer 4

You can use the getPage method on a PDDocument instance:

PDDocument pdDocument = PDDocument.load(inputStream);
PDPage pdPage = pdDocument.getPage(0); // the index is zero-based

Answer 5

Here is the solution. I hope it solves your problem.

String fileName = "C:\\mypdf.pdf";
PDDocument doc = PDDocument.load(fileName);
PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage(1);
stripper.setEndPage(2);
// Pages 1 to 2 will be parsed. To parse only one page, set both values to the same
// number (e.g. setStartPage(1); setEndPage(1);)
String result = stripper.getText(doc);

doc.close();
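
One detail is easy to mix up when combining these approaches: setStartPage and setEndPage on PDFTextStripper take 1-based page numbers, whereas getPage and the page list take a 0-based index. A small conversion helper makes the intent explicit. This is only a sketch; PageIndex and toZeroBased are hypothetical names and not part of PDFBox.

```java
/** Hypothetical helper (not part of PDFBox) for converting page numbers to list indexes. */
final class PageIndex {
    private PageIndex() {}

    /**
     * Converts a 1-based page number into a 0-based index,
     * rejecting values outside 1..pageCount.
     */
    static int toZeroBased(int pageNumber, int pageCount) {
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                "Page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }
}
```

With PDFBox 2.x this could be used as document.getPage(PageIndex.toZeroBased(3, document.getNumberOfPages())) to fetch the third page.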

Answer 6

Add this to the command-line call:

ExtractText -startPage 1 -endPage 1 filename.pdf

Change 1 to the page number you need.

6 answers

Answer 1: score 31, accepted, 2011-07-27 05:33:33
Answer 2: score 20, 2012-07-13 20:19:49
Answer 3: score 4, 2014-01-28 09:13:49
Answer 4: score 1, 2015-11-14 06:31:51
Answer 5: score 1, 2017-07-01 09:02:16
Answer 6: score 0, 2011-07-27 05:29:45
