Error when retrieving images from a PDF using iText

Question

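I have a PDF from which I want to retrieve the images.

Note:

In the documentation, this is the RESULT variable:

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

I don't understand why this image is needed. I just want to extract the images from my PDF file.

So now, when I use MyImageRenderListener listener = new MyImageRenderListener(RESULT);

I get this error message:

results\part4\chapter15\Img16.jpg (The system cannot find the path specified)

This is the code I have.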

    package part4.chapter15;

    import java.io.IOException;

    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfReaderContentParser;

    /**
     * Extracts images from a PDF file.
     */
    public class ExtractImages {

        /** The PDF from which the images are extracted. */
        public static final String RESOURCE = "resources/pdfs/samplefile.pdf";
        public static final String RESULT = "results/part4/chapter15/Img%s.%s";

        /**
         * Parses a PDF and extracts all the images.
         * @param filename the source PDF
         */
        public void extractImages(String filename)
            throws IOException, DocumentException {
            PdfReader reader = new PdfReader(filename);
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            MyImageRenderListener listener = new MyImageRenderListener(RESULT);
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                parser.processContent(i, listener);
            }
            reader.close();
        }

        /**
         * Main method.
         * @param    args    no arguments needed
         * @throws DocumentException
         * @throws IOException
         */
        public static void main(String[] args) throws IOException, DocumentException {
            new ExtractImages().extractImages(RESOURCE);
        }
    }

Answer 1

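You have two questions, and the answer to the first one is also the answer to the second.

Question 1:

You refer to:

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

And then you ask: why is this image needed?

That question rests on a wrong assumption: Img%s.%s is not the file name of an image, it is a pattern for the file names of images. While parsing, iText detects the images in the PDF. These images are stored in numbered objects (for example, object 16), and they can be exported in different formats (for example jpg, png, ...).

Suppose an image is stored in object 16 and that image is a jpg. The pattern then resolves to Img16.jpg.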
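To make the substitution concrete, here is a minimal sketch using plain String.format. The object number 16 and the file type jpg are the example values used above; the class name PatternDemo and the second pair of values (object 23, png) are made up for illustration.

```java
// Illustrative sketch: PatternDemo is a hypothetical class name, not part of iText.
final class PatternDemo {
    // The pattern from the question: first %s = object number, second %s = file type.
    static final String RESULT = "results/part4/chapter15/Img%s.%s";

    // Fills the two %s placeholders of the pattern, in order.
    static String resolve(String pattern, int objectNumber, String fileType) {
        return String.format(pattern, objectNumber, fileType);
    }

    public static void main(String[] args) {
        System.out.println(resolve(RESULT, 16, "jpg"));     // results/part4/chapter15/Img16.jpg
        System.out.println(resolve("Img%s.%s", 23, "png")); // Img23.png
    }
}
```

Note that the directory part of the pattern is copied through unchanged; only the two placeholders vary from image to image.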

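Question 2:

Why do you get this error:

results\part4\chapter15\Img16.jpg (The system cannot find the path specified)

In your PDF, a jpg is stored in object 16. You ask iText to store that image using this path: results\part4\chapter15\Img16.jpg (as explained in my answer to Question 1). However, your working directory has no subdirectory results\part4\chapter15\, so an IOException is thrown (more precisely a FileNotFoundException, which is a subclass of IOException).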
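The failure can be reproduced without iText. The sketch below is an illustration under stated assumptions, not code from the book: MissingDirDemo and its method names are hypothetical, and it works in a temporary directory instead of your working directory. It shows that FileOutputStream refuses to create a file whose parent directory is missing, and that creating the directories first (here with Files.createDirectories) lets the same write succeed.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: MissingDirDemo is a hypothetical class, not part of iText.
final class MissingDirDemo {

    // Tries to write the bytes; returns false if the path cannot be opened.
    static boolean tryWrite(File target, byte[] bytes) throws IOException {
        try (FileOutputStream os = new FileOutputStream(target)) {
            os.write(bytes);
            return true;
        } catch (FileNotFoundException e) {
            // Thrown by the constructor, for example when the parent directory is missing.
            System.out.println("Cannot write: " + e.getMessage());
            return false;
        }
    }

    // Creates the parent directories (if any) before writing.
    static boolean writeCreatingDirs(File target, byte[] bytes) throws IOException {
        File parent = target.getParentFile();
        if (parent != null) {
            Files.createDirectories(parent.toPath());
        }
        return tryWrite(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("extract-demo");
        File target = new File(base.toFile(), "results/part4/chapter15/Img16.jpg");
        byte[] fakeImage = {1, 2, 3}; // placeholder bytes, not a real jpg

        System.out.println(tryWrite(target, fakeImage));          // false
        System.out.println(writeCreatingDirs(target, fakeImage)); // true
    }
}
```

Creating results/part4/chapter15 by hand in your working directory before running the example would have the same effect.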

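What is the general problem?

You have copy/pasted the ExtractImages example that I wrote for my book "iText in Action - Second Edition", but:

You didn't read that book, so you don't know what the code is supposed to do.
You didn't tell the readers on StackOverflow that this example depends on the MyImageRenderListener class, which is where all the magic happens.

How can you solve your problem?

Option 1:

Change RESULT like this:

public static final String RESULT = "Img%s.%s";

Now the images will be stored in your working directory.

Option 2:

Adapt the MyImageRenderListener class, more specifically this method: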

public void renderImage(ImageRenderInfo renderInfo) {
    try {
        String filename;
        FileOutputStream os;
        PdfImageObject image = renderInfo.getImage();
        if (image == null) return;
        filename = String.format(path,
            renderInfo.getRef().getNumber(), image.getFileType());
        os = new FileOutputStream(filename);
        os.write(image.getImageAsBytes());
        os.flush();
        os.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}

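iText calls this method whenever it encounters an image. It passes an ImageRenderInfo object to the method, which contains plenty of information about that image.

In this implementation, we store the image bytes as a file. This is how we create the path to that file:

String.format(path,
     renderInfo.getRef().getNumber(), image.getFileType())

As you can see, the pattern stored in RESULT is used in such a way that the first occurrence of %s is replaced with a number and the second occurrence with a file extension.

You can easily adapt this method so that it stores the images as byte[] in a List, if that is what you want.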
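As a rough sketch of that in-memory variant: the ImageCollector class below is hypothetical and deliberately free of iText types so that it runs on its own. The assumption is that your adapted renderImage method would call collect(image.getImageAsBytes()) at the point where the version above opens a FileOutputStream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: ImageCollector is a hypothetical helper, not an iText class.
final class ImageCollector {
    private final List<byte[]> images = new ArrayList<>();

    // Called once per image; a null argument (no image data) is ignored.
    void collect(byte[] imageBytes) {
        if (imageBytes == null) return;
        images.add(imageBytes.clone()); // defensive copy of the caller's array
    }

    // Read-only view of the collected images, in the order they were encountered.
    List<byte[]> getImages() {
        return Collections.unmodifiableList(images);
    }

    public static void main(String[] args) {
        ImageCollector collector = new ImageCollector();
        collector.collect(new byte[] {1, 2, 3}); // placeholder bytes, not real image data
        collector.collect(null);
        collector.collect(new byte[] {4, 5});
        System.out.println(collector.getImages().size()); // 2
    }
}
```

With this approach no file path is built at all, so the missing-directory error cannot occur; the trade-off is that all image bytes stay in memory until you process them.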
