Error while retrieving images from a PDF using iText

I have an existing PDF from which I want to retrieve images.

NOTE:

In the documentation, this is the RESULT variable:

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

I don't understand why this image is needed. I just want to extract the images from my PDF file.

So now, when I use MyImageRenderListener listener = new MyImageRenderListener(RESULT);

I get the error:

results\\part4\\chapter15\\Img16.jpg (The system cannot find the path specified)

This is the code I have:

    package part4.chapter15;

    import java.io.IOException;


    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfReaderContentParser;

    /**
     * Extracts images from a PDF file.
     */
    public class ExtractImages {

        /** The PDF from which the images are extracted. */
        public static final String RESOURCE = "resources/pdfs/samplefile.pdf";
        /** The filename pattern for the extracted images. */
        public static final String RESULT = "results/part4/chapter15/Img%s.%s";

        /**
         * Parses a PDF and extracts all the images.
         * @param filename the source PDF
         */
        public void extractImages(String filename)
            throws IOException, DocumentException {
            PdfReader reader = new PdfReader(filename);
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            MyImageRenderListener listener = new MyImageRenderListener(RESULT);
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                parser.processContent(i, listener);
            }
            reader.close();
        }

        /**
         * Main method.
         * @param    args    no arguments needed
         * @throws DocumentException
         * @throws IOException
         */
        public static void main(String[] args) throws IOException, DocumentException {
            new ExtractImages().extractImages(RESOURCE);
        }
    }

You have two questions, and the answer to the first is the key to answering the second.

Question 1:

You refer to:

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

And you ask: why is this image needed?

That question is wrong, because Img%s.%s is not the filename of an image; it's a pattern for the filename of an image. While parsing, iText will detect images in the PDF. These images are stored in numbered objects (e.g. object 16), and they can be exported in different formats (e.g. jpg, png, ...).

Suppose that an image is stored in object 16 and that this image is a jpg; then the pattern will resolve to Img16.jpg.
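
To make the pattern resolution concrete, here is a minimal sketch that uses only the JDK. The class name PatternDemo and the helper resolve are made up for illustration; the pattern and the values 16 and jpg come from the example above.

```java
public class PatternDemo {

    /** Fills the two %s placeholders with an object number and a file type. */
    public static String resolve(String pattern, int objectNumber, String fileType) {
        return String.format(pattern, objectNumber, fileType);
    }

    public static void main(String[] args) {
        String pattern = "results/part4/chapter15/Img%s.%s";
        // Object 16 holding a jpg resolves to results/part4/chapter15/Img16.jpg
        System.out.println(resolve(pattern, 16, "jpg"));
    }
}
```

The directories in front of Img%s.%s are part of the pattern, so they end up in every resolved filename.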

Question 2:

Why do I get an error:

results\\part4\\chapter15\\Img16.jpg (The system cannot find the path specified)

In your PDF, there's a jpg stored in object 16. You are asking iText to store that image using this path: results\\part4\\chapter15\\Img16.jpg (as explained in my answer to Question 1). However, your working directory doesn't have the subdirectories results\\part4\\chapter15\\, hence a FileNotFoundException (a subclass of IOException) is thrown.
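
The failure can be reproduced without iText. The sketch below is only an illustration: MissingDirectoryDemo and tryWrite are hypothetical names, and the code assumes that the results directory does not exist in the working directory, as in your case.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class MissingDirectoryDemo {

    /** Returns true if the bytes were written, false if the file could not be opened. */
    public static boolean tryWrite(String filename, byte[] data) {
        try (FileOutputStream os = new FileOutputStream(filename)) {
            os.write(data);
            return true;
        } catch (FileNotFoundException e) {
            // Thrown by the FileOutputStream constructor, for instance when
            // the parent directory of the file does not exist.
            System.out.println(e.getMessage());
            return false;
        } catch (IOException e) {
            // Any other I/O failure while writing or closing.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: results/part4/chapter15 is missing from the working directory.
        boolean written = tryWrite("results/part4/chapter15/Img16.jpg", new byte[] {1, 2, 3});
        System.out.println("written: " + written);
    }
}
```

FileOutputStream creates the file if it is missing, but it never creates the directories that lead to it.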

What is the general problem?

You have copy/pasted the ExtractImages example I wrote for my book "iText in Action - Second Edition", but:

  1. You didn't read that book, so you have no idea what that code is supposed to do.
  2. You aren't telling the readers on StackOverflow that this example depends on the MyImageRenderListener class, which is where all the magic happens.

How can you solve your problem?

Option 1:

Change RESULT like this:

public static final String RESULT = "Img%s.%s";

Now the images will be stored in your working directory.

Option 2:

Adapt the MyImageRenderListener class, more specifically this method:

public void renderImage(ImageRenderInfo renderInfo) {
    try {
        String filename;
        FileOutputStream os;
        PdfImageObject image = renderInfo.getImage();
        if (image == null) return;
        filename = String.format(path,
            renderInfo.getRef().getNumber(), image.getFileType());
        os = new FileOutputStream(filename);
        os.write(image.getImageAsBytes());
        os.flush();
        os.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}

iText calls this method whenever an image is encountered. It passes an ImageRenderInfo object to this method; that object contains plenty of information about the image.

In this implementation, we store the image bytes as a file. This is how we create the path to that file:

String.format(path,
     renderInfo.getRef().getNumber(), image.getFileType())

As you can see, the pattern stored in RESULT is used in such a way that the first occurrence of %s is replaced with a number and the second occurrence with a file extension.
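
One possible adaptation, if you want to keep the original pattern, is to create the missing directories before writing. The sketch below is an assumption about how that could look, not code from the book: ImageSaver and save are hypothetical names, and the three values you would pass in from renderImage are renderInfo.getRef().getNumber(), image.getFileType() and image.getImageAsBytes().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageSaver {

    /**
     * Resolves the pattern, creates any missing parent directories,
     * writes the bytes, and returns the path of the written file.
     */
    public static Path save(String pattern, int objectNumber, String fileType, byte[] bytes) {
        Path target = Paths.get(String.format(pattern, objectNumber, fileType));
        try {
            Path parent = target.getParent();
            if (parent != null) {
                // Does nothing if the directories already exist.
                Files.createDirectories(parent);
            }
            return Files.write(target, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a pattern such as Img%s.%s there is no parent directory, so the null check skips the directory creation and the file lands in the working directory.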

You could easily adapt this method so that it stores the images as byte[] in a List, if that is what you want.
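
A rough sketch of that idea, using only the JDK: ImageCollector is a hypothetical helper, and the assumption is that renderImage would call add(image.getImageAsBytes()) instead of opening a FileOutputStream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImageCollector {

    private final List<byte[]> images = new ArrayList<>();

    /** Stores a copy of the image bytes; null input is ignored. */
    public void add(byte[] imageBytes) {
        if (imageBytes == null) {
            return;
        }
        images.add(imageBytes.clone());
    }

    /** Returns a read-only view of the images collected so far. */
    public List<byte[]> getImages() {
        return Collections.unmodifiableList(images);
    }
}
```

After all pages have been processed, getImages() would hold one byte[] per image, in the order in which the images were encountered.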
