How to create an image from a PDF using PDFBox in Java

Question

I want to create an image from the first page of a PDF. I am using PDFBox. After researching on the web, I found the following snippet of code:

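import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

public class ExtractImages
{
    public static void main(String[] args)
    {
        ExtractImages obj = new ExtractImages();
        try
        {
            obj.read_pdf();
        }
        catch (IOException ex)
        {
            System.out.println("" + ex);
        }
    }

    // Writes every image embedded in the PDF's pages to H:\image1, H:\image2, ...
    void read_pdf() throws IOException
    {
        // Let a failed load propagate to main(); carrying on with a null
        // document would only fail later with a NullPointerException.
        PDDocument document = PDDocument.load("H:\\ct1_answer.pdf");
        try
        {
            List<PDPage> pages = document.getDocumentCatalog().getAllPages();
            Iterator iter = pages.iterator();

            int i = 1;

            while (iter.hasNext())
            {
                PDPage page = (PDPage) iter.next();
                PDResources resources = page.getResources();
                Map pageImages = resources.getImages();
                if (pageImages != null)
                {
                    Iterator imageIter = pageImages.keySet().iterator();
                    while (imageIter.hasNext())
                    {
                        String key = (String) imageIter.next();
                        PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
                        // write2file appends a suitable file suffix to this prefix
                        image.write2file("H:\\image" + i);
                        i++;
                    }
                }
            }
        }
        finally
        {
            // Release the file handle even if extraction fails
            document.close();
        }
    }
}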

The code above compiles and runs without error, but it produces no output. I expected it to produce a series of images saved on the H drive, but no images are created. Why?

Answer 1

Without trying to be rude, here is what the code you posted does inside its main working loop:

PDPage page = (PDPage) iter.next();
PDResources resources = page.getResources();
Map pageImages = resources.getImages();

It gets each page from the PDF file, gets the resources from the page, and extracts the embedded images. It then writes those to disk. It does not render the page itself, so a PDF with no embedded images produces no files.

If you are to be a competent software developer, you need to be able to research and read documentation. With Java, that means Javadocs. Googling PDPage (or going directly to the Apache site) turns up the Javadoc for PDPage.

On that page you find two versions of the method convertToImage() for converting the PDPage to an image. Problem solved.

Except ...

Unfortunately, they return a java.awt.image.BufferedImage. Based on other questions you have asked, that is a problem, because BufferedImage is not supported on the Android platform, which is what you're working on.

In short, you can't use Apache's PDFBox on Android to do what you're trying to do.

Searching on StackOverflow, you find this same question posed several times in different forms, which will lead you to this: https://stackoverflow.com/questions/4665957/pdf-parsing-library-for-android/4766335#4766335 with the following answer that would be of interest to you: https://stackoverflow.com/a/4779852/302916

Unfortunately, even the one that the aforementioned answer says will work ... is not very user friendly; there's no "How to" or docs that I can find. It's also labeled as "alpha". This is probably not something for the faint-hearted, as it's going to require reading and understanding their code to even start using it.
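
For anyone reading this on a desktop JVM rather than Android, where java.awt is available, the convertToImage() route comes down to getting a BufferedImage for the page and handing it to ImageIO. Here is a minimal sketch of that last step using only the JDK. The class PageImageWriter and the method savePng are hypothetical names of my own, not part of PDFBox, and a blank BufferedImage stands in for the rendered page so the example runs without PDFBox on the classpath.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of PDFBox.
final class PageImageWriter {

    // Writes an already rendered page image to disk as a PNG file.
    static void savePng(BufferedImage image, File target) throws IOException {
        // ImageIO.write returns false when no writer is registered for the format.
        if (!ImageIO.write(image, "png", target)) {
            throw new IOException("No PNG writer available");
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rendered page. With PDFBox 1.x this image would
        // instead come from calling convertToImage() on the first PDPage.
        BufferedImage image = new BufferedImage(200, 300, BufferedImage.TYPE_INT_RGB);

        File target = File.createTempFile("page1-", ".png");
        savePng(image, target);
        System.out.println("Wrote " + target.getAbsolutePath());
    }
}
```

Checking the return value of ImageIO.write matters because an unsupported format name fails silently otherwise, which would look exactly like the "no output, no error" symptom in the question.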

Answer 2

I copied your code above and added the following libs to my build path in Eclipse. It is working.

Apache PDFBox 1.7.1 libs

Commons Logging 1.1.1 libs


Answer 1
7 Accepted 2013-02-14 20:42:50

Answer 2
1 2013-02-14 06:49:47
