Text missing when converting a PDF file to an image with PDFBox

Question

I want to convert PDF pages to image files. When I convert a PDF page to an image with Java, the text is lost.

The file I want to convert is 46_2.pdf; after conversion it looks like 46_2.png.

Code:

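import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Uses the PDFBox 1.8.x API (PDDocument.load(String), getAllPages(),
// PDPage.convertToImage()), which was removed in PDFBox 2.0.
public class ConvertPDFPageToImageWithoutText {
    public static void main(String[] args) {
        String oldPath = "C:/PDFCopy/46_2.pdf";
        File oldFile = new File(oldPath);
        if (!oldFile.exists()) {
            System.err.println(oldPath + " does not exist");
            return;
        }

        PDDocument document = null;
        try {
            document = PDDocument.load(oldPath);
            @SuppressWarnings("unchecked")
            List<PDPage> list = document.getDocumentCatalog().getAllPages();

            int pageNumber = 1;
            for (PDPage page : list) {
                BufferedImage image = page.convertToImage();
                // One file per page; a single fixed name would overwrite earlier pages.
                File outputfile = new File("C:/PDFCopy/image_" + pageNumber + ".png");
                ImageIO.write(image, "png", outputfile);
                pageNumber++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close once, after every page is rendered (not inside the loop).
            if (document != null) {
                try {
                    document.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}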

Answer 1

Since you are using PDFBox, try PDFImageWriter.writeToImage instead of PDPage.convertToImage. This post seems related to what you are trying to do.

Answer 2

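I had the same problem. I found an article (unfortunately I can't remember where, because I have read hundreds of them) in which the author complained that this problem appeared in PDFBox after updating Java to 7.21. So I'm using 7.17, and it works for me :)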
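If you suspect the Java update matters, it is worth first confirming which runtime actually executes the converter, since an IDE can be configured with a different JRE than the one on the command line. A minimal sketch using only standard system properties; the class and method names (`JavaRuntimeInfo`, `describeRuntime`) are hypothetical. On Java 7 the `java.version` value has the form `1.7.0_NN`, where NN is the update number.

```java
// Hypothetical helper: reports which Java runtime is running this code,
// using only standard system properties.
public class JavaRuntimeInfo {

    // Returns e.g. "<java.version> (<java.vendor>, <java.home>)"
    public static String describeRuntime() {
        return System.getProperty("java.version")
                + " (" + System.getProperty("java.vendor")
                + ", " + System.getProperty("java.home") + ")";
    }

    public static void main(String[] args) {
        System.out.println("Running on Java " + describeRuntime());
    }
}
```

Run it with the same launcher or IDE configuration you use for the PDF conversion, so the reported version is the one that actually renders the pages.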

Answer 3

Use the latest version of PDFBox (I used 2.0.9) and add the JAI Image I/O dependency from here. Here is sample code that works on Java 7:

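import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FilenameUtils; // Apache Commons IO
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Method of a class that also declares a `logger` field.
public void pdfToImageConvertorUsingPdfBox(String inputPdfPath) throws Exception {
    File sourceFile = new File(inputPdfPath);
    String formatName = "png";
    if (sourceFile.exists()) {
        // try-with-resources (Java 7+) closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(sourceFile)) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            int count = document.getNumberOfPages();

            for (int i = 0; i < count; i++) {
                // Render page i (0-based) at 200 DPI into an RGB image
                BufferedImage image = pdfRenderer.renderImageWithDPI(i, 200, ImageType.RGB);
                // e.g. C:/PDFCopy/46_2.pdf -> C:/PDFCopy/46_2_1.png, 46_2_2.png, ...
                String output = FilenameUtils.removeExtension(inputPdfPath) + "_" + (i + 1) + "." + formatName;
                ImageIO.write(image, formatName, new File(output));
            }
        }
    } else {
        logger.error(sourceFile.getName() + " does not exist");
    }
}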
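The code in Answer 3 pulls in Apache Commons IO only for `FilenameUtils.removeExtension`. If you would rather avoid that dependency, the following is a minimal stdlib-only sketch of the same `<name>_<page>.<format>` naming scheme. The class and method names (`PageImageNames`, `removeExtension`, `outputPath`) are hypothetical, and the helper only roughly mimics the Commons IO method: it treats both `/` and `\` as separators and ignores dots that appear in directory names.

```java
// Hypothetical stdlib-only stand-in for Commons IO's FilenameUtils.removeExtension,
// plus the "_<page>.<format>" naming used in Answer 3.
public final class PageImageNames {

    private PageImageNames() {
    }

    // Strips the last extension of the file name, but only when the dot comes
    // after the last '/' or '\' (so "C:/dir.v2/file" is left unchanged).
    public static String removeExtension(String path) {
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int lastDot = path.lastIndexOf('.');
        return lastDot > lastSeparator ? path.substring(0, lastDot) : path;
    }

    // Builds the output path for a 0-based page index; file numbering starts at 1.
    public static String outputPath(String pdfPath, int pageIndex, String formatName) {
        return removeExtension(pdfPath) + "_" + (pageIndex + 1) + "." + formatName;
    }

    public static void main(String[] args) {
        // prints C:/PDFCopy/46_2_1.png
        System.out.println(outputPath("C:/PDFCopy/46_2.pdf", 0, "png"));
    }
}
```

Inside the rendering loop, `PageImageNames.outputPath(inputPdfPath, i, formatName)` would then replace the `FilenameUtils.removeExtension(...)` expression.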

3 answers

Answer 1: score 2, posted 2014-01-11 06:41:53

Answer 2: score 1, posted 2014-01-13 14:24:09

Answer 3: score 0, posted 2018-05-04 10:45:18