
How to convert one pdf to multiple png images with multithreading

I use the following method to convert one PDF into multiple PNG images:

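import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ImgUtil {
    // Assumption: the original snippet never declares `log`; an SLF4J logger matches the {} placeholders used below.
    private static final Logger log = LoggerFactory.getLogger(ImgUtil.class);
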
    public static List<String> convertPDFPagesToImages(String sourceFilePath, String desFilePath){
        List<String> urlList = new ArrayList<>();
        try {
            File sourceFile = new File(sourceFilePath);
            File destinationFile = new File(desFilePath);
            if (!destinationFile.exists()) {
                destinationFile.mkdirs(); // also creates missing parent folders
                log.info("Folder created -> {}", destinationFile.getAbsolutePath());
            }
            if (sourceFile.exists()) {
                log.info("Images will be written to folder: {}", destinationFile.getAbsolutePath());
                PDDocument document = PDDocument.load(sourceFile);
                PDFRenderer pdfRenderer = new PDFRenderer(document);

                int numberOfPages = document.getNumberOfPages();
                log.info("Total pages to convert -> {}", numberOfPages);

                String fileName = sourceFile.getName().replace(".pdf", "");
                String fileExtension = "png";
                /*
                 * 600 dpi gives better image clarity, but each image is about 3x the size it is at 300 dpi.
                 * Ex:  1. For 300dpi 04-Request-Headers_2.png expected size is 797 KB
                 *      2. For 600dpi 04-Request-Headers_2.png expected size is 2.42 MB
                 */
                int dpi = 300; // use a lower dpi to save disk space; for professional use you can go above 300 dpi

                for (int i = 0; i < numberOfPages; ++i) {
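                    // Resolve against the folder so the path is valid even if desFilePath has no trailing separator
                    File outPutFile = new File(destinationFile, fileName + "_" + (i + 1) + "." + fileExtension);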
                    BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                    ImageIO.write(bImage, fileExtension, outPutFile);
                    urlList.add(outPutFile.getPath().replaceAll("\\\\", "/"));
                }

                document.close();
                log.info("Converted Images are saved at ->{} ", destinationFile.getAbsolutePath());
            } else {
                log.error("{} does not exist", sourceFile.getName());
            }
        } catch (Exception e) {
            log.error("Failed to convert {}", sourceFilePath, e);
        }
        return urlList;
    }

    public static void main(String[] args) {
        convertPDFPagesToImages("D:\\tmp\\report\\pdfPath\\61199020100754118.pdf", "D:\\tmp\\report\\pdfPath\\");
    }
}

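However, I found that when the PDF has many pages, the image conversion is slow. I am considering using multiple threads to render the images. Is it possible to convert a PDF to images with multiple threads, or is there a similar approach?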

One simple way to speed up this conversion is to split the image writing off to a background thread. Set up an ExecutorService before opening the PDF:

ExecutorService exec = Executors.newFixedThreadPool(1);

Instead of writing the image in the same calling thread, just submit a new task to the service:

// ImageIO.write(bImage, fileExtension, outPutFile);
exec.submit(() -> write(bImage, fileExtension, outPutFile));

And a function to perform the task:

private static void write(BufferedImage image, String fileExtension, File file) {
    try {
        ImageIO.write(image, fileExtension, file);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

After closing the PDF document, make sure the executor finishes:

exec.shutdown();
exec.awaitTermination(365, TimeUnit.DAYS);
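
Putting the steps above together, here is a minimal, self-contained sketch of the pattern: pages are produced on the calling thread and handed to a single background thread for writing. To keep it runnable without PDFBox, the hypothetical `renderPage` method stands in for `pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB)`; the class name `BackgroundPngWriter`, the `convert` method and the `page_N.png` file names are also illustrative assumptions, not part of the original code. The sketch keeps the `Future` returned by `exec.submit`, because an exception thrown inside a submitted task is stored in that `Future` and would otherwise go unnoticed.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.imageio.ImageIO;

// Hypothetical demo class; not part of PDFBox or the original ImgUtil.
final class BackgroundPngWriter {

    // Stand-in for pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB).
    static BufferedImage renderPage(int pageIndex) {
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        image.setRGB(pageIndex % 200, 0, 0xFFFFFF); // make each page slightly different
        return image;
    }

    static void write(BufferedImage image, String fileExtension, File file) {
        try {
            ImageIO.write(image, fileExtension, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Renders on the calling thread and writes on one background thread.
    static List<File> convert(int numberOfPages, File outputDir) {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        List<File> files = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < numberOfPages; ++i) {
                File outPutFile = new File(outputDir, "page_" + (i + 1) + ".png");
                BufferedImage bImage = renderPage(i);
                pending.add(exec.submit(() -> write(bImage, "png", outPutFile)));
                files.add(outPutFile);
            }
        } finally {
            exec.shutdown(); // accept no new tasks; queued writes still run
        }
        try {
            for (Future<?> f : pending) {
                f.get(); // blocks until the write is done; a failed write surfaces here
            }
        } catch (ExecutionException e) {
            throw new IllegalStateException("Writing an image failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for writes", e);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("png-demo").toFile();
        System.out.println(convert(3, dir));
    }
}
```

One caveat with this design: every submitted image stays in memory until its write completes, so for large PDFs at high dpi the queue of pending images may need to be limited.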

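Using multiple threads for ImageIO.write may not benefit you, since it is a heavy IO operation, but as I said in the comments, writing to a large ByteArrayOutputStream first and then to the file may also help on your specific hardware.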
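
The ByteArrayOutputStream idea could look roughly like the sketch below: the PNG is encoded into memory first, and the resulting bytes are then written to disk in one call. The class name `BufferedPngWriter`, its method names and the 1 MiB initial buffer size are assumptions made for illustration. Whether this is faster than writing straight to the file depends on the hardware, so it is worth measuring before adopting it.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

// Hypothetical helper; a drop-in alternative to the write(...) function above.
final class BufferedPngWriter {

    // Encodes the image fully in memory and returns the encoded bytes.
    static byte[] encode(BufferedImage image, String fileExtension) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(1 << 20); // 1 MiB initial capacity (arbitrary)
        try {
            // ImageIO.write returns false when no writer exists for the format.
            if (!ImageIO.write(image, fileExtension, buffer)) {
                throw new IllegalArgumentException("No ImageIO writer for format: " + fileExtension);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Encodes in memory, then writes the bytes to the file in a single call.
    static void write(BufferedImage image, String fileExtension, File file) {
        byte[] bytes = encode(image, fileExtension);
        try {
            Files.write(file.toPath(), bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // JVM-wide setting: use an in-memory cache rather than a temp file
        // when ImageIO writes to a stream.
        ImageIO.setUseCache(false);
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        File out = File.createTempFile("buffered-demo", ".png");
        write(image, "png", out);
        System.out.println(out + " (" + out.length() + " bytes)");
    }
}
```

By default ImageIO may use a temporary cache file when it writes to a stream, which is why the demo calls `ImageIO.setUseCache(false)` before encoding.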
