
How to convert one pdf to multiple png images with multithreading

I use the following method to convert one PDF into multiple PNG images:

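import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.imgscalr.Scalr;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ImgUtil {
    // Assumes SLF4J, as the "{}" placeholders in the log calls suggest.
    private static final Logger log = LoggerFactory.getLogger(ImgUtil.class);
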
    public static List<String> convertPDFPagesToImages(String sourceFilePath, String desFilePath){
        List<String> urlList = new ArrayList<>();
        try {
            File sourceFile = new File(sourceFilePath);
            File destinationFile = new File(desFilePath);
            if (!destinationFile.exists()) {
                destinationFile.mkdirs(); // also creates missing parent folders
                log.info("Folder Created ->:{} ", destinationFile.getAbsolutePath());
            }
            if (sourceFile.exists()) {
                log.info("Images will be written to folder ->{} ", destinationFile.getAbsolutePath());
                PDDocument document = PDDocument.load(sourceFile);
                PDFRenderer pdfRenderer = new PDFRenderer(document);

                int numberOfPages = document.getNumberOfPages();
                log.info("Total pages to convert ->{} ", numberOfPages);

                String fileName = sourceFile.getName().replace(".pdf", "");
                String fileExtension = "png";
                /*
                 * 600 dpi gives better image clarity, but each image is roughly 3x the size of the 300 dpi version.
                 * Ex:  1. For 300dpi 04-Request-Headers_2.png expected size is 797 KB
                 *      2. For 600dpi 04-Request-Headers_2.png expected size is 2.42 MB
                 */
                int dpi = 300; // use a lower dpi to save disk space; for professional use you can go above 300 dpi

                for (int i = 0; i < numberOfPages; ++i) {
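                    // Resolve against the destination folder so a missing trailing separator in desFilePath does not matter.
                    File outPutFile = new File(destinationFile, fileName + "_" + (i + 1) + "." + fileExtension);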
                    BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                    ImageIO.write(bImage, fileExtension, outPutFile);
                    urlList.add(outPutFile.getPath().replaceAll("\\\\", "/"));
                }

                document.close();
                log.info("Converted Images are saved at ->{} ", destinationFile.getAbsolutePath());
            } else {
                log.error("File does not exist: {}", sourceFile.getAbsolutePath());
            }
        } catch (Exception e) {
            log.error("Failed to convert PDF pages to images", e);
        }
        return urlList;
    }

    public static void main(String[] args) {
        convertPDFPagesToImages("D:\\tmp\\report\\pdfPath\\61199020100754118.pdf", "D:\\tmp\\report\\pdfPath\\");
    }
}

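However, I found that the conversion is slow when the PDF has many pages. I am considering using multiple threads to render the images. Is it possible to convert a PDF to images with multiple threads, or is there a similar approach?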

A simple way to speed up this conversion is to split the image writing off to a background thread. Set up an ExecutorService before opening the PDF:

ExecutorService exec = Executors.newFixedThreadPool(1);

Instead of writing the image in the same calling thread, just submit a new task to the service:

// ImageIO.write(bImage, fileExtension, outPutFile);
exec.submit(() -> write(bImage, fileExtension, outPutFile));

and a function that performs the task:

private static void write(BufferedImage image, String fileExtension, File file) {
    try {
        ImageIO.write(image, fileExtension, file);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}

After closing the PDF document, make sure the executor has finished:

exec.shutdown();
exec.awaitTermination(365, TimeUnit.DAYS);
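
Put together, the pieces above can be sketched as one self-contained program. This is only an illustration under stated assumptions: `BackgroundPngWriter`, `renderPage` and `writeAll` are hypothetical names, and `renderPage` creates a blank image as a stand-in for `pdfRenderer.renderImageWithDPI(...)` so that the example runs without PDFBox. It also keeps the `Future` returned by `submit`, which the snippets above discard.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class, not part of the original answer.
final class BackgroundPngWriter {

    // Stand-in for pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB):
    // returns a blank image whose width depends on the page index.
    static BufferedImage renderPage(int pageIndex) {
        return new BufferedImage(200 + pageIndex, 100, BufferedImage.TYPE_INT_RGB);
    }

    // Renders pages on the calling thread and writes each PNG on one background thread.
    static List<File> writeAll(int pageCount, File dir, String baseName)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        List<File> files = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < pageCount; i++) {
                BufferedImage image = renderPage(i);
                File out = new File(dir, baseName + "_" + (i + 1) + ".png");
                pending.add(exec.submit(() -> write(image, "png", out)));
                files.add(out);
            }
        } finally {
            exec.shutdown(); // no new tasks; already queued writes still run
        }
        // get() blocks until each write is done and rethrows a failure
        // wrapped in ExecutionException.
        for (Future<?> f : pending) {
            f.get();
        }
        return files;
    }

    private static void write(BufferedImage image, String fileExtension, File file) {
        try {
            ImageIO.write(image, fileExtension, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("pdf-pages").toFile();
        for (File f : writeAll(3, dir, "sample")) {
            System.out.println(f.getPath());
        }
    }
}
```

An exception thrown inside a task passed to `submit` is stored in the returned `Future` and only surfaces when `get()` is called, which is why the sketch collects the futures. Note also that `Executors.newFixedThreadPool` queues tasks without a bound, so if rendering runs well ahead of writing, the rendered images waiting in the queue stay in memory until they are written.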

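Using multiple threads for ImageIO.write may not benefit you, because it is a heavy IO operation. As I said in the comments, though, experimenting with writing to a large ByteArrayOutputStream first and then writing that to the file may also help on your specific hardware.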
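
The ByteArrayOutputStream idea could look roughly like the sketch below. The class and method names (`BufferedPngWrite`, `writeViaBuffer`) and the 1 MiB initial capacity are assumptions made for illustration. The call to `ImageIO.setUseCache(false)` is also an addition: by default ImageIO may route stream output through a temporary cache file, which would work against encoding purely in memory. Whether any of this is faster depends on the hardware, so it is something to measure rather than assume.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original answer.
final class BufferedPngWrite {

    static {
        // JVM-wide setting: keep ImageIO from using a disk cache file for streams.
        ImageIO.setUseCache(false);
    }

    // Encodes the image fully in memory, then writes the bytes to the target in one call.
    static void writeViaBuffer(BufferedImage image, String format, Path target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(1 << 20); // assumed initial size: 1 MiB
        if (!ImageIO.write(image, format, buffer)) {
            throw new IOException("No ImageIO writer found for format: " + format);
        }
        Files.write(target, buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("page", ".png");
        writeViaBuffer(new BufferedImage(64, 32, BufferedImage.TYPE_INT_RGB), "png", target);
        System.out.println(target + " (" + Files.size(target) + " bytes)");
    }
}
```

Inside the `write` function from the answer, the `ImageIO.write(image, fileExtension, file)` call could be swapped for `writeViaBuffer(image, fileExtension, file.toPath())` to compare the two variants on the same PDF.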
