How to convert one PDF to multiple PNG images with multithreading
I use the following method to convert one PDF into multiple PNG images:
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ImgUtil {
    // Assumption: "log" is an SLF4J logger; the original snippet does not show its declaration.
    private static final Logger log = LoggerFactory.getLogger(ImgUtil.class);

    public static List<String> convertPDFPagesToImages(String sourceFilePath, String desFilePath) {
        List<String> urlList = new ArrayList<>();
        try {
            File sourceFile = new File(sourceFilePath);
            File destinationFile = new File(desFilePath);
            if (!destinationFile.exists()) {
                destinationFile.mkdirs();
                log.info("Folder created -> {}", destinationFile.getAbsolutePath());
            }
            if (sourceFile.exists()) {
                log.info("Images will be written to folder: {}", destinationFile.getAbsolutePath());
                // try-with-resources closes the document even if rendering fails
                // (PDDocument.load is the PDFBox 2.x API)
                try (PDDocument document = PDDocument.load(sourceFile)) {
                    PDFRenderer pdfRenderer = new PDFRenderer(document);
                    int numberOfPages = document.getNumberOfPages();
                    log.info("Total pages to convert -> {}", numberOfPages);
                    String fileName = sourceFile.getName().replace(".pdf", "");
                    String fileExtension = "png";
                    /*
                     * 600 dpi gives better clarity, but each image is roughly 3x the size it is at 300 dpi.
                     * Ex: 1. At 300 dpi, 04-Request-Headers_2.png is about 797 KB
                     *     2. At 600 dpi, 04-Request-Headers_2.png is about 2.42 MB
                     */
                    int dpi = 300; // lower the dpi to save disk space; use more than 300 dpi for professional output
                    for (int i = 0; i < numberOfPages; ++i) {
                        // Resolve against the folder so the path works with or without a trailing separator
                        File outPutFile = new File(destinationFile, fileName + "_" + (i + 1) + "." + fileExtension);
                        BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                        ImageIO.write(bImage, fileExtension, outPutFile);
                        urlList.add(outPutFile.getPath().replace('\\', '/'));
                    }
                }
                log.info("Converted images are saved at -> {}", destinationFile.getAbsolutePath());
            } else {
                log.error("{} does not exist", sourceFile.getName());
            }
        } catch (Exception e) {
            log.error("PDF to image conversion failed", e);
        }
        return urlList;
    }

    public static void main(String[] args) {
        convertPDFPagesToImages("D:\\tmp\\report\\pdfPath\\61199020100754118.pdf", "D:\\tmp\\report\\pdfPath\\");
    }
}
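However, I found that the conversion is slow when the PDF has many pages. I am considering using multiple threads to render the images. Is it possible to convert a PDF to images with multiple threads, or is there a similar approach?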
One simple way to speed up this conversion is to move the image writing to a background thread. Set up an ExecutorService before opening the PDF:
ExecutorService exec = Executors.newFixedThreadPool(1);
Instead of writing the image on the calling thread, submit a new task to the service:
// ImageIO.write(bImage, fileExtension, outPutFile);
exec.submit(() -> write(bImage, fileExtension, outPutFile));
And a function to perform the task:
private static void write(BufferedImage image, String fileExtension, File file) {
    try {
        ImageIO.write(image, fileExtension, file);
    } catch (IOException e) {
        // Lambdas passed to submit() cannot throw checked exceptions, so wrap it
        throw new UncheckedIOException(e);
    }
}
After closing the PDF document, make sure the executor finishes:
exec.shutdown();
exec.awaitTermination(365, TimeUnit.DAYS);
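Putting these pieces together, a minimal sketch might look like the following. It uses only the JDK, so blank BufferedImage objects stand in for rendered PDF pages. The class name BackgroundWriteSketch and the methods convert and demo are made up for illustration. It also keeps each Future returned by submit and calls get() on it, because an exception thrown inside a submitted task is stored in the Future and would otherwise go unnoticed.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.imageio.ImageIO;

// Hypothetical demo class: blank images stand in for rendered PDF pages.
class BackgroundWriteSketch {

    // Writes one image; wraps the checked IOException so it can be called from a lambda.
    private static void write(BufferedImage image, String fileExtension, File file) {
        try {
            ImageIO.write(image, fileExtension, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates pages on the calling thread and writes each one on a single background thread.
    static List<File> convert(File outDir, int pageCount) {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        List<File> files = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < pageCount; i++) {
                // Stand-in for pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB)
                BufferedImage page = new BufferedImage(200, 300, BufferedImage.TYPE_INT_RGB);
                File out = new File(outDir, "page_" + (i + 1) + ".png");
                pending.add(exec.submit(() -> write(page, "png", out)));
                files.add(out);
            }
        } finally {
            exec.shutdown(); // no new tasks; queued writes still run
        }
        // get() blocks until each write is done and rethrows its failure, if any
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Image write failed", e.getCause());
            }
        }
        return files;
    }

    // Runs convert against a fresh temporary directory.
    static List<File> demo(int pageCount) {
        try {
            File dir = Files.createTempDirectory("pdf-pages").toFile();
            return convert(dir, pageCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo(3).forEach(f -> System.out.println(f + " (" + f.length() + " bytes)"));
    }
}
```

With a single writer thread, rendering page i+1 can overlap with writing page i. Whether that helps in practice depends on how the time splits between rendering and encoding on your machine.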
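Using more than one thread for ImageIO.write may not benefit you because it is a heavy I/O operation, but as I said in the comments, writing to a large ByteArrayOutputStream first and then to the file may also help on your particular hardware.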
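The ByteArrayOutputStream idea could be sketched as below, again with only the JDK and a blank image as a stand-in for a rendered page. The names BufferedWriteSketch, writeViaBuffer and demo, and the 1 MiB initial buffer size, are assumptions for illustration. The call to ImageIO.setUseCache(false) is an optional extra, not part of the original suggestion: it asks ImageIO not to use a disk-based cache file when it wraps the stream, so the encoding stays in memory.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

// Hypothetical demo class: encode to memory first, then write the bytes in one call.
class BufferedWriteSketch {

    // Returns the number of encoded bytes written to the file.
    static int writeViaBuffer(BufferedImage image, String fileExtension, File file) {
        try {
            // Initial capacity is a guess; the buffer grows as needed
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(1 << 20);
            if (!ImageIO.write(image, fileExtension, buffer)) {
                throw new IOException("No ImageIO writer for format: " + fileExtension);
            }
            Files.write(file.toPath(), buffer.toByteArray());
            return buffer.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one blank page to a temporary file and returns it.
    static File demo() {
        ImageIO.setUseCache(false); // global setting: avoid a disk cache file for streams
        try {
            File out = File.createTempFile("page", ".png");
            // Stand-in for a page rendered by PDFRenderer
            BufferedImage page = new BufferedImage(200, 300, BufferedImage.TYPE_INT_RGB);
            writeViaBuffer(page, "png", out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File out = demo();
        System.out.println(out + " (" + out.length() + " bytes)");
    }
}
```

It is worth timing both variants on the target machine before adopting either one.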