How to convert one PDF to multiple PNG images with multithreading
I use the following method to convert a PDF into multiple PNG images:
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ImgUtil {
    private static final Logger log = LoggerFactory.getLogger(ImgUtil.class);

    public static List<String> convertPDFPagesToImages(String sourceFilePath, String desFilePath) {
        List<String> urlList = new ArrayList<>();
        File sourceFile = new File(sourceFilePath);
        File destinationFile = new File(desFilePath);
        if (!destinationFile.exists()) {
            destinationFile.mkdirs(); // mkdirs() also creates missing parent folders
            log.info("Folder Created ->:{} ", destinationFile.getAbsolutePath());
        }
        if (sourceFile.exists()) {
            log.info("Images copied to Folder Location: {}", destinationFile.getAbsolutePath());
            // try-with-resources closes the document even if rendering throws (PDDocument.load is the PDFBox 2.x API)
            try (PDDocument document = PDDocument.load(sourceFile)) {
                PDFRenderer pdfRenderer = new PDFRenderer(document);
                int numberOfPages = document.getNumberOfPages();
                log.info("Total pages to convert ->{} ", numberOfPages);
                String fileName = sourceFile.getName().replace(".pdf", "");
                String fileExtension = "png";
                /*
                 * 600 dpi gives better image clarity, but each image is several times larger than at 300 dpi
                 * (doubling the dpi quadruples the pixel count).
                 * Ex: 1. At 300 dpi, 04-Request-Headers_2.png is about 797 KB
                 *     2. At 600 dpi, 04-Request-Headers_2.png is about 2.42 MB
                 */
                int dpi = 300; // a lower dpi saves disk space; for professional use you can go above 300 dpi
                for (int i = 0; i < numberOfPages; ++i) {
                    // new File(parent, child) works whether or not desFilePath ends with a separator
                    File outPutFile = new File(destinationFile, fileName + "_" + (i + 1) + "." + fileExtension);
                    BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                    ImageIO.write(bImage, fileExtension, outPutFile);
                    urlList.add(outPutFile.getPath().replace('\\', '/'));
                }
                log.info("Converted Images are saved at ->{} ", destinationFile.getAbsolutePath());
            } catch (Exception e) {
                log.error("Failed to convert {}", sourceFile.getAbsolutePath(), e);
            }
        } else {
            log.error("File {} does not exist", sourceFile.getName());
        }
        return urlList;
    }

    public static void main(String[] args) {
        convertPDFPagesToImages("D:\\tmp\\report\\pdfPath\\61199020100754118.pdf", "D:\\tmp\\report\\pdfPath\\");
    }
}
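The dpi comment in the code above can be checked with a little arithmetic. PDF page sizes are given in points (1/72 inch), and rendering at a given dpi scales by dpi / 72, so doubling the dpi doubles both edges and quadruples the pixel count. The sketch below assumes an A4 page of 595 x 842 points as an illustrative value; `DpiMath` is a hypothetical helper, and PDFBox's own rounding may differ from this by a pixel.

```java
// Sketch: how the render dpi drives image size. Assumes the PDF default of
// 72 points per inch; the A4 page size used below is illustrative only.
final class DpiMath {
    /** Approximate pixel count along a page edge of {@code points} rendered at {@code dpi}. */
    static int pixels(float points, int dpi) {
        return (int) Math.floor(points * dpi / 72f); // scale factor = dpi / 72
    }

    /** Uncompressed size in bytes of a 24-bit RGB raster (3 bytes per pixel). */
    static long rawRgbBytes(float widthPts, float heightPts, int dpi) {
        return (long) pixels(widthPts, dpi) * pixels(heightPts, dpi) * 3;
    }
}
```

For A4 at 300 dpi this gives 2479 x 3508 pixels, or 26,088,996 bytes of raw RGB data; at 600 dpi the raw size is exactly four times that. The PNG files quoted in the comment grew by roughly 3x rather than 4x, since compressed size depends on the page content.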
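However, I found that when the PDF has many pages, the conversion is slow. I am considering using multiple threads to render the images. Is it possible to convert a PDF to images with multiple threads, or is there a similar approach?
A simple way to speed up this conversion is to move the image writing onto a background thread. Set up an ExecutorService before opening the PDF: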
ExecutorService exec = Executors.newFixedThreadPool(1);
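One caveat with this executor: `Executors.newFixedThreadPool(1)` queues tasks in an unbounded `LinkedBlockingQueue`, so if rendering runs faster than writing, every pending task keeps a full-page `BufferedImage` (tens of megabytes at 300 dpi) in memory. A possible variant, which is an assumption and not part of the original answer, is a single writer thread with a bounded queue; `BoundedWriter` and `maxPending` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: one background writer whose queue holds at most maxPending tasks.
// When the queue is full, CallerRunsPolicy runs the task on the submitting
// (rendering) thread, which throttles rendering instead of buffering more pages.
final class BoundedWriter {
    static ExecutorService create(int maxPending) {
        return new ThreadPoolExecutor(
                1, 1,                                   // exactly one writer thread
                0L, TimeUnit.MILLISECONDS,              // keep-alive is unused: no threads above core size
                new ArrayBlockingQueue<>(maxPending),   // bounded backlog of rendered pages
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

It can replace the `newFixedThreadPool(1)` line directly, e.g. `ExecutorService exec = BoundedWriter.create(4);`, and the `submit`/`shutdown`/`awaitTermination` calls stay the same.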
Instead of writing the image on the calling thread, just submit a new task to the service:
// ImageIO.write(bImage, fileExtension, outPutFile);
exec.submit(() -> write(bImage, fileExtension, outPutFile));
and a function that performs the task:
private static void write(BufferedImage image, String fileExtension, File file) {
    try {
        ImageIO.write(image, fileExtension, file);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
After closing the PDF document, make sure the executor finishes:
exec.shutdown();
exec.awaitTermination(365, TimeUnit.DAYS);
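Putting these pieces together, here is a stdlib-only sketch of the same pattern that can be run without PDFBox. `renderPage` is a hypothetical stand-in for `pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB)`, and the one-hour timeout is an arbitrary choice. One detail the snippets above leave out: an `UncheckedIOException` thrown by `write` inside a submitted task does not propagate on its own, because `submit` captures it in the returned `Future`; this sketch keeps the futures and calls `get()` so failures are not silently lost.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import javax.imageio.ImageIO;

// Sketch: the calling thread "renders" pages, one background thread writes PNGs.
final class BackgroundWriteDemo {
    // Hypothetical stand-in for PDFRenderer.renderImageWithDPI.
    static BufferedImage renderPage(int pageIndex) {
        BufferedImage img = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, pageIndex); // mark each fake page differently
        return img;
    }

    static void write(BufferedImage image, String fileExtension, File file) {
        try {
            ImageIO.write(image, fileExtension, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<File> convert(int numberOfPages, File outDir, String baseName)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(1);
        List<Future<?>> pending = new ArrayList<>();
        List<File> outputs = new ArrayList<>();
        try {
            for (int i = 0; i < numberOfPages; ++i) {
                BufferedImage bImage = renderPage(i); // rendering stays on this thread
                File outPutFile = new File(outDir, baseName + "_" + (i + 1) + ".png");
                pending.add(exec.submit(() -> write(bImage, "png", outPutFile)));
                outputs.add(outPutFile);
            }
        } finally {
            exec.shutdown();                          // accept no new tasks
            exec.awaitTermination(1, TimeUnit.HOURS); // wait for queued writes to finish
        }
        for (Future<?> f : pending) {
            f.get(); // rethrows a failed write as ExecutionException
        }
        return outputs;
    }
}
```

For example, `BackgroundWriteDemo.convert(5, new File("out"), "doc")` writes `doc_1.png` through `doc_5.png` into an existing `out` folder and throws `ExecutionException` if any write fails.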
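Using multiple threads for ImageIO.write may not benefit you, since it is an IO-heavy operation, but as I said in the comments, writing to a large ByteArrayOutputStream first and then to the file might also help on your particular hardware.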
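The "encode into a ByteArrayOutputStream, then write the file" idea could look like the sketch below: the PNG is compressed entirely in memory and the disk sees a single write. `InMemoryPngWriter` is a hypothetical name and the 1 MiB initial buffer size is a guess (the stream grows as needed); as the answer says, whether this is faster depends on the hardware.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

// Sketch: encode the PNG in memory first, then write the bytes in one call.
final class InMemoryPngWriter {
    static void writePng(BufferedImage image, Path target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(1 << 20); // initial capacity only
        if (!ImageIO.write(image, "png", buffer)) {
            throw new IOException("No PNG writer available");
        }
        Files.write(target, buffer.toByteArray()); // single write of the encoded file
    }
}
```

In the question's loop this would replace `ImageIO.write(bImage, fileExtension, outPutFile)` with `InMemoryPngWriter.writePng(bImage, outPutFile.toPath())`.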