
Determining the best compression algorithm for a given PDF file

I'm currently using the Docotic PDF library to write a compression program for a PDF file server that hosts large scanned documents. (The intention is to get the smallest black-and-white file size that keeps the documents readable; they are mostly legal briefs.)

In testing, I noticed that certain files respond better to JPEG compression, while others respond better to Group3Fax or Flate. Is it possible to analyze a file and make an intelligent decision about which algorithm will produce the smallest PDF, or would I actually have to compress each file with all three algorithms and choose the smallest, which incurs a ton of additional CPU overhead?

Any guidance is greatly appreciated. Thanks

If the image in the PDF is monochrome, I'd suggest using JBIG2 compression (if available in your PDF software); it typically compresses better than Group 3 or Group 4 fax compression. Be careful if you are using lossy JBIG2, though (see my company's blog for details on what could go wrong, where text can change).

Group 3 compression is only applicable to monochrome (1bpc) images (and I'd suggest using Group 4 if available, since it should provide better results). JPEG is for color or grayscale images (though not all PDF software supports it for grayscale). Flate is compatible with monochrome, grayscale, or color images.

Since these are scanned images, JPEG should typically produce smaller output than Flate, so I would say you don't need to compress with both for comparison. JPEG2000 (if available in your PDF software) would beat JPEG in most cases. Similarly, Group 3 or Group 4 compression should beat Flate in most cases.
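The ranking above can be written down as a small lookup, so that most files need only one compression pass instead of three. This is a rough Python sketch, not Docotic PDF code: the function name `candidate_filters` and its parameters are made up for illustration, and the strings are the standard PDF stream filter names (`CCITTFaxDecode` covers both Group 3 and Group 4). The `available` argument is an assumption standing in for whatever your PDF software actually supports.

```python
from typing import FrozenSet, List

# Preference order taken from the answer above, most promising first.
MONOCHROME_RANKING = ["JBIG2Decode", "CCITTFaxDecode", "FlateDecode"]
CONTINUOUS_TONE_RANKING = ["JPXDecode", "DCTDecode", "FlateDecode"]

ALL_FILTERS = frozenset(MONOCHROME_RANKING + CONTINUOUS_TONE_RANKING)


def candidate_filters(bits_per_component: int,
                      num_components: int,
                      available: FrozenSet[str] = ALL_FILTERS) -> List[str]:
    """Return the filters worth trying for one image, best guess first.

    bits_per_component and num_components describe the image as stored
    in the PDF (1 bpc with 1 component means a monochrome scan).
    available is the (hypothetical) set of filters your library can write.
    """
    if bits_per_component == 1 and num_components == 1:
        ranked = MONOCHROME_RANKING       # JBIG2, then Group 3/4 fax, then Flate
    else:
        ranked = CONTINUOUS_TONE_RANKING  # JPEG2000, then JPEG, then Flate
    return [name for name in ranked if name in available]


# A 1 bpc scan when the library has no JBIG2 encoder:
print(candidate_filters(1, 1, frozenset({"CCITTFaxDecode", "FlateDecode"})))
```

Taking only the first entry of the returned list avoids the brute-force comparison; taking the first two is a cheap safety net for the files that do not follow the usual pattern.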

The overhead of compressing them shouldn't be that great, unless perhaps the images themselves are huge or the compression algorithm is suboptimal.
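If you want a feel for that overhead before committing to anything, you can time a single pass on a page-sized bitmap. The sketch below uses Python's standard `zlib` module (the same Deflate algorithm that Flate uses) on a synthetic 1bpc page; the page dimensions (A4 at 300 dpi) and the striped test pattern are assumptions chosen purely for illustration, so real scans will give different numbers.

```python
import time
import zlib
from typing import Tuple


def flate_cost(raw: bytes, level: int = 9) -> Tuple[int, float]:
    """Compress raw image bytes with Deflate; return (size, seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed


# Synthetic 1 bpc page: 2480 x 3508 pixels, i.e. 310 bytes per row.
WIDTH_BYTES, HEIGHT = 310, 3508
blank_row = bytes([0xFF]) * WIDTH_BYTES
# A crude stand-in for a row of text: a repeating 5-byte bit pattern.
text_row = bytes([0xFF, 0x00, 0xF0, 0x0F, 0xFF]) * (WIDTH_BYTES // 5)
# Alternate bands of 40 blank rows and 40 "text" rows down the page.
page = b"".join(text_row if (y // 40) % 2 else blank_row
                for y in range(HEIGHT))

size, seconds = flate_cost(page)
print(f"raw {len(page)} bytes -> {size} bytes in {seconds:.4f} s")
```

Running the same measurement on a handful of your real page images, with each codec your library offers, should tell you whether trying two candidates per file is affordable.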
