
Determining the best compression algorithm for a given PDF file

I'm currently using the Docotic PDF library to write a compression program for a PDF file server that hosts large scanned documents. (The intention is to get the smallest black-and-white file that is still readable. The documents are mostly legal briefs.)

In testing I notice that certain files respond better to JPEG compression, while others respond better to Group3Fax or Flate. Is it possible to analyze a file and make an intelligent decision about which algorithm will produce the smallest PDF, or would I actually have to compress each file with all three algorithms and choose the smallest, which incurs a ton of additional CPU overhead?

Any guidance is greatly appreciated. Thanks

If the image in the PDF is monochrome, I'd suggest using JBIG2 compression (if available in your PDF software); it typically compresses better than Group 3 or Group 4 fax compression. Be careful if you use lossy JBIG2, though: the text itself can change (see my company's blog for details on what could go wrong).

Group 3 compression is only applicable to monochrome (1bpc) images (I'd suggest using Group 4 instead, if available, since it should give better results). JPEG is for color or grayscale images, though not all PDF software supports it for grayscale. Flate is compatible with monochrome, grayscale, and color images.

Since these are scanned images, JPEG should typically compress better than Flate, so I would say you don't need to compress with both for comparison. JPEG2000 (if available in your PDF software) would beat JPEG in most cases. Similarly, Group 3/4 compression should beat Flate in most cases.
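As a rough sketch of the rules above, the choice can be narrowed by image type before any compression is attempted. The filter names below are plain labels and the function names are hypothetical; none of this is Docotic's API, and the ordering only encodes the "in most cases" expectations stated above, so it is a heuristic rather than a guarantee.

```python
from enum import Enum


class ImageKind(Enum):
    MONOCHROME = "monochrome"  # 1 bit per component, single channel
    GRAYSCALE = "grayscale"
    COLOR = "color"


def classify(bits_per_component: int, color_components: int) -> ImageKind:
    """Classify an image from two properties a PDF library usually exposes."""
    if color_components == 1:
        if bits_per_component == 1:
            return ImageKind.MONOCHROME
        return ImageKind.GRAYSCALE
    return ImageKind.COLOR


def candidate_filters(kind: ImageKind,
                      jbig2_available: bool = False,
                      jpeg2000_available: bool = False) -> list[str]:
    """Return applicable filters, most promising first (heuristic order)."""
    if kind is ImageKind.MONOCHROME:
        # Fax and JBIG2 filters apply only to 1bpc images.
        candidates = ["Group4Fax", "Group3Fax", "Flate"]
        if jbig2_available:
            candidates.insert(0, "JBIG2")
        return candidates
    # Color or grayscale scans: JPEG-family filters usually beat Flate.
    candidates = ["JPEG", "Flate"]
    if jpeg2000_available:
        candidates.insert(0, "JPEG2000")
    return candidates


print(candidate_filters(classify(1, 1), jbig2_available=True))
print(candidate_filters(classify(8, 3)))
```

With a list like this, only the first one or two candidates need to be tried for each image, rather than every algorithm.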

Compressing them shouldn't add much overhead, unless perhaps the images themselves are huge or the compression implementation is inefficient.
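If you do want to compare encoders directly, it is easy to measure what the comparison costs. The sketch below is only an illustration of the "try each and keep the smallest" pattern: it uses Python's general-purpose `zlib`, `bz2` and `lzma` modules as stand-ins, because real PDF image filters (JPEG, Group 4, JBIG2) would come from your PDF library. The `pick_smallest` helper and the sample data are hypothetical.

```python
import bz2
import lzma
import time
import zlib


def pick_smallest(data: bytes, encoders: dict) -> tuple:
    """Run every encoder on data; return (best_name, {name: (size, seconds)})."""
    results = {}
    for name, encode in encoders.items():
        start = time.perf_counter()
        encoded = encode(data)
        results[name] = (len(encoded), time.perf_counter() - start)
    # The winner is the encoder that produced the fewest bytes.
    best = min(results, key=lambda name: results[name][0])
    return best, results


# Stand-in encoders (assumption): replace with calls into your PDF library.
STAND_IN_ENCODERS = {
    "flate": lambda d: zlib.compress(d, 9),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

# Hypothetical sample: 100,000 zero bytes, a crude model of a blank page.
sample = bytes(100_000)
best, results = pick_smallest(sample, STAND_IN_ENCODERS)
for name, (size, seconds) in results.items():
    print(f"{name}: {size} bytes in {seconds:.4f} s")
print("smallest:", best)
```

Timing each attempt this way shows whether the extra CPU cost is actually significant for your documents before you invest in anything more elaborate.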


 