
Extract TIFF images from PDF without decoding

With the help of iText 5, I would like to extract all TIFF images from a given PDF file and save them as TIFF files. Examples and other posts (1, 2) use the following method:

  1. Create a PdfImageObject from the PDF stream, which in line 189 decodes the image stream (if a corresponding filter implementation is present).
  2. Call PdfImageObject#getImageAsBytes(), which returns JPEG (original), PNG (re-encoded) or TIFF (in case of 8 bits per pixel).

As a result, a TIFF image with 1-bit color depth is converted to PNG, which is not what I need.

Another approach would be to call PdfImageObject#getBufferedImage(), which will decode the image in step (2) into a raster, and afterwards encode it again as TIFF using ImageIO.write(bufferedImage, "tiff", file).

As one can see, this is not efficient. Another solution shown in this post demonstrates how to save an encoded TIFF image stream to a file by prepending a TIFF header to it; that is the solution I am looking for.

Can iText help here?

PDF images are not TIFF images.

PDFs, however, can contain images that use compression techniques that are also used in TIFF, e.g. Flate, CCITT, LZW, JPEG.
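
To illustrate the header-prepending idea for the CCITT case, here is a minimal sketch in plain Java. It is not part of iText, and the class and method names (CcittTiffWrapper, wrapGroup4) are hypothetical. It assumes the image stream uses only the /CCITTFaxDecode filter with a negative /K (Group 4), that the still-encoded bytes were obtained elsewhere (in iText 5, PdfReader.getStreamBytesRaw(PRStream) should return them), and that width and height come from the image dictionary's /Width and /Height. The photometric value 0 (WhiteIsZero) is an assumption as well; depending on /BlackIs1 and /Decode it may need to be 1.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: wraps raw CCITT Group 4 data in a single-strip TIFF.
final class CcittTiffWrapper {
    private static final int ENTRY_COUNT = 8;
    // 8-byte file header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
    private static final int HEADER_SIZE = 8 + 2 + ENTRY_COUNT * 12 + 4;

    private static final short TYPE_SHORT = 3;
    private static final short TYPE_LONG = 4;

    static byte[] wrapGroup4(byte[] encoded, int width, int height) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + encoded.length)
                .order(ByteOrder.LITTLE_ENDIAN);

        // File header: "II" (little-endian), magic number 42, offset of the first IFD
        buf.put((byte) 'I').put((byte) 'I');
        buf.putShort((short) 42);
        buf.putInt(8);

        // Image file directory; entries must appear in ascending tag order
        buf.putShort((short) ENTRY_COUNT);
        putEntry(buf, 256, TYPE_LONG, width);           // ImageWidth
        putEntry(buf, 257, TYPE_LONG, height);          // ImageLength
        putEntry(buf, 258, TYPE_SHORT, 1);              // BitsPerSample
        putEntry(buf, 259, TYPE_SHORT, 4);              // Compression: CCITT Group 4
        putEntry(buf, 262, TYPE_SHORT, 0);              // Photometric: WhiteIsZero (assumed)
        putEntry(buf, 273, TYPE_LONG, HEADER_SIZE);     // StripOffsets: data follows the IFD
        putEntry(buf, 278, TYPE_LONG, height);          // RowsPerStrip: one strip for all rows
        putEntry(buf, 279, TYPE_LONG, encoded.length);  // StripByteCounts
        buf.putInt(0);                                  // no further IFDs

        // The encoded image data is copied as-is, without decoding
        buf.put(encoded);
        return buf.array();
    }

    // Writes one 12-byte IFD entry holding a single value. In little-endian order a
    // SHORT stored via putInt occupies the same leading bytes, padded with zeros.
    private static void putEntry(ByteBuffer buf, int tag, short type, int value) {
        buf.putShort((short) tag).putShort(type).putInt(1).putInt(value);
    }
}
```

The returned array can be written straight to a .tif file. Streams compressed with other filters (Flate, LZW, JPEG) or with additional filters chained in front would need different tag values or a different container, so this sketch covers only the 1-bit CCITT Group 4 case.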
