
Extract TIFF images from PDF without decoding

Original post: 2018-11-06 15:10:23. Tags: java, image, pdf, itext, tiff

With the help of iText 5, I would like to extract all TIFF images from a given PDF file and save them as TIFF files. Examples and other posts (1, 2) use the following method:

1. Create a PdfImageObject from the PDF stream, which in line 189 decodes the image stream (if a corresponding filter implementation is present).
2. Call PdfImageObject#getImageAsBytes(), which returns JPEG (original), PNG (re-encoded) or TIFF (in case of 8 bits per pixel).

As a result, a TIFF image with 1-bit color depth is converted to PNG, which is not what I need.

Another approach would be to call PdfImageObject#getBufferedImage() in step (2), which decodes the image into a raster, and afterwards encode it again as TIFF using ImageIO.write(bufferedImage, "tiff", file).

As one can see, this is not efficient. Another solution shown in this post demonstrates how to save the encoded TIFF image stream to a file by prepending a TIFF header to it; that is the solution I am looking for.

Can iText help here?

1 answer

PDF images are not TIFF images.

PDFs can, however, contain images that use compression techniques that are also used in TIFF, e.g. Flate, CCITT, LZW, JPEG.
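
For the CCITT case, the header-prepending idea mentioned in the question could look roughly like the sketch below. It uses only the JDK and is not part of iText; the class name CcittTiffWrapper and the method wrap are hypothetical. It assumes you already hold the still-encoded stream bytes (in iText 5, PdfReader.getStreamBytesRaw should provide them) and have read /Width and /Height from the image dictionary. The compression value 4 (Group 4) corresponds to /K < 0 in the stream's decode parameters. The photometric value 0 (WhiteIsZero) is the usual fax convention, but the right choice depends on /BlackIs1 and /Decode, so treat it as an assumption to verify.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: wraps raw CCITT fax data in a minimal single-strip TIFF.
final class CcittTiffWrapper {
    private static final short TYPE_SHORT = 3; // TIFF field type SHORT (16 bit)
    private static final short TYPE_LONG = 4;  // TIFF field type LONG (32 bit)
    private static final int ENTRY_COUNT = 9;
    // 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
    static final int DATA_OFFSET = 8 + 2 + ENTRY_COUNT * 12 + 4;

    static byte[] wrap(byte[] ccittData, int width, int height,
                       int compression, int photometric) {
        ByteBuffer buf = ByteBuffer.allocate(DATA_OFFSET + ccittData.length)
                .order(ByteOrder.LITTLE_ENDIAN);

        // Header: "II" (little-endian), magic number 42, offset of the first IFD
        buf.put((byte) 'I').put((byte) 'I');
        buf.putShort((short) 42);
        buf.putInt(8);

        // Image file directory; entries must be sorted by ascending tag number
        buf.putShort((short) ENTRY_COUNT);
        entry(buf, 256, TYPE_LONG, width);              // ImageWidth
        entry(buf, 257, TYPE_LONG, height);             // ImageLength
        entry(buf, 258, TYPE_SHORT, 1);                 // BitsPerSample
        entry(buf, 259, TYPE_SHORT, compression);       // Compression (3 = Group 3, 4 = Group 4)
        entry(buf, 262, TYPE_SHORT, photometric);       // PhotometricInterpretation
        entry(buf, 273, TYPE_LONG, DATA_OFFSET);        // StripOffsets
        entry(buf, 277, TYPE_SHORT, 1);                 // SamplesPerPixel
        entry(buf, 278, TYPE_LONG, height);             // RowsPerStrip (one strip)
        entry(buf, 279, TYPE_LONG, ccittData.length);   // StripByteCounts
        buf.putInt(0);                                  // no further IFDs

        // The encoded image data follows unchanged; nothing is decoded
        buf.put(ccittData);
        return buf.array();
    }

    // Writes one 12-byte IFD entry holding a single value
    private static void entry(ByteBuffer buf, int tag, short type, int value) {
        buf.putShort((short) tag);
        buf.putShort(type);
        buf.putInt(1); // value count
        if (type == TYPE_SHORT) {
            buf.putShort((short) value); // SHORT values are left-justified
            buf.putShort((short) 0);     // padding to fill the 4-byte field
        } else {
            buf.putInt(value);
        }
    }
}
```

Writing the result of wrap(rawBytes, width, height, 4, 0) to a .tif file would then store the compressed data exactly as it was in the PDF. This only covers single-strip CCITT streams; Flate, LZW or JPEG image data would need different tags and is not handled here.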