JAI: How to extract a single-page input stream from a multi-page TIFF image container?

Question

I have a component that converts PDF documents to images, one image per page. Since the component uses converters that produce in-memory images, it puts heavy pressure on the JVM heap and takes a while to finish conversions.

I'm trying to improve the overall performance of the conversion process, and found a native library with a JNI binding that converts PDFs to TIFFs. That library can only convert a PDF to a single TIFF file (it requires intermediate file system storage and does not even work with streams), so the resulting TIFF file has all converted pages embedded rather than one image per page on the file system. The native library makes the conversion itself drastically faster, but it introduces a real bottleneck: since I need a source-page to destination-page conversion, I now have to extract every page from the result file and write each of them elsewhere. A simple and naive approach with RenderedImages:

final SeekableStream seekableStream = new FileSeekableStream(tempFile);
final ImageDecoder imageDecoder = createImageDecoder("tiff", seekableStream, null);
...
//                                               V--- heap is wasted here
final RenderedImage renderedImage = imageDecoder.decodeAsRenderedImage(pageNumber);
// ... do the rest stuff ...

What I would really like is to extract the input stream of a specific page from the TIFF container file (tempFile) and redirect it elsewhere, without ever holding it as an in-memory image. I imagine an approach similar to container processing, where I seek to a specific entry and extract its data (something like ZIP file processing). But I couldn't find anything like that in ImageDecoder; perhaps my expectations are wrong and I'm missing something important here...

Is it possible to extract TIFF container page input streams using the JAI API, or perhaps third-party alternatives? Thanks in advance.

Answer 1

I could be wrong, but I don't think JAI supports splitting TIFFs without decoding the files to in-memory images. And, sorry for promoting my own library, but I think it does exactly what you need (the main part of the solution used to split TIFFs was contributed by a third party).

By using the TIFFUtilities class from com.twelvemonkeys.contrib.tiff, you should be able to split your multi-page TIFF into multiple single-page TIFFs like this:

TIFFUtilities.split(tempFile, new File("output"));

No decoding of the images is done; each IFD is simply split into a separate file, and the streams are written with corrected offsets and byte counts.
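
To illustrate why no decoding is needed, here is a minimal sketch of walking the IFD chain of a classic TIFF file with nothing but the JDK. It is not the TwelveMonkeys implementation: the class name TiffIfdWalker and its method are hypothetical, the sketch assumes a classic (non-BigTIFF) file, and it only follows the main IFD chain (SubIFDs are ignored). It reads the 8-byte header plus a few bytes per page, so pixel data never reaches the heap.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JAI or TwelveMonkeys). */
final class TiffIfdWalker {

    /** Returns the file offset of every IFD in the main chain, one per page. */
    static List<Long> ifdOffsets(Path tiff) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(tiff.toFile(), "r")) {
            // Header: 2-byte byte-order mark, 2-byte magic (42), 4-byte first IFD offset.
            byte[] header = new byte[8];
            file.readFully(header);

            ByteOrder order;
            if (header[0] == 'I' && header[1] == 'I') {
                order = ByteOrder.LITTLE_ENDIAN;
            } else if (header[0] == 'M' && header[1] == 'M') {
                order = ByteOrder.BIG_ENDIAN;
            } else {
                throw new IOException("Not a TIFF file: bad byte-order mark");
            }

            ByteBuffer head = ByteBuffer.wrap(header).order(order);
            if (head.getShort(2) != 42) {
                throw new IOException("Not a classic TIFF (BigTIFF is not handled in this sketch)");
            }

            List<Long> offsets = new ArrayList<>();
            long offset = head.getInt(4) & 0xFFFFFFFFL;
            byte[] scratch = new byte[4];

            while (offset != 0) {
                if (offsets.contains(offset)) {
                    throw new IOException("Corrupt TIFF: IFD chain loops back to " + offset);
                }
                offsets.add(offset);

                // Each IFD: 2-byte entry count, 12 bytes per entry, 4-byte next IFD offset.
                file.seek(offset);
                file.readFully(scratch, 0, 2);
                int entryCount = ByteBuffer.wrap(scratch).order(order).getShort(0) & 0xFFFF;

                file.seek(offset + 2 + 12L * entryCount);
                file.readFully(scratch, 0, 4);
                offset = ByteBuffer.wrap(scratch).order(order).getInt(0) & 0xFFFFFFFFL;
            }
            return offsets;
        }
    }
}
```

Calling TiffIfdWalker.ifdOffsets(tempFile.toPath()).size() would give the page count. A real splitter additionally has to copy each page's strip or tile data and rewrite the offsets inside the IFD, which this sketch deliberately leaves out.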

Files will be named output/0001.tif, output/0002.tif, etc. If you need more control over the output name or have other requirements, you can easily modify the code. The code comes with a BSD-style license.
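
Once the split has been written, the per-page input streams asked for in the question can be opened straight from the output directory. The following is only a sketch using the JDK: SplitPages, PageHandler, pageFiles and forEachPage are hypothetical names, and it assumes the zero-padded naming described above, so that sorting by file name yields page order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for consuming the single-page files produced by the split. */
final class SplitPages {

    /** Callback that receives a 1-based page number and the raw TIFF bytes of that page. */
    @FunctionalInterface
    interface PageHandler {
        void handle(int pageNumber, InputStream page) throws IOException;
    }

    /** Lists the *.tif files in the directory, sorted by file name (0001.tif, 0002.tif, ...). */
    static List<Path> pageFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".tif"))
                    .sorted(Comparator.comparing((Path p) -> p.getFileName().toString()))
                    .collect(Collectors.toList());
        }
    }

    /** Opens one stream per page, hands it to the handler, and closes it afterwards. */
    static void forEachPage(Path dir, PageHandler handler) throws IOException {
        int pageNumber = 1;
        for (Path file : pageFiles(dir)) {
            try (InputStream in = Files.newInputStream(file)) {
                handler.handle(pageNumber++, in);
            }
        }
    }
}
```

A handler could, for example, copy each stream to its destination with Files.copy(in, target), so the image is never decoded on the Java side.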

Answer 1: 3 · accepted · 2017-06-20 08:33:49
