Extracting TIFF from a PDF using PDFBox v2

Question

With PDFBox v1.x it was possible to extract images, including those in TIFF format. With version 2.0.1 some formats can still be extracted, but TIFF does not seem to work.

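import java.io.ByteArrayOutputStream;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// page is a PDPage of the loaded document
PDResources resources = page.getResources();
for (COSName c : resources.getXObjectNames()) {
    PDXObject o = resources.getXObject(c);
    if (o instanceof PDImageXObject) {
        PDImageXObject image = (PDImageXObject) o;
        try (ByteArrayOutputStream bout = new ByteArrayOutputStream()) {
            // Encode the image in the format PDFBox suggests for it
            ImageIO.write(image.getImage(), image.getSuffix(), bout);
            System.out.println("Image Bytes: " + bout.size());
        }
    }
}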

In the code above, the ByteArrayOutputStream receives bytes for PNG and JPEG images but stays empty for TIFF. Any suggestions?

Thanks.

Answer 1

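To extract images from a PDF in TIFF format, you need to include the jai_imageio library in the classpath. ImageIO.write returns false without writing anything when no writer is registered for the requested format, and Java 8 and earlier ship no TIFF writer, which is why the stream stays empty.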

I used the distribution from the Geotoolkit repository. Here is the relevant part of my pom.xml:

<repositories>
    <repository>
        <id>Geotoolkit</id>
        <name>Geotoolkit</name>
        <url>http://maven.geotoolkit.org/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.3</version>
    </dependency>

    <!--required by PDFBox to create tiff images-->
    <dependency>
        <groupId>javax.media</groupId>
        <artifactId>jai_imageio</artifactId>
        <version>1.1.1</version>
    </dependency>
</dependencies>
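
To check whether the dependency was actually picked up, it may help to ask ImageIO directly which writers it has registered. The following is only a minimal sketch using standard JDK calls; the class name TiffWriterCheck and its two helper methods are made up for illustration and are not part of PDFBox or jai_imageio. It also assumes the TIFF plugin registers itself under the format name "tiff", which is the suffix the question's code passes to ImageIO.write.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox: reports whether ImageIO can write a format.
public class TiffWriterCheck {

    // True if at least one ImageIO writer plugin is registered for the format name.
    public static boolean hasWriter(String format) {
        return ImageIO.getImageWritersByFormatName(format).hasNext();
    }

    // Encodes the image and returns the byte count, or -1 if no writer accepted it.
    public static int encodedSize(BufferedImage image, String format) throws IOException {
        try (ByteArrayOutputStream bout = new ByteArrayOutputStream()) {
            // ImageIO.write returns false (and writes nothing) when no writer is found
            boolean written = ImageIO.write(image, format, bout);
            return written ? bout.size() : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small blank image, used only to exercise the writers
        BufferedImage sample = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        for (String format : new String[] {"png", "jpg", "tiff"}) {
            System.out.println(format + ": writer=" + hasWriter(format)
                    + ", bytes=" + encodedSize(sample, format));
        }
    }
}
```

If the "tiff" line reports no writer, the plugin is not on the classpath. In the extraction loop, checking the boolean returned by ImageIO.write in the same way would surface the problem instead of silently producing an empty stream.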

Answer 1 posted 2016-10-24 16:38:04