Extract TIFF from PDF with PDFBox v2

With PDFBox v1.x it was possible to extract images, including those in TIFF format. With version 2.0.1 extraction still works for some formats, but it does not seem to work for TIFF.

// page is the PDPage being inspected
PDResources resources = page.getResources();
for (COSName c : resources.getXObjectNames()) {
    PDXObject o = resources.getXObject(c);
    if (o instanceof PDImageXObject) {
        PDImageXObject image = (PDImageXObject) o;
        try (ByteArrayOutputStream bout = new ByteArrayOutputStream()) {
            // getSuffix() supplies the format name handed to ImageIO
            ImageIO.write(image.getImage(), image.getSuffix(), bout);
            System.out.println("Image Bytes: " + bout.size());
        }
    }
}

The ByteArrayOutputStream in the code above contains bytes for PNG and JPEG images but nothing for TIFF. Any suggestions?

Thanks.

To extract images in TIFF format, you need to include the jai_imageio library on the classpath. Without a TIFF writer registered, ImageIO.write returns false and writes nothing, which is why the stream stays empty.

I used the distribution from the Geotoolkit repository; here is the relevant extract from my pom.xml:

<repositories>
    <repository>
        <id>Geotoolkit</id>
        <name>Geotoolkit</name>
        <url>http://maven.geotoolkit.org/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.3</version>
    </dependency>

    <!--required by PDFBox to create tiff images-->
    <dependency>
        <groupId>javax.media</groupId>
        <artifactId>jai_imageio</artifactId>
        <version>1.1.1</version>
    </dependency>
</dependencies>
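
To confirm that a TIFF writer is actually visible to ImageIO after adding the dependency, something like the following may help. It is only a minimal sketch using the standard javax.imageio API; the class name TiffWriterCheck and the helper methods hasWriter and encode are made up for illustration and are not part of PDFBox or jai_imageio. Note that Java 9 and later ship their own TIFF plugin, so the check can report true there even without jai_imageio.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class, named here only for illustration.
public class TiffWriterCheck {

    /** Returns true if ImageIO has a writer registered for the given format name. */
    public static boolean hasWriter(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    /** Encodes the image and returns its bytes, or throws if no writer accepted it. */
    public static byte[] encode(BufferedImage image, String formatName) throws IOException {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            // ImageIO.write returns false (and writes nothing) when no suitable
            // writer is found, so the return value must be checked explicitly.
            if (!ImageIO.write(image, formatName, out)) {
                throw new IOException("No ImageIO writer for format: " + formatName);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("tiff writer available: " + hasWriter("tiff"));
        // Small blank image, used only to exercise the encoder.
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        System.out.println("png bytes: " + encode(img, "png").length);
    }
}
```

In the extraction loop from the question, the same idea would mean checking the boolean returned by ImageIO.write(image.getImage(), image.getSuffix(), bout) instead of ignoring it, so that a missing writer shows up as an error rather than as an empty stream.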
