
PDFClown image extraction: images inverted

Original post: 2015-05-29 14:24:12 · tags: pdf, pdfclown

I'm working with PDFClown and trying to extract images from a PDF file. I use the example code that ships with the source, which can be found at http://pdfclown.org.

ImageExtractionSample.java.

The problem is that the extracted images come out as negatives and flipped horizontally. Does anyone know how to resolve this?

1 answer

Check other PDF files to see whether they also produce rotated or flipped images. ImageExtractionSample.java does not check the rotation or the matrix-defined transformations of the image object; it just writes the content to a file as is (so it works for JPG images but not, for example, for CCITT-encoded images).
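To illustrate the "as is" point, here is a minimal sketch of how one might check whether a file written by the sample is really a JPEG, by looking for the JPEG start-of-image marker in its first bytes. The class and method names (ImageFormatSniffer, looksLikeJpeg) are hypothetical and not part of PDFClown; this only inspects bytes and does not decode anything.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFClown.
final class ImageFormatSniffer {

    /** Returns true if the data starts with the JPEG start-of-image bytes FF D8 FF. */
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) throws IOException {
        // Pass the paths of the extracted files as arguments.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            String verdict = looksLikeJpeg(data)
                    ? "JPEG signature found"
                    : "no JPEG signature (possibly a raw or differently encoded stream)";
            System.out.println(name + ": " + verdict);
        }
    }
}
```

A file without the signature was probably stored with another filter and would need to be decoded rather than dumped byte for byte.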

So there are several things to consider when you extract an image from a PDF:

the image can be rotated by the transformation matrix (CTM) attached to it;
the image can be rotated/transformed as part of a form that is itself transformed;
the image can be placed on a page without any transformation while the page itself is rotated;
the image may have a mask overlaid on top of it (and the mask can be rotated and transformed too);
a JPG image is stored pretty much as is, but PDF supports other formats as well, such as CCITT compression, LZW-compressed images, etc.
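The first two points can be sketched with plain arithmetic on the six-number PDF matrix [a b c d e f]. The sketch below assumes you have already obtained the matrices yourself (how to read them through PDFClown is not shown here). It combines an image matrix with the matrix of an enclosing form, then reports the rotation of the x axis and whether the result contains a reflection. All names (CtmInspector, concat, rotationDegrees, isMirrored) and the example values are hypothetical.

```java
// Hypothetical helper, not part of PDFClown. Matrices are {a, b, c, d, e, f}.
final class CtmInspector {

    /** Returns m1 x m2 in PDF's row-vector convention: m1 is applied first, then m2. */
    static double[] concat(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    /** Angle of the transformed x axis in degrees, counter-clockwise. */
    static double rotationDegrees(double[] m) {
        return Math.toDegrees(Math.atan2(m[1], m[0]));
    }

    /** A negative determinant means the matrix includes a reflection. */
    static boolean isMirrored(double[] m) {
        return m[0] * m[3] - m[1] * m[2] < 0;
    }

    public static void main(String[] args) {
        // Assumed example values: an image scaled to 100 x 50 units,
        // drawn inside a form whose own matrix mirrors the x axis.
        double[] imageMatrix = {100, 0, 0, 50, 10, 10};
        double[] formMatrix = {-1, 0, 0, 1, 200, 0};
        double[] effective = concat(imageMatrix, formMatrix);
        System.out.println("x-axis angle: " + rotationDegrees(effective) + " degrees");
        System.out.println("mirrored: " + isMirrored(effective));
    }
}
```

A page-level rotation would be one more matrix in the same chain, which is why a sample that ignores all of them can produce output that looks flipped or rotated.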

But the general suggestion is that when you extract a JPG image from a PDF using PDFClown, you should just flip and rotate the extracted images, as suggested on the SourceForge project discussion page.
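As a rough sketch of that post-processing step, the following uses only the standard java.awt and javax.imageio classes to mirror an extracted image horizontally and to invert its colours, which matches the symptoms described in the question. It is not taken from the SourceForge discussion, the class and method names (ImageFixer, flipHorizontally, invert) are hypothetical, and whether a simple per-channel inversion is the right fix for a given file is an assumption you would need to verify.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFClown.
final class ImageFixer {

    /** Returns a copy of src mirrored left to right. */
    static BufferedImage flipHorizontally(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(w - 1 - x, y, src.getRGB(x, y));
            }
        }
        return dst;
    }

    /** Returns a copy of src with each RGB channel inverted (a negative). */
    static BufferedImage invert(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // XOR flips the 24 colour bits and leaves the alpha bits alone.
                dst.setRGB(x, y, src.getRGB(x, y) ^ 0x00FFFFFF);
            }
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ImageFixer <extracted-image> <output.png>");
            return;
        }
        BufferedImage extracted = ImageIO.read(new File(args[0]));
        if (extracted == null) {
            System.err.println("No ImageIO reader could decode " + args[0]);
            return;
        }
        BufferedImage fixed = invert(flipHorizontally(extracted));
        ImageIO.write(fixed, "png", new File(args[1]));
    }
}
```

If only one of the two symptoms appears for a given file, apply only the matching method; a rotation could be added in the same pixel-copy style.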

If you could point to a particular sample PDF file, it would be easier to suggest a solution.

If you're on Windows, you can use the free PDF Multitool utility to compare non-transformed and transformed images from a PDF, using the "Extract raw images (without transformation)" option in the image extraction dialog.

Disclaimer: I work for ByteScout. The PDF Multitool utility is free for both commercial and non-commercial purposes.