iText: Is a Page Color or Black and White?

Question

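I am trying to use iText (or perhaps some other Java library you know of) to find out whether a page of a PDF document contains any objects that are not black and white, that is, whether the page is black and white or color. My PDF files should not contain images, so we don't have to worry about those.

Any ideas?

I hope there is a way other than converting the page to an image and reading the color of every pixel.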

Answer 1

One possible solution is to get the page's content stream and run a regular-expression search for the color-setting operators.

byte[] contentStream = pdfRdr.getPageContent(pageNo);

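Almost everything on a PDF page is either a text object or a graphics object. Colors are set with operators that follow up to four floating-point operands: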

f1 .. fn SC % you need to know more about the colour space to determine whether this is black or not
f1 .. fn sc
f1 f2 f3 RG % RGB: 0 0 0 would be black, 1 1 1 would be white
f1 f2 f3 rg
f1 f2 f3 f4 K % CMYK (0 0 0 1 = black, 0 0 0 0 = white, I think)
f1 f2 f3 f4 k
f1 g % the g operator chooses the greyscale colour space (0 = black, 1 = white)
f1 G
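The operator search described above could be sketched roughly as follows. This is a minimal illustration, not a complete content-stream parser: the class name `ColorOperatorScanner` and the method `containsColor` are hypothetical, only the `rg`/`RG` and `k`/`K` operators are examined, `sc`/`SC` are ignored because their meaning depends on the current colour space, and a plain regex can be fooled by operator-like text inside string literals. It also treats any CMYK colour with a nonzero C, M, or Y component as "color", which is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans a decoded page content stream for RGB/CMYK
// colour operators whose operands are not a shade of gray.
final class ColorOperatorScanner {
    // One PDF number, e.g. 0, 1.0, .5, -0.25
    private static final String NUM = "([-+]?(?:\\d+\\.?\\d*|\\.\\d+))";
    // Do not start matching in the middle of a number or a name.
    private static final String START = "(?<![\\w.+-])";

    private static final Pattern RGB = Pattern.compile(
        START + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+(?:rg|RG)\\b");
    private static final Pattern CMYK = Pattern.compile(
        START + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+[kK]\\b");

    static boolean containsColor(byte[] contentStream) {
        // ISO-8859-1 maps every byte to one char, so nothing is lost.
        return containsColor(new String(contentStream, StandardCharsets.ISO_8859_1));
    }

    static boolean containsColor(String content) {
        Matcher rgb = RGB.matcher(content);
        while (rgb.find()) {
            double r = Double.parseDouble(rgb.group(1));
            double g = Double.parseDouble(rgb.group(2));
            double b = Double.parseDouble(rgb.group(3));
            // Equal components are a shade of gray (black and white included).
            if (r != g || g != b) {
                return true;
            }
        }
        Matcher cmyk = CMYK.matcher(content);
        while (cmyk.find()) {
            double c = Double.parseDouble(cmyk.group(1));
            double m = Double.parseDouble(cmyk.group(2));
            double y = Double.parseDouble(cmyk.group(3));
            // Assumption: only pure K (C = M = Y = 0) counts as black and white.
            if (c != 0 || m != 0 || y != 0) {
                return true;
            }
        }
        return false;
    }
}
```

With iText, the bytes returned by `pdfRdr.getPageContent(pageNo)` could be passed straight to `ColorOperatorScanner.containsColor(...)`. Content drawn through form XObjects lives in separate streams and would not be seen by this sketch.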

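I can imagine this could be hard to get right. A more pragmatic solution might be to convert the page to an image (using one of the many tools you can find with Google) and then examine the image.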

Answer 2

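A possible solution with Apache PDFBox is to render each page to an image and check the RGB pixels. Note, however, that the rendered image may contain shades of gray even if the PDF is pure black and white.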

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

...

// Uses the PDFBox 1.x API (getAllPages, convertToImage).
public void checkColor(final File pdffile) throws IOException {
  PDDocument document = PDDocument.load(pdffile);
  try {
    @SuppressWarnings("unchecked")
    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
    for (int i = 0; i < pages.size(); i++) {
      PDPage page = pages.get(i);
      // Render the page to an RGB image at 72 dpi.
      BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 72);
      for (int h = 0; h < image.getHeight(); h++) {
        for (int w = 0; w < image.getWidth(); w++) {
          int pixel = image.getRGB(w, h);
          boolean color = isColorPixel(pixel);
          // ... do something
        }
      }
    }
  } finally {
    document.close(); // release the underlying file
  }
}

private boolean isColorPixel(final int pixel) {
  int red = (pixel >> 16) & 0xff;
  int green = (pixel >> 8) & 0xff;
  int blue = pixel & 0xff;
  // gray (including black and white): R = G = B
  return !(red == green && green == blue);
}
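The "do something" step above could be a scan that stops at the first colored pixel. The sketch below is one way to do that, using only `java.awt.image.BufferedImage`; the class name `PageColorCheck`, the method `hasColorPixel`, and the `tolerance` parameter are all hypothetical. The tolerance is an assumption: a renderer might produce channels that differ by a level or two on gray content, and a small tolerance would ignore that, while `0` reproduces the strict R = G = B test.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: reports whether a rendered page image has any pixel
// whose RGB channels differ by more than the given tolerance.
final class PageColorCheck {
    static boolean hasColorPixel(BufferedImage image, int tolerance) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int pixel = image.getRGB(x, y);
                int red = (pixel >> 16) & 0xff;
                int green = (pixel >> 8) & 0xff;
                int blue = pixel & 0xff;
                int max = Math.max(red, Math.max(green, blue));
                int min = Math.min(red, Math.min(green, blue));
                // Gray pixels have (nearly) equal channels.
                if (max - min > tolerance) {
                    return true; // stop at the first colored pixel
                }
            }
        }
        return false;
    }
}
```

Inside the page loop, a call such as `PageColorCheck.hasColorPixel(image, 0)` would then give one boolean per page instead of one per pixel.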

Answer 1: score 1, accepted, 2011-11-23 22:37:24

Answer 2: score 0, 2011-12-08 14:09:33
