I'm trying to find out whether a page of a PDF document contains any objects that are not black and white (i.e. whether the page is b/w or color) using iText (or some other Java library, if you know of one). My PDF files should not contain images, so we don't have to worry about that.
Any ideas?
I hope there's some other way than converting to an image and reading the color of every pixel.
One possible solution is to get the page stream and do a regex search for colour setting operators.
byte[] contentStream = pdfRdr.getPageContent(pageNo);
Nearly all content on a PDF page is text or a graphics object. The colour is set using operators specified after up to four floating point values:
f1 .. fn SC % you need to know more about the colour space to determine whether this is black or not
f1 .. fn sc % lowercase operators set the non-stroking (fill) colour, uppercase ones the stroking colour
f1 f2 f3 RG % RGB: 0 0 0 would be black, 1 1 1 would be white
f1 f2 f3 rg
f1 f2 f3 f4 K % CMYK (0 0 0 1 = black, 0 0 0 0 = white, I think)
f1 f2 f3 f4 k
f1 g % the g operator chooses the greyscale colour space (0 = black, 1 = white)
f1 G
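As a rough illustration of the regex idea, here is a minimal sketch in plain Java. The class name `ColourOperatorScan` and the three-way `Result` are my own invention, not part of iText. It assumes the bytes are an already decoded content stream (such as what `getPageContent` returns), treats any CMYK colour with non-zero C, M or Y as colour, and reports `UNKNOWN` for `SC`/`sc`/`SCN`/`scn` because those depend on the current colour space. It does not look inside form XObjects or patterns, and operator-like text inside string literals could fool it.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans a decoded page content stream for colour operators.
final class ColourOperatorScan {

    enum Result { GRAY_ONLY, COLOUR, UNKNOWN }

    // One numeric operand, captured as a group.
    private static final String NUM = "([-+]?(?:\\d+\\.?\\d*|\\.\\d+))";
    // Do not start matching in the middle of a name or a number.
    private static final String START = "(?<![\\w.+-])";

    private static final Pattern RGB = Pattern.compile(
            START + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+(?:RG|rg)\\b");
    private static final Pattern CMYK = Pattern.compile(
            START + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+[Kk]\\b");
    // Operators whose meaning depends on the current colour space.
    private static final Pattern GENERIC = Pattern.compile("(?<![A-Za-z/])(?:SCN?|scn?)\\b");

    static Result classify(byte[] contentStream) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        return classify(new String(contentStream, StandardCharsets.ISO_8859_1));
    }

    static Result classify(String content) {
        Matcher rgb = RGB.matcher(content);
        while (rgb.find()) {
            double r = Double.parseDouble(rgb.group(1));
            double g = Double.parseDouble(rgb.group(2));
            double b = Double.parseDouble(rgb.group(3));
            if (r != g || g != b) {
                return Result.COLOUR; // unequal components: not a grey
            }
        }
        Matcher cmyk = CMYK.matcher(content);
        while (cmyk.find()) {
            double c = Double.parseDouble(cmyk.group(1));
            double m = Double.parseDouble(cmyk.group(2));
            double y = Double.parseDouble(cmyk.group(3));
            if (c != 0 || m != 0 || y != 0) {
                return Result.COLOUR; // assumption: only pure K counts as grey
            }
        }
        // g/G always select grey, so they never need to be inspected.
        return GENERIC.matcher(content).find() ? Result.UNKNOWN : Result.GRAY_ONLY;
    }
}
```

With iText this could be called as `ColourOperatorScan.classify(pdfRdr.getPageContent(pageNo))`. An `UNKNOWN` result would be the cue to fall back to rendering the page.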
I can imagine this could be tricky to get right. A more pragmatic solution might be to convert the page to an image (using one of the many tools that you can google for) and then inspect the image.
A possible solution with Apache PDFBox is to render each page to an image and check the RGB value of every pixel. But be careful: the rendered image may contain grayscale pixels even if the PDF is pure b/w.
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
...
// Note: getAllPages() and convertToImage() are PDFBox 1.x calls.
public void checkColor(final File pdffile) throws IOException {
    PDDocument document = PDDocument.load(pdffile);
    try {
        List<PDPage> pages = document.getDocumentCatalog().getAllPages();
        for (int i = 0; i < pages.size(); i++) {
            PDPage page = pages.get(i);
            // Render the page at 72 dpi into an RGB image without alpha.
            BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 72);
            for (int h = 0; h < image.getHeight(); h++) {
                for (int w = 0; w < image.getWidth(); w++) {
                    int pixel = image.getRGB(w, h);
                    boolean color = isColorPixel(pixel);
                    // ... do something
                }
            }
        }
    } finally {
        // Always release the document, even if rendering fails.
        document.close();
    }
}

private boolean isColorPixel(final int pixel) {
    // getRGB packs the pixel as 0xAARRGGBB; alpha is irrelevant here.
    int red = (pixel >> 16) & 0xff;
    int green = (pixel >> 8) & 0xff;
    int blue = pixel & 0xff;
    // gray (including black and white): R = G = B
    return !(red == green && green == blue);
}
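One way to fill in the "do something" step is to stop at the first colour pixel. The sketch below uses only `BufferedImage`, so it works on whatever the renderer hands back. The class name `RenderedPageColour` and the `tolerance` parameter are hypothetical additions of mine: the tolerance is there on the assumption that a renderer might produce slightly unequal channels on otherwise grey pixels, and `0` reproduces the strict R = G = B test.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: early-exit colour check on a rendered page image.
final class RenderedPageColour {

    // A pixel counts as colour when its channels differ by more than 'tolerance'.
    static boolean isColourPixel(int rgb, int tolerance) {
        int red = (rgb >> 16) & 0xff;
        int green = (rgb >> 8) & 0xff;
        int blue = rgb & 0xff;
        int max = Math.max(red, Math.max(green, blue));
        int min = Math.min(red, Math.min(green, blue));
        return max - min > tolerance;
    }

    // Returns true as soon as one colour pixel is found.
    static boolean hasColour(BufferedImage image, int tolerance) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (isColourPixel(image.getRGB(x, y), tolerance)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Inside the page loop this would be `boolean pageHasColour = RenderedPageColour.hasColour(image, 0);`, which avoids scanning the rest of a page once colour has been seen.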