Count images in a PDF using PDFBox

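I need to extract text from a PDF to verify some content, and also count the number of images in the PDF document, using Java. I can get the text content without any problem using the getText function below, but I can't find a way to count only the image objects. With the code below I have been able to get a count of all objects, but I can't find documentation on how to count only images. Any ideas would be greatly appreciated. Thanks.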

static String getText(File pdfFile) throws IOException {
    // try-with-resources closes the document even if text extraction fails
    try (PDDocument doc = PDDocument.load(pdfFile)) {
        return new PDFTextStripper().getText(doc);
    }
}

static void countImages(File pdfFile) throws IOException {
    try (PDDocument doc = PDDocument.load(pdfFile)) {
        // Counts every COS object in the file (pages, fonts, streams, ...), not just images
        List<COSObject> myObjects = doc.getDocument().getObjects();
        System.out.println("Count: " + myObjects.size());
    }
}

A quick and dirty solution might look like this:

static void countImages(File pdfFile) throws IOException {
    // Uses PDFBox 1.8 classes (PDXObjectImage); try-with-resources closes the document
    try (PDDocument doc = PDDocument.load(pdfFile)) {
        // Only the resources at the root of the page tree are inspected here
        PDResources res = doc.getDocumentCatalog().getPages().getResources();

        int numImg = 0;
        for (PDXObject xobject : res.getXObjects().values()) {
            if (xobject instanceof PDXObjectImage) {
                numImg++;
            }
        }
        System.out.println("Count: " + numImg);
    }
}

This should do the trick:

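// Imports for this snippet and the CountImages class (PDFBox 2.x)
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public static void main(String[] args) throws IOException {
    // try-with-resources closes the document once counting is done
    try (PDDocument document = PDDocument.load(new File(""))) {
        int numImages = 0;
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            PDPage page = document.getPage(i);

            // A fresh engine per page, so duplicates are tracked per page
            CountImages countImages = new CountImages(page);
            countImages.processPage(page);

            numImages += countImages.numImages;
        }

        System.out.println(numImages);
    }
}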

static class CountImages extends PDFGraphicsStreamEngine {
    public int numImages = 0;
    private final Set<COSStream> duplicates = new HashSet<>();

    protected CountImages(PDPage page) throws IOException
    {
        super(page);
    }

    @Override
    public void appendRectangle(Point2D pd, Point2D pd1, Point2D pd2, Point2D pd3) throws IOException {
    }

    @Override
    public void drawImage(PDImage pdImage) throws IOException {
        if (pdImage instanceof PDImageXObject) {
            PDImageXObject xobject = (PDImageXObject) pdImage;

            // Set.add returns false if this stream was already seen on the page
            if (duplicates.add(xobject.getCOSObject())) {
                numImages++;
            }
        } else {
            numImages++; // not an XObject, so it is an inline image
        }
    }

    @Override
    public void clip(int i) throws IOException {
    }

    @Override
    public void moveTo(float f, float f1) throws IOException {
    }

    @Override
    public void lineTo(float f, float f1) throws IOException {
    }

    @Override
    public void curveTo(float f, float f1, float f2, float f3, float f4, float f5) throws IOException {
    }

    @Override
    public Point2D getCurrentPoint() throws IOException {
        return new Point2D.Float(0, 0);
    }

    @Override
    public void closePath() throws IOException {
    }

    @Override
    public void endPath() throws IOException {
    }

    @Override
    public void strokePath() throws IOException {
    }

    @Override
    public void fillPath(int i) throws IOException {
    }

    @Override
    public void fillAndStrokePath(int i) throws IOException {
    }

    @Override
    public void shadingFill(COSName cosn) throws IOException {
    }
}
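
To make the counting rule in drawImage easier to see, here is a minimal sketch of the same logic with the PDFBox types replaced by plain strings. Everything in it is a hypothetical model, not PDFBox API: each draw call is represented by an XObject identifier such as "Im0", null stands in for an inline image, and the names ImageCountModel, countOnPage and countInDocument are made up for illustration. It assumes, as in the answer above, that duplicates are tracked per page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ImageCountModel {

    // Counts images drawn on one page: each distinct XObject id counts once,
    // every inline image (modelled as null) counts each time it is drawn.
    static int countOnPage(List<String> drawCalls) {
        Set<String> seen = new HashSet<>();
        int count = 0;
        for (String id : drawCalls) {
            if (id == null) {
                count++;               // inline image: no identity to deduplicate on
            } else if (seen.add(id)) {
                count++;               // first time this XObject is drawn on the page
            }
        }
        return count;
    }

    // Sums the per-page counts; an XObject reused on two pages is counted twice,
    // because a new "seen" set is created for every page.
    static int countInDocument(List<List<String>> pages) {
        int total = 0;
        for (List<String> page : pages) {
            total += countOnPage(page);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> page1 = Arrays.asList("Im0", "Im0", null, "Im1");
        List<String> page2 = Arrays.asList("Im0");
        System.out.println(countOnPage(page1));                            // prints 3
        System.out.println(countInDocument(Arrays.asList(page1, page2)));  // prints 4
    }
}
```

If an image shared between pages should be counted only once for the whole document, one option would be to keep a single set for all pages instead of one per page; whether that is the count you want depends on what "number of images" means for your check.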
