
Retrieve the page number of an image in a PDF with iText

I am using the code from the link below to render the images:

MyImageRenderListener - iText

Below is the try block of my code. What I am actually doing is finding the DPI of each image and, if the DPI is below 300, writing the image's file name to a text file.

Now I also want to write the page numbers where these images are located in the PDF. How can I obtain the page number of an image?

    try {
        PdfImageObject image = renderInfo.getImage();
        PdfDictionary imageDict = image.getDictionary();
        // Image size in pixels, read from the image dictionary
        float widthPx = imageDict.getAsNumber(PdfName.WIDTH).floatValue();
        float heightPx = imageDict.getAsNumber(PdfName.HEIGHT).floatValue();
        // Rendered size in user units (1/72 inch), read from the image CTM
        float widthUu = renderInfo.getImageCTM().get(Matrix.I11);
        float heightUu = renderInfo.getImageCTM().get(Matrix.I22);
        float widthIn = widthUu / 72;
        float heightIn = heightUu / 72;
        // Only the horizontal resolution is checked
        float imageDpi = widthPx / widthIn;
        String filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
        System.out.println(filename + "-->" + imageDpi);
        if (imageDpi < 300) {
            File file = new File("C:/Users/Abhinav/workspace/itext/results/result.txt");
            // FileWriter in append mode creates the file if it does not exist yet
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(file.getAbsoluteFile(), true))) {
                file.setReadable(true, false);
                file.setExecutable(true, false);
                file.setWritable(true, false);
                bw.write(filename);
                bw.write("\r\n");
            }
        }
    } catch (IOException e) {
        // Closes the try block, which the original excerpt left open
        System.out.println(e.getMessage());
    }
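
For reference, the DPI arithmetic used in the question can be isolated from iText and checked on its own. The sketch below is only an illustration: the class name `ImageDpi` and its methods are hypothetical, and it assumes an unrotated, unflipped image, so that the two CTM entries read in the question are simply the rendered width and height in user units (72 per inch by default).

```java
// Hypothetical helper, not part of iText: the DPI arithmetic from the question.
final class ImageDpi {
    /** PDF user space defaults to 72 units per inch. */
    static final float USER_UNITS_PER_INCH = 72f;

    /** Pixel count divided by the rendered size in inches. */
    static float dpi(float pixels, float userUnits) {
        float inches = userUnits / USER_UNITS_PER_INCH;
        return pixels / inches;
    }

    /** True when the resolution in either direction falls below the threshold. */
    static boolean isBelow(float widthPx, float heightPx,
                           float widthUu, float heightUu, float threshold) {
        return dpi(widthPx, widthUu) < threshold
                || dpi(heightPx, heightUu) < threshold;
    }

    public static void main(String[] args) {
        // A 600 x 300 pixel image drawn at 144 x 72 user units (2 x 1 inch).
        System.out.println(dpi(600f, 144f));
        System.out.println(isBelow(600f, 300f, 144f, 72f, 300f));
        // Same rendered size, but only 150 pixels high.
        System.out.println(isBelow(600f, 150f, 144f, 72f, 300f));
    }
}
```

Unlike the code in the question, this sketch also looks at the vertical resolution; whether that is wanted depends on the use case.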

This is a strange question, because it is incomplete and illogical.

Why is your question incomplete?

You are using MyImageRenderListener in the context of another example, ExtractImages:

    PdfReader reader = new PdfReader(filename);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener(RESULT);
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        parser.processContent(i, listener);
    }
    reader.close();

In this example, you loop over every page number to examine every separate page. Hence you know the page number whenever MyImageRenderListener returns an image.

Images are stored inside a PDF as external objects (aka XObject). MyImageRenderListener returns what's stored in such a stream object (containing the bytes of the image). So far, so good.

Why is your question illogical?

Because the whole purpose of storing images as XObjects is to be able to reuse the same image stream. Imagine an image of a logo. That image can be present on every page of the document. In this case, MyImageRenderListener will give you the same image (from the same stream) as many times as there are pages, but in reality there is only one image, and it is external to the page content. It doesn't make sense for that image to "know" the page it is on: it is on every page. The same logic applies even when the image is only used on one page. That is inherent to the design of PDF: an image stream doesn't know which page it belongs to. The link between the image stream and the page exists through the /XObject entry in the /Resources of the page dictionary.
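
To make the direction of that link concrete, here is a small plain-Java model with no iText involved. Everything in it is an assumption made for illustration: the class `ImageUsageIndex`, the object numbers, and the sample data. Each page lists the object numbers of the image XObjects its /Resources refer to; the image objects themselves hold no page information, so the reverse mapping has to be derived from the pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model, not iText: pages point at images, never the reverse.
final class ImageUsageIndex {
    /** Inverts "page -> image object numbers" into "image object number -> pages". */
    static Map<Integer, List<Integer>> pagesByImage(Map<Integer, List<Integer>> imagesByPage) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        // Visit pages in ascending order so each page list comes out sorted.
        for (Map.Entry<Integer, List<Integer>> page : new TreeMap<>(imagesByPage).entrySet()) {
            for (Integer objectNumber : page.getValue()) {
                result.computeIfAbsent(objectNumber, k -> new ArrayList<>()).add(page.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document: object 10 is a logo reused on every page.
        Map<Integer, List<Integer>> imagesByPage = new HashMap<>();
        imagesByPage.put(1, Arrays.asList(10, 11)); // logo and a photo
        imagesByPage.put(2, Arrays.asList(10));     // logo only
        imagesByPage.put(3, Arrays.asList(10, 12)); // logo and another photo
        System.out.println(pagesByImage(imagesByPage));
    }
}
```

In this sample data the logo (object 10) ends up associated with pages 1, 2 and 3, which is why "the page of an image" is not a single value in general.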

What would be an elegant way to solve this?

Create a member variable in MyImageRenderListener, e.g.:

    protected int pagenumber;

    public void setPagenumber(int pagenumber) {
        this.pagenumber = pagenumber;
    }

Use the setter from your loop:

    PdfReader reader = new PdfReader(filename);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener(RESULT);
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        listener.setPagenumber(i);
        parser.processContent(i, listener);
    }
    reader.close();

Now you can use pagenumber in the renderImage(ImageRenderInfo renderInfo) method. This way, you'll always know which page is being examined when this method is triggered.
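
The pattern can be tried out without iText using a stand-in class. The sketch below is a simplification under stated assumptions: `PageAwareListener` is a hypothetical name, and its `renderImage(String, float)` replaces the real `renderImage(ImageRenderInfo)` callback, taking an already computed file name and DPI. Only the page-number bookkeeping mirrors the approach described above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the listener; the real class implements iText's RenderListener
// and receives an ImageRenderInfo rather than a file name and a DPI value.
final class PageAwareListener {
    private int pagenumber;
    private final List<String> lowResolution = new ArrayList<>();

    void setPagenumber(int pagenumber) {
        this.pagenumber = pagenumber;
    }

    /** Hypothetical simplification of renderImage(ImageRenderInfo). */
    void renderImage(String filename, float dpi) {
        if (dpi < 300) {
            // The page number set by the calling loop is available here.
            lowResolution.add("page " + pagenumber + ": " + filename);
        }
    }

    List<String> getLowResolution() {
        return lowResolution;
    }

    public static void main(String[] args) {
        PageAwareListener listener = new PageAwareListener();
        // Stand-in for the parser.processContent(i, listener) loop.
        listener.setPagenumber(1);
        listener.renderImage("Img10.jpg", 150f);
        listener.setPagenumber(2);
        listener.renderImage("Img10.jpg", 150f); // same image, reused on page 2
        listener.renderImage("Img12.png", 600f); // high resolution, not recorded
        listener.getLowResolution().forEach(System.out::println);
    }
}
```

Because the setter is called before each page is processed, a reused image is reported once per page it appears on, each time with the correct page number.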
