
How to check PDF pages for resolution (DPI) of embedded images?

Is there a free library that can report the resolution, in DPI, of the images contained in a PDF file?

I've tried the following code using PDFSharp, but the DPI it returns is not correct. For example, it reports 96 dpi where it should be 150 dpi:

using (PdfDocument pdf = PdfReader.Open(sourcePdf))
{
    for (int i = 0; i < pdf.Pages.Count; i++)
    {
        XGraphics xGraphics = XGraphics.FromPdfPage(pdf.Pages[i]);
        float dpi = xGraphics.Graphics.DpiX; // reports 96 here, expected 150
    }
}

You can use a command-line tool to get the information you need: pdfimages.

However, you need a recent version of pdfimages that is based on the Poppler library (NOT the 'pdfimages' that is based on XPDF!).

Recent Poppler versions let you use the -list option:

pdfimages -list -f 2 -l 4 my.pdf

The output of the above example command shows all images in the page range from 2 (-f: first page to show) to 4 (-l: last page to show).

Here is the output for the above command, using an example PDF file I prepared specifically for this question (scroll horizontally to see all columns):

page num  type width height color comp bpc  enc interp object ID x-ppi y-ppi size ratio
---------------------------------------------------------------------------------------
   2   0 image   697  1238  gray    1   8  jpeg   no       16  0   320   320  142K  17%
   3   1 image   697  1238  gray    1   8  jpeg   no       16  0   151   151  142K  17%
   4   2 image   697  1238  gray    1   8  jpeg   no       16  0    84   115  142K  17%

The output shows the following:

  1. There are three images on the three pages 2-4 (as indicated by columns 1+2, headed page and num).

  2. The PDF object IDs for all three images are identical: 16 0 (as indicated by columns 11+12, headed object + ID). This means the PDF defines only one distinct image object but shows it three times (i.e., the image is embedded only once, but appears on 3 pages).

  3. The image's width is 697 pixels, its height is 1238 pixels, its colorspace is gray, its number of color channels/components is 1, its image depth (bits per component) is 8, its compression scheme is jpeg, its byte size (as embedded) is 142K, and its compression ratio is 17% (as indicated by columns 4-9 and 15+16, headed width, height, color, comp, bpc, enc, size and ratio).

  4. However, the same image appears on different pages at different resolutions (given as PPI, pixels per inch, not DPI):

    • page 2 shows it with a PPI of 320 in both directions,

    • page 3 shows it with a PPI of 151 in both directions,

    • while page 4 shows it with a PPI of 84 in the horizontal (X) direction and 115 PPI in the vertical (Y) direction.
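
If the listing needs to be checked by a script rather than by eye, the x-ppi and y-ppi columns can be filtered with awk. The following is only a minimal sketch and not part of the original answer: the function name low_ppi_images and the default threshold of 150 are illustrative assumptions, and it assumes the column layout shown above (x-ppi and y-ppi as fields 13 and 14), which may differ between Poppler versions.

```shell
#!/bin/sh
# Read `pdfimages -list` output on stdin and print "page num x-ppi y-ppi"
# for every image whose x-ppi or y-ppi is below the given threshold.
# Assumes the column layout shown above (fields 13 and 14 are x-ppi/y-ppi).
low_ppi_images() {
    limit=${1:-150}   # hypothetical default threshold, in PPI
    awk -v limit="$limit" '
        $1 !~ /^[0-9]+$/ { next }   # skip the header and the dashed rule
        ($13 + 0 < limit) || ($14 + 0 < limit) { print $1, $2, $13, $14 }
    '
}

# Usage (requires the Poppler-based pdfimages):
#   pdfimages -list my.pdf | low_ppi_images 150
```

Fed the three-row listing above with a threshold of 150, this would report only the image on page 4, whose horizontal and vertical resolutions are 84 and 115 PPI.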


Now, if a command-line tool cannot be repurposed for your goal: the Poppler library, on which the tool shown above is based, certainly is free ('free as in liberty' as well as 'free as in beer').


Here is a link to the PDF ("my.pdf") I used to demonstrate the output of the command above.

PDFs do not necessarily use DPI in their definitions. PDF allows the document creator to define a custom user coordinate space, which may or may not map to anything similar to dots per inch.
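
One common way to arrive at an effective resolution anyway is to divide an image's pixel count by the size at which the page places it. The sketch below shows only that arithmetic. It assumes the default user space unit of 1/72 inch (a page can change this, for example via UserUnit or a scaling transformation), and the function name effective_ppi is an illustrative assumption, not part of any tool.

```shell
#!/bin/sh
# Effective PPI along one axis = pixels / (placed size in inches),
# where placed size in inches = user space units / 72 (default unit assumed).
effective_ppi() {
    # $1 = image size in pixels, $2 = placed size in user space units (points)
    awk -v px="$1" -v pt="$2" 'BEGIN { printf "%.0f\n", px * 72 / pt }'
}

# Example: a 697-pixel-wide image placed 334.56 points wide
#   effective_ppi 697 334.56
```

Because the placed width and height are set independently, the two axes can yield different values, which is how a single image can end up with differing horizontal and vertical PPI.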

