简体   繁体   中英

Broken images extracted by PdfSharp C# Library

I am trying to execute the sample program of PDFSharp library. I have already provided the references in the project.

PROBLEM: The code executes without any error and I see that the method ExportJpegImage() is called 7 times (which is the number of images in the pdf). But when I try to open the images written by the program ( in .jpeg ), the Windows cannot open them.

static void Main(string[] args)
    {
        Console.WriteLine("Starting PDF Sharp sample program...");

        const string filename = @"D:/Test/test.pdf";

        PdfDocument document = PdfReader.Open(filename);

        int imageCount = 0;
        // Iterate pages
        foreach (PdfPage page in document.Pages)
        {
            // Get resources dictionary
            PdfDictionary resources = page.Elements.GetDictionary("/Resources");
            if (resources != null)
            {
                // Get external objects dictionary
                PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
                if (xObjects != null)
                {
                    ICollection<pdfitem> items = xObjects.Elements.Values;
                    // Iterate references to external objects
                    foreach (PdfItem item in items)
                    {
                        PdfReference reference = item as PdfReference;
                        if (reference != null)
                        {
                            PdfDictionary xObject = reference.Value as PdfDictionary;
                            // Is external object an image?
                            if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                            {
                                ExportJpegImage(xObject, ref imageCount);
                            }
                        }
                    }
                }
            }
        }

    }

 static void ExportJpegImage(PdfDictionary image, ref int count)
    {
        // Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
        byte[] stream = image.Stream.Value;
        FileStream fs = new FileStream(String.Format(@"D:\Test\Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write);
        BinaryWriter bw = new BinaryWriter(fs);
        bw.Write(stream);
        bw.Close();
    }

Please read the note preceding that example :

Note: This snippet shows how to export JPEG images from a PDF file. PDFsharp cannot convert PDF pages to JPEG files. This sample does not handle non-JPEG images . It does not (yet) handle JPEG images that have been flate-encoded.

There are several different formats for non-JPEG images in PDF. Those are not supported by this simple sample and require several hours of coding, but this is left as an exercise to the reader.

Thus, the images you want to extract from your PDF most likely simply are not embedded as JPEG images (at least not without further filtering like deflation). To transform that data into something a normal image viewer can handle, you most likely have to invest several hours of coding .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM