简体   繁体   English

由PdfSharp C#Library提取的破碎图像

[英]Broken images extracted by PdfSharp C# Library

I am trying to execute the sample program of PDFSharp library. 我正在尝试执行PDFSharp库的示例程序 I have already provided the references in the project. 我已经在项目中提供了参考资料。

PROBLEM: The code executes without any error and I see that the method ExportJpegImage() is called 7 times (which is the number of images in the pdf). 问题:代码执行时没有任何错误,我看到方法ExportJpegImage()被调用7次(这是pdf中的图像数)。 But when I try to open the images written by the program ( in .jpeg ), the Windows cannot open them. 但是当我尝试打开程序写入的图像(在.jpeg )时,Windows无法打开它们。

static void Main(string[] args)
    {
        Console.WriteLine("Starting PDF Sharp sample program...");

        const string filename = @"D:/Test/test.pdf";

        PdfDocument document = PdfReader.Open(filename);

        int imageCount = 0;
        // Iterate pages
        foreach (PdfPage page in document.Pages)
        {
            // Get resources dictionary
            PdfDictionary resources = page.Elements.GetDictionary("/Resources");
            if (resources != null)
            {
                // Get external objects dictionary
                PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
                if (xObjects != null)
                {
                    ICollection<pdfitem> items = xObjects.Elements.Values;
                    // Iterate references to external objects
                    foreach (PdfItem item in items)
                    {
                        PdfReference reference = item as PdfReference;
                        if (reference != null)
                        {
                            PdfDictionary xObject = reference.Value as PdfDictionary;
                            // Is external object an image?
                            if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                            {
                                ExportJpegImage(xObject, ref imageCount);
                            }
                        }
                    }
                }
            }
        }

    }

 static void ExportJpegImage(PdfDictionary image, ref int count)
    {
        // Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
        byte[] stream = image.Stream.Value;
        FileStream fs = new FileStream(String.Format(@"D:\Test\Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write);
        BinaryWriter bw = new BinaryWriter(fs);
        bw.Write(stream);
        bw.Close();
    }

Please read the note preceding that example : 请阅读该示例前面的注释:

Note: This snippet shows how to export JPEG images from a PDF file. 注意:此代码段显示如何从PDF文件导出JPEG图像 PDFsharp cannot convert PDF pages to JPEG files. PDFsharp无法将PDF页面转换为JPEG文件。 This sample does not handle non-JPEG images . 此示例不处理非JPEG图像 It does not (yet) handle JPEG images that have been flate-encoded. 它(尚未)处理已经过flate编码的JPEG图像。

There are several different formats for non-JPEG images in PDF. PDF中的非JPEG图像几种不同的格式 Those are not supported by this simple sample and require several hours of coding, but this is left as an exercise to the reader. 这些简单的样本不支持这些,并且需要几个小时的编码,但这仍然是读者的练习。

Thus, the images you want to extract from your PDF most likely simply are not embedded as JPEG images (at least not without further filtering like deflation). 因此,您想要从PDF中提取的图像很可能不是作为JPEG图像嵌入的(至少没有像通货紧缩那样进一步过滤)。 To transform that data into something a normal image viewer can handle, you most likely have to invest several hours of coding . 要将数据转换为普通图像查看器可以处理的内容,您很可能需要花费数小时的编码时间

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM