How to grab PDF image elements using C#

Question

I want to know how to grab an image element in a PDF file using C#. I know how to add elements to a PDF file; I just need to know how to access the image elements.

I'm also using iTextSharp.

Answer 1

I believe you can do this with iTextSharp.

I had this code stored on my machine but never used it. I got it off a forum and it is untested, but I'm sure you could make it work.

using System;
using System.IO;
using System.Drawing.Imaging;
using iTextSharp.text;
using iTextSharp.text.pdf;

#region ExtractImagesFromPDF
        public static void ExtractImagesFromPDF(string sourcePdf, string outputPath)
        {
            // NOTE:  This will only get the first image it finds per page.
            PdfReader pdf = new PdfReader(sourcePdf);

            try
            {
                for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
                {
                    PdfDictionary pg = pdf.GetPageN(pageNumber);
                    PdfDictionary res =
                      (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
                    PdfDictionary xobj =
                      (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
                    if (xobj != null)
                    {
                        foreach (PdfName name in xobj.Keys)
                        {
                            PdfObject obj = xobj.Get(name);
                            if (obj.IsIndirect())
                            {
                                PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                                PdfName type =
                                  (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                                if (PdfName.IMAGE.Equals(type))
                                {

                                    // The reference's object number is already an int; no string round-trip is needed.
                                    int xrefIndex = ((PRIndirectReference)obj).Number;
                                    PdfObject pdfObj = pdf.GetPdfObject(xrefIndex);
                                    PdfStream pdfStream = (PdfStream)pdfObj;
                                    // The raw bytes are still filter-encoded, so Image.FromStream below
                                    // only succeeds for DCTDecode (JPEG) images.
                                    byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStream);
                                    if ((bytes != null))
                                    {
                                        using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
                                        {
                                            memStream.Position = 0;
                                            System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
                                            // must save the file while stream is open.
                                            if (!Directory.Exists(outputPath))
                                                Directory.CreateDirectory(outputPath);

                                            string path = Path.Combine(outputPath, String.Format(@"{0}.jpg", pageNumber));
                                            System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
                                            parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
                                            // GetImageEncoder is found below this method
                                            System.Drawing.Imaging.ImageCodecInfo jpegEncoder = GetImageEncoder("JPEG");
                                            img.Save(path, jpegEncoder, parms);
                                            break;

                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            finally
            {
                pdf.Close();
            }


        }
        #endregion

       #region GetImageEncoder
        public static System.Drawing.Imaging.ImageCodecInfo GetImageEncoder(string imageType)
        {
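            imageType = imageType.ToUpperInvariant();

            // FormatDescription is the codec's short name, e.g. "JPEG", "PNG", "BMP".
            foreach (System.Drawing.Imaging.ImageCodecInfo info in System.Drawing.Imaging.ImageCodecInfo.GetImageEncoders())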
            {
                if (info.FormatDescription == imageType)
                {
                    return info;
                }
            }

            return null;
        }
        #endregion
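
A note on the code above: GetStreamBytesRaw returns the stream exactly as it is stored in the file, so Image.FromStream can only be expected to work when the image's /Filter entry is DCTDecode (JPEG). As a minimal sketch that does not depend on iTextSharp, the helper below maps the filter names of an image dictionary to a hint on how to handle the raw bytes. The names PdfImageFilterGuide and Describe are hypothetical, and the caller is assumed to have already read the /Filter entry into a list of strings without the leading slash.

```csharp
using System;
using System.Collections.Generic;

public static class PdfImageFilterGuide
{
    // Returns a short hint on how to treat the raw (still encoded) stream bytes
    // of an image XObject, given its /Filter names in the order they are listed.
    public static string Describe(IList<string> filters)
    {
        // No filter at all: the stream holds uncompressed pixel samples.
        if (filters == null || filters.Count == 0)
            return "raw samples: rebuild from Width, Height, BitsPerComponent, ColorSpace";

        // Several filters: they are listed in decoding order.
        if (filters.Count > 1)
            return "filter chain: undo the earlier filters, then treat the result according to the last one";

        switch (filters[0])
        {
            case "DCTDecode":
                return "JPEG: raw bytes can be written to a .jpg file";
            case "JPXDecode":
                return "JPEG 2000: raw bytes are already a JPEG 2000 file";
            case "CCITTFaxDecode":
                return "fax-encoded: needs a TIFF header built from DecodeParms";
            case "JBIG2Decode":
                return "JBIG2: embedded stream, needs a JBIG2 decoder";
            case "FlateDecode":
            case "LZWDecode":
            case "RunLengthDecode":
                return "compressed samples: decompress, then rebuild from Width, Height, BitsPerComponent, ColorSpace";
            default:
                return "unknown filter: inspect manually";
        }
    }
}
```

For example, PdfImageFilterGuide.Describe(new[] { "DCTDecode" }) returns the JPEG hint, while an image stored with FlateDecode is reported as compressed samples that cannot be handed to Image.FromStream directly.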

Answer 2

Bear in mind that the image may not be stored as a single image but as a set of blobs (i.e. raw data, color space data, ICC profile) which you will need to put together. The raw image may also be manipulated when it is displayed (i.e. scaled, rotated, inverted, masked, clipped).
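
To illustrate the point about display manipulation: a PDF content stream paints every image into a 1 x 1 unit square and places it with the current transformation matrix [a b c d e f]. A minimal sketch, assuming the six matrix values have already been read from the content stream by whatever parser you use, recovers the displayed size and rotation. The ImagePlacement type and its FromMatrix method are hypothetical names introduced only for this illustration.

```csharp
using System;

public sealed class ImagePlacement
{
    public double X;                // lower-left corner, user-space units
    public double Y;
    public double Width;            // displayed width, user-space units
    public double Height;           // displayed height, user-space units
    public double RotationDegrees;  // counter-clockwise rotation of the image's x axis
    public bool Mirrored;           // true when the matrix flips the image

    // PDF maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f).
    // The unit square's x edge therefore becomes (a, b) and its y edge (c, d).
    public static ImagePlacement FromMatrix(double a, double b, double c,
                                            double d, double e, double f)
    {
        return new ImagePlacement
        {
            X = e,
            Y = f,
            Width = Math.Sqrt(a * a + b * b),
            Height = Math.Sqrt(c * c + d * d),
            RotationDegrees = Math.Atan2(b, a) * 180.0 / Math.PI,
            Mirrored = (a * d - b * c) < 0
        };
    }
}
```

With the matrix 200 0 0 100 50 700, the image is shown 200 units wide and 100 units high at (50, 700) with no rotation, regardless of how many pixels the stored image has. The sketch ignores skew, so it is only meaningful for matrices made of scaling, rotation, mirroring and translation.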

2 answers

Answer 1
2 Accepted 2009-11-25 19:57:04

Answer 2
0 2009-11-26 08:37:32
