[英]How to Grab PDF image elements using c#
I wanted to know how to grab an image element in a PDF file using c#. 我想知道如何使用c#捕获PDF文件中的图像元素。 I know how to add the elements to a PDF file I just need to know how to access the image elements. 我知道如何将元素添加到PDF文件中,我只需要知道如何访问图像元素。
I'm also using iTextSharp. 我也在使用iTextSharp。
I believe you can do this with itextsharp. 我相信您可以使用itextsharp做到这一点。
I had this code stored on my machine, but never used it. 我将此代码存储在计算机上,但从未使用过。 I got it off a forum and it's not tested, but I'm sure you could make it work. 我从一个论坛上获得它,并且尚未经过测试,但是我敢肯定您可以使其正常运行。
using iTextSharp.text;
using iTextSharp.text.pdf;
#region ExtractImagesFromPDF
public static void ExtractImagesFromPDF(string sourcePdf, string outputPath)
{
// NOTE: This will only get the first image it finds per page.
PdfReader pdf = new PdfReader(sourcePdf);
RandomAccessFileOrArray raf = new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf);
try
{
for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdf.GetPageN(pageNumber);
PdfDictionary res =
(PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary xobj =
(PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName type =
(PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
if (PdfName.IMAGE.Equals(type))
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);
if ((bytes != null))
{
using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
{
memStream.Position = 0;
System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
// must save the file while stream is open.
if (!Directory.Exists(outputPath))
Directory.CreateDirectory(outputPath);
string path = Path.Combine(outputPath, String.Format(@"{0}.jpg", pageNumber));
System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
// GetImageEncoder is found below this method
System.Drawing.Imaging.ImageCodecInfo jpegEncoder = GetImageEncoder("JPEG");
img.Save(path, jpegEncoder, parms);
break;
}
}
}
}
}
}
}
}
catch
{
throw;
}
finally
{
pdf.Close();
}
}
#endregion
#region GetImageEncoder
public static System.Drawing.Imaging.ImageCodecInfo GetImageEncoder(string imageType)
{
imageType = imageType.ToUpperInvariant();
foreach (ImageCodecInfo info in ImageCodecInfo.GetImageEncoders())
{
if (info.FormatDescription == imageType)
{
return info;
}
}
return null;
}
#endregion
Bear in mind the image may not exist as an image but as a set of blobs (ie raw data, colorspace data, ICC profile or colorspace) which you will need to put together. 请记住,图像可能不以图像的形式存在,而是作为一组斑点(即原始数据,色彩空间数据,ICC配置文件或色彩空间)存在,您需要将它们放在一起。 The raw image may also be manipulated in the display (ie scaled, rotated, inverted, masked, clipped). 原始图像也可以在显示器中进行操作(即缩放,旋转,倒置,蒙版,剪切)。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.