
How do I find out which images are where in a PDF?

I am using iTextSharp to parse a PDF I have no control over. It doesn't use AcroForm, but it does use images of checkboxes. I can find and extract the two checkbox images themselves (checked and unchecked), but I cannot figure out how to get the references to those images.

How do I find out how many times each image is referenced, and where each reference is placed?

I found a way to get what I needed via IRenderListener, starting with this comment: http://3d.5341.com/list/27/1036159.html

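// LINQPad-style snippet for iTextSharp 5.x. Namespaces used:
//   System.Collections.Generic, System.Text.RegularExpressions,
//   iTextSharp.text.pdf, iTextSharp.text.pdf.parser
public static void Main()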
{
    string inputStream = "whatever.pdf";

    PdfReader reader = new PdfReader(inputStream);

    var parser = new PdfReaderContentParser(reader);
    var imgScan = new FormListener();

    // PDF page numbers are 1-based; process every page, not just the first.
    for (int i = 1; i <= reader.NumberOfPages; ++i)
        parser.ProcessContent(i, imgScan);

    reader.Close();

    foreach (var r in imgScan.CheckBoxBoxes)
    {
        r.Dump(); // Dump() is LINQPad's output extension method
    }
}

public class CheckboxBox
{
    public CheckboxBox()
    {
        this.CheckboxStates = new List<bool>();
    }

    public string Name { get;set; }
    public List<bool> CheckboxStates {get;set;}
}

public class FormListener : IRenderListener
{
    public List<CheckboxBox> CheckBoxBoxes { get;set;}
    private CheckboxBox m_CurrentCheckboxBox { get;set;}
    public FormListener()
    {
        this.CheckBoxBoxes = new List<CheckboxBox>();
        this.BeginNewBox();
    }

    public void BeginTextBlock()  {}
    public void EndTextBlock() {}

    private void BeginNewBox()
    {
        this.m_CurrentCheckboxBox = new CheckboxBox();
        this.CheckBoxBoxes.Add(this.m_CurrentCheckboxBox);
    }

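    // A text chunk containing digits followed by a dot (e.g. "12.") is
    // treated as the start of a new numbered item in this particular PDF.
    private bool IsNewBoxStarting(TextRenderInfo renderInfo)
    {
        return Regex.IsMatch(renderInfo.GetText(), @"\d+\.");
    }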

    public void RenderText(TextRenderInfo renderInfo) { 
        if (this.IsNewBoxStarting(renderInfo))
            BeginNewBox();

        this.m_CurrentCheckboxBox.Name += renderInfo.GetText() + " ";
    }

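    private bool GetCheckboxState(ImageRenderInfo renderInfo)
    {
        // GetRef() is the indirect reference to the image XObject; it is
        // null for inline images, which cannot be the shared checkbox image.
        var imgRef = renderInfo.GetRef();

        // 21 is the object number of the "checked" image in this specific
        // PDF; find the value for your own file by inspecting its images.
        return imgRef != null && imgRef.Number == 21;
    }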

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        this.m_CurrentCheckboxBox.CheckboxStates.Add(this.GetCheckboxState(renderInfo));
    }
}
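
The listener above only records whether each image is the checked one. To answer the original question more directly (how many times each image is referenced, and where), the placements could be collected and tallied. Below is a minimal sketch in plain C#; ImagePlacement and PlacementLog are hypothetical helper types, not part of iTextSharp, and the commented hook assumes iTextSharp 5.x plus a CurrentPage field that the page loop would have to set before each ProcessContent call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one image placement; not an iTextSharp type.
public class ImagePlacement
{
    public int RefNumber { get; set; } // object number of the image XObject
    public int Page { get; set; }      // 1-based page number
    public float X { get; set; }       // translation part of the image matrix
    public float Y { get; set; }
}

// Hypothetical collector that a render listener could feed.
public class PlacementLog
{
    private readonly List<ImagePlacement> m_Placements = new List<ImagePlacement>();

    public void Add(int refNumber, int page, float x, float y)
    {
        m_Placements.Add(new ImagePlacement
        {
            RefNumber = refNumber, Page = page, X = x, Y = y
        });
    }

    // How many times each image object is drawn, keyed by object number.
    public Dictionary<int, int> CountByRef()
    {
        return m_Placements
            .GroupBy(p => p.RefNumber)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Every placement of one image, in the order it was recorded.
    public List<ImagePlacement> PlacementsOf(int refNumber)
    {
        return m_Placements.Where(p => p.RefNumber == refNumber).ToList();
    }
}

// Assumed hook inside FormListener.RenderImage (iTextSharp 5.x):
//
//     var imgRef = renderInfo.GetRef();      // null for inline images
//     var ctm = renderInfo.GetImageCTM();    // image-to-page matrix
//     if (imgRef != null)
//         log.Add(imgRef.Number, CurrentPage, ctm[Matrix.I31], ctm[Matrix.I32]);
```

The translation components of the image matrix give the image's origin corner in page units, which is the lower-left corner when the image is not rotated or flipped. Keying by object number works here because both checkbox states are shared image objects that the page content refers to repeatedly.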
