
Highlighting text (colors) in an existing PDF using iTextSharp and C#

I would like to know whether we can highlight text (apply colors) in an already-created PDF using iTextSharp.

I have seen examples that create a new PDF and apply colors while doing so. I am looking for a way to get chunks of text from an existing PDF, apply colors to them, and save the result.

Here is what I am trying to accomplish: read a PDF file, parse its text, and highlight text based on business rules.

Any third-party DLL suggestion also works; as a first step I am looking into the open-source iTextSharp library.

Yes, you can highlight text, but unfortunately you will have to work for it. What looks like a highlight is a PDF Text Markup Annotation as far as the spec is concerned. That part is pretty easy. The hard part is figuring out the coordinates to apply the annotation to.

Here's the simple code for creating a highlight using an existing PdfStamper called stamper:

PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);

Once you have the highlight you can set the color using:

highlight.Color = BaseColor.YELLOW;

And then add it to your stamper on page 1 using:

stamper.AddAnnotation(highlight,1);

Technically the rect parameter doesn't actually get used (as far as I can tell) and instead gets overridden by the quad parameter. The quad parameter is an array of x,y coordinates that essentially represent the corners of a rectangle (technically a quadrilateral). The spec says they start in the bottom left and go counter-clockwise, but in reality they appear to go bottom left, bottom right, top left, top right. Calculating the quad is a pain, so it's easier to create a rectangle and build the quad from it:

iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };
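To make the ordering explicit, here is a minimal sketch of the same calculation as a standalone helper that does not depend on iTextSharp. The class name QuadPoints and the method FromCorners are hypothetical names introduced for illustration. The parameters follow the order of the Rectangle constructor arguments used above, and the result follows the order used in this answer, not necessarily the order the spec describes.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: builds the eight-element
// quad array in the same order as the snippet above.
public static class QuadPoints
{
    // llx, lly, urx, ury are taken in the same order as the arguments of
    // the Rectangle constructor used in this answer. No normalization is
    // performed; the values are emitted exactly as given.
    public static float[] FromCorners(float llx, float lly, float urx, float ury)
    {
        return new[]
        {
            llx, lly,   // rect.Left,  rect.Bottom
            urx, lly,   // rect.Right, rect.Bottom
            llx, ury,   // rect.Left,  rect.Top
            urx, ury    // rect.Right, rect.Top
        };
    }
}

// Example call with the hard-coded values from this answer:
//   float[] quad = QuadPoints.FromCorners(60.6755f, 749.172f, 94.0195f, 735.3f);
```

Note that in the hard-coded values the second argument (749.172) is larger than the fourth (735.3), so the first two points of the resulting quad carry the larger y value.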

So how do you get the rectangle of existing text in the first place? For that you need to look at TextExtractionStrategy and PdfTextExtractor. There's a lot to go into, so I'm going to start by pointing you at this post, which has some further posts linked.
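As a rough starting point, the sketch below shows one way to collect a rectangle for each chunk of text on a page. It assumes an iTextSharp 5.x build recent enough to expose GetAscentLine and GetDescentLine on TextRenderInfo, and it assumes unrotated, horizontal text. The names TextChunkRect, RectCollectingStrategy and ChunkFinder are hypothetical and introduced only for illustration.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical container pairing a chunk of text with its bounding box.
public class TextChunkRect
{
    public iTextSharp.text.Rectangle Rect;
    public string Text;
}

// Hypothetical strategy: behaves like LocationTextExtractionStrategy but
// also records a rectangle for every chunk the parser reports.
public class RectCollectingStrategy : LocationTextExtractionStrategy
{
    public readonly List<TextChunkRect> Chunks = new List<TextChunkRect>();

    public override void RenderText(TextRenderInfo renderInfo)
    {
        base.RenderText(renderInfo);

        // Start of the descent line and end of the ascent line are used
        // here as the two opposite corners (assumes horizontal text).
        Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
        Vector topRight = renderInfo.GetAscentLine().GetEndPoint();

        var rect = new iTextSharp.text.Rectangle(
            bottomLeft[Vector.I1], bottomLeft[Vector.I2],
            topRight[Vector.I1], topRight[Vector.I2]);

        Chunks.Add(new TextChunkRect { Rect = rect, Text = renderInfo.GetText() });
    }
}

public static class ChunkFinder
{
    // Runs the strategy over one page and returns the collected chunks.
    public static List<TextChunkRect> ReadPage(PdfReader reader, int pageNumber)
    {
        var strategy = new RectCollectingStrategy();
        PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        return strategy.Chunks;
    }
}
```

A chunk is whatever run of text the PDF producer wrote in one operation, so it may be a fragment of a word or several words at once. Matching business rules against the text will usually require combining or splitting chunks first.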

Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.2 that shows off the creation of a simple PDF and the highlighting of part of the text using hard-coded coordinates. If you need help calculating these coordinates, start with the link above and then ask any questions!

using System;
using System.ComponentModel;
using System.Data;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Create a simple test file
            string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");

            using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                    {
                        doc.Open();
                        doc.Add(new Paragraph("This is a test"));
                        doc.Close();
                    }
                }
            }

            //Create a new file from our test file with highlighting
            string highLightFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Highlighted.pdf");

            //Bind a reader and stamper to our test PDF
            PdfReader reader = new PdfReader(outputFile);

            using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (PdfStamper stamper = new PdfStamper(reader, fs))
                {
                    //Create a rectangle for the highlight. NOTE: Technically this isn't used but it helps with the quadpoint calculation
                    iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
                    //Create an array of quad points based on that rectangle. NOTE: The order below doesn't appear to match the actual spec but is what Acrobat produces
                    float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };

                    //Create our highlight
                    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);

                    //Set the color
                    highlight.Color = BaseColor.YELLOW;

                    //Add the annotation
                    stamper.AddAnnotation(highlight,1);
                }
            }

            this.Close();
        }
    }
}
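
If the business rules produce more than one region to mark, the same three calls can be wrapped in a loop. The following is a minimal sketch that uses only the iTextSharp calls already shown in this answer; the class HighlightHelper and the method AddHighlights are hypothetical names, and the rectangles are assumed to have been calculated elsewhere.

```csharp
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper that applies the highlight steps above to
// several rectangles on one page.
public static class HighlightHelper
{
    // Adds one highlight annotation per rectangle and returns how many
    // annotations were added.
    public static int AddHighlights(
        PdfStamper stamper,
        int pageNumber,
        IEnumerable<iTextSharp.text.Rectangle> rects,
        BaseColor color)
    {
        int count = 0;
        foreach (iTextSharp.text.Rectangle rect in rects)
        {
            // Same quad order as in the sample above
            float[] quad =
            {
                rect.Left, rect.Bottom, rect.Right, rect.Bottom,
                rect.Left, rect.Top, rect.Right, rect.Top
            };

            PdfAnnotation highlight = PdfAnnotation.CreateMarkup(
                stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
            highlight.Color = color;
            stamper.AddAnnotation(highlight, pageNumber);
            count++;
        }
        return count;
    }
}
```

Inside the using block for the stamper, a call might look like HighlightHelper.AddHighlights(stamper, 1, rects, BaseColor.YELLOW), where rects is a list you have built yourself.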

