
C# [itext7] GetTextFromPage appends each page

I am not sure what I am doing wrong here.

While looping through the pages of a PDF, I get the content of each page. For example:

Page 1 = 1

Page 2 = 2

Page 3 = 3

The code:

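using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

PdfReader pdfReader = new PdfReader(filename);
PdfDocument pdfDoc = new PdfDocument(pdfReader);
// A single strategy instance is created here and reused for every page.
var strategy = new SimpleTextExtractionStrategy();
for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
{
    try
    {
        string pageContent = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy);
        // do stuff with pageContent
    }
    catch (Exception ex)
    {
        // handle or log the failure for this page
    }
}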

The output:

First loop = Page 1 = 1

Second loop = Page 1 = 1, Page 2 = 2

Third loop = Page 1 = 1, Page 2 = 2, Page 3 = 3

I moved the declaration of pageContent out of the loop and added this line just before the try statement:

pageContent = "";

I stepped through in the debugger, and pageContent is "" at the start of the second loop. Yet after GetTextFromPage it contains the text of both the first and the second page (on the second loop).

This has occurred on a variety of PDFs, so I figure it is my code and not the PDF in question.

I spotted the issue - though I don't think this should be an issue...

PdfReader pdfReader = new PdfReader(filename);
PdfDocument pdfDoc = new PdfDocument(pdfReader);
for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
{
    try
    {
        // A new strategy instance is created for each page.
        var strategy = new SimpleTextExtractionStrategy();
        string pageContent = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy);
        // do stuff with pageContent
    }
    catch (Exception ex)
    {
        // handle or log the failure for this page
    }
}

The strategy has to be created inside the loop (here, inside the try block). Once it is placed there, GetTextFromPage returns just the requested page and no longer appends the earlier ones.
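The behaviour above is what you would expect if the strategy object buffers every piece of text it is handed and returns the whole buffer when asked for its result. As far as I can tell, SimpleTextExtractionStrategy works that way. The following is only a sketch of that idea without iText: AccumulatingStrategy and Demo.GetText are hypothetical stand-ins made up for illustration, not library types.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for a text extraction strategy that buffers what it sees.
// This is not iText code; it only models the accumulate-and-return behaviour.
class AccumulatingStrategy
{
    private readonly StringBuilder buffer = new StringBuilder();

    // Called once per page in this model; appends to the internal buffer.
    public void Process(string pageText) => buffer.Append(pageText);

    // Returns everything collected so far, not just the last page.
    public string GetResultantText() => buffer.ToString();
}

static class Demo
{
    // Models the extractor call: feed the page to the strategy, return its result.
    public static string GetText(string pageText, AccumulatingStrategy strategy)
    {
        strategy.Process(pageText);
        return strategy.GetResultantText();
    }

    static void Main()
    {
        string[] pages = { "1", "2", "3" };

        // Shared instance: the returned text grows with every page.
        var shared = new AccumulatingStrategy();
        foreach (var p in pages)
            Console.WriteLine("shared: " + GetText(p, shared));

        // Fresh instance per page: each call returns only that page.
        foreach (var p in pages)
            Console.WriteLine("fresh:  " + GetText(p, new AccumulatingStrategy()));
    }
}
```

With the shared instance, the second call returns the first and second page together, which matches the output in the question. If you do not need a particular strategy, I believe the PdfTextExtractor.GetTextFromPage(page) overload without a strategy argument creates its own instance on each call, which avoids the problem as well.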
