
Searching for a keyword in a PDF using iTextSharp 7

I am trying to search for a keyword within a PDF file using C# and iTextSharp.

I came across this piece of code:

public List<int> ReadPdfFile(string fileName, string searchText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

            string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            if (currentPageText.Contains(searchText))
            {
                pages.Add(page);
            }
        }
        pdfReader.Close();
    }
    return pages;
}

But the compiler says that PdfReader does not contain a definition for NumberOfPages. Is there any other way I can get the number of pages in a PDF file?

You can replace this

pdfReader.NumberOfPages

with

getNumberOfPdfPages(fileName)

and add the following method (reference):

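using System.IO;
using System.Text.RegularExpressions;

// Heuristic: counts page dictionaries ("/Type /Page", but not the page-tree
// node "/Type /Pages") in the raw file contents. Page objects stored inside
// compressed object streams are not visible to this search.
public int getNumberOfPdfPages(string fileName)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());

        return matches.Count;
    }
}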
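To illustrate what the regular expression in getNumberOfPdfPages counts, here is a small self-contained sketch that applies the same pattern to a hand-written, uncompressed PDF fragment. The class name PageRegexDemo and the fragment are made up for illustration. Page dictionaries (/Type /Page, with or without a space) match, while the page-tree node (/Type /Pages) is rejected by the trailing [^s]. Because the method scans the raw file text, it cannot see page objects stored in compressed object streams, which PDF 1.5 and later files may use, so treat its result as a heuristic rather than an exact page count.

```csharp
using System;
using System.Text.RegularExpressions;

static class PageRegexDemo
{
    // Same pattern as getNumberOfPdfPages: "/Type", optional whitespace, "/Page",
    // then any character other than 's', so "/Type /Pages" (the page-tree node) is skipped.
    static readonly Regex PageObject = new Regex(@"/Type\s*/Page[^s]");

    public static int CountPageObjects(string pdfText)
    {
        return PageObject.Matches(pdfText).Count;
    }

    static void Main()
    {
        // Hand-written, uncompressed fragment for illustration only (not a complete PDF).
        string fragment =
            "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n" +
            "2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n" +
            "3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n" +
            "4 0 obj << /Type/Page /Parent 2 0 R >> endobj\n";

        // Only objects 3 and 4 are page dictionaries.
        Console.WriteLine(CountPageObjects(fragment)); // prints 2
    }
}
```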

But it seems weird that NumberOfPages is not recognized... Are you sure about your using directives?

The piece of code you found is for iText 5.5.x. iText 7 has a fundamentally different API, so your NumberOfPages problem is not the only one you'll have to deal with.

Nonetheless, to get the number of pages in iText 7 you now use the PdfDocument method GetNumberOfPages instead of the former PdfReader property NumberOfPages.

More generally, a port of your method to iText 7 might look like this:

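using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public List<int> ReadPdfFile(string fileName, string searchText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        // PdfReader only reads the file; pages and the page count are accessed through PdfDocument.
        using (PdfReader pdfReader = new PdfReader(fileName))
        using (PdfDocument pdfDocument = new PdfDocument(pdfReader))
        {
            // Page numbers are 1-based, as in iText 5.
            for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
            {
                // A new strategy per page, because the strategy accumulates the text it has seen.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

                // GetTextFromPage now takes a PdfPage instead of (reader, pageNumber).
                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(page), strategy);
                if (currentPageText.Contains(searchText))
                {
                    pages.Add(page);
                }
            }
        }
    }
    return pages;
}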
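The text search itself does not depend on iText. As a sketch, assuming the page texts have already been extracted (for example with PdfTextExtractor.GetTextFromPage as in the port above), the hypothetical helper PageSearch.FindPages isolates the matching step so it can be tested without a PDF file. It keeps the 1-based page numbering of the loop and adds an optional StringComparison, which the original Contains call does not offer; the default, Ordinal, behaves like Contains.

```csharp
using System;
using System.Collections.Generic;

static class PageSearch
{
    // Hypothetical helper: returns the 1-based numbers of the pages whose text contains searchText.
    // pageTexts[0] holds the text of page 1, mirroring the loop in ReadPdfFile.
    public static List<int> FindPages(IList<string> pageTexts, string searchText,
        StringComparison comparison = StringComparison.Ordinal)
    {
        var pages = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            // IndexOf with a StringComparison also works on .NET Framework,
            // where string.Contains has no comparison overload.
            if (pageTexts[i].IndexOf(searchText, comparison) >= 0)
            {
                pages.Add(i + 1);
            }
        }
        return pages;
    }

    static void Main()
    {
        var texts = new List<string> { "Introduction", "iText 7 API", "Summary of itext" };
        Console.WriteLine(string.Join(",", FindPages(texts, "iText"))); // prints 2
        Console.WriteLine(string.Join(",", FindPages(texts, "iText", StringComparison.OrdinalIgnoreCase))); // prints 2,3
    }
}
```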
