Find a word in a PDF using PDFSharp

Question

I am using PDFSharp. I need help. I need to check whether the document contains the word "abc". Example:

11abcee = true
444abcggw = true
778ab = false

I wrote this code, but it does not work as expected:

    PdfDocument document = PdfReader.Open("c:\\abc.pdf");
    PdfDictionary dictionary = new PdfDictionary(document);

    string a = dictionary.Elements.GetString("MTZ");

    if (a.Equals("MTZ"))
    {
        MessageBox.Show("OK", "");
    }
    else 
    {
        MessageBox.Show("NO", "");
    }

Am I missing something?

Answer 1

Maybe this SO entry will help you: PDFSharp alter Text repositioning. It links to a text extraction example with PDFSharp.

Answer 2

Old question, but here is an example.

Note: C# 7.0 or later is required for the "is" pattern that declares a new local variable.

Note: This example uses PDFSharp installed from the Package Manager: "Install-Package PdfSharp -Version 1.50.5147"

Note: For my requirements, I only needed to search the first page of my PDFs; update if needed.

    using (PdfDocument inputDocument = PdfReader.Open(filePath, PdfDocumentOpenMode.Import))
    {
        if (searchPDFPage(ContentReader.ReadContent(inputDocument.Pages[0]), searchText))
        {
            // match found.
        }
    }

This code looks for a CString that starts with a pound sign (#) and compares the rest for equality; the OP would need to use a string Contains function instead.

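    // Recursively walks the parsed page content and returns true on the first match.
    private bool searchPDFPage(CObject cObject, string searchText)
    {
        if (cObject is COperator cOperator)
        {
            // Only the text-showing operators Tj and TJ carry string operands of interest.
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    if (searchPDFPage(cOperand, searchText))
                    {
                        return true;
                    }
                }
            }
        }
        else if (cObject is CSequence cSequence)
        {
            // A sequence (including a TJ array) is searched element by element.
            foreach (var element in cSequence)
            {
                if (searchPDFPage(element, searchText))
                {
                    return true;
                }
            }
        }
        else if (cObject is CString cString)
        {
            // Specific to my PDFs: the string starts with "#", and the text
            // after the first two characters must equal searchText exactly.
            if (cString.Value.StartsWith("#"))
            {
                if (cString.Value.Substring(2) == searchText)
                {
                    return true;
                }
            }
        }
        return false;
    }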

Credit: This example was modified based on this answer: C# Extract text from PDF using PdfSharp
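
For the substring check the question asks for ("11abcee" should match "abc"), the Contains variation could be sketched roughly as follows. This is only a sketch under the same assumptions as the example above (PdfSharp 1.50 from NuGet, C# 7.0+). The class and method names (PdfWordSearch, DocumentContains, ContainsAcrossFragments, CollectText) are made up for illustration and are not part of PDFSharp. It gathers the string operands of Tj and TJ per page and joins them before searching, because a TJ array can split one word into several strings. It will likely miss text drawn with fonts that use a custom or multi-byte encoding, since the raw string values then do not correspond to readable characters.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

// Hypothetical helper class; names are illustrative, not PDFSharp API.
public static class PdfWordSearch
{
    // True if the fragments, joined in order, contain searchText (case-sensitive).
    public static bool ContainsAcrossFragments(IEnumerable<string> fragments, string searchText)
    {
        string joined = string.Concat(fragments);
        return joined.IndexOf(searchText, StringComparison.Ordinal) >= 0;
    }

    // Checks every page; returns true as soon as one page contains searchText.
    public static bool DocumentContains(string filePath, string searchText)
    {
        using (PdfDocument document = PdfReader.Open(filePath, PdfDocumentOpenMode.Import))
        {
            for (int i = 0; i < document.Pages.Count; i++)
            {
                var fragments = new List<string>();
                CollectText(ContentReader.ReadContent(document.Pages[i]), fragments);
                if (ContainsAcrossFragments(fragments, searchText))
                {
                    return true;
                }
            }
        }
        return false;
    }

    // Walks the content tree and collects the string operands of Tj and TJ.
    private static void CollectText(CObject cObject, List<string> fragments)
    {
        if (cObject is COperator cOperator)
        {
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var operand in cOperator.Operands)
                {
                    CollectText(operand, fragments);
                }
            }
        }
        else if (cObject is CSequence cSequence)
        {
            foreach (var element in cSequence)
            {
                CollectText(element, fragments);
            }
        }
        else if (cObject is CString cString)
        {
            fragments.Add(cString.Value);
        }
    }
}
```

Usage would be something like PdfWordSearch.DocumentContains("c:\\abc.pdf", "abc"). Joining fragments without separators means a match can also span two adjacent text runs, which may or may not be what you want.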

2 answers

Answer 1
1 2013-05-03 14:58:14

Answer 2
0 2021-01-06 20:06:38