Searching a Word document in C#

I am building a search module (a Windows Forms application in C#). It works fine for .txt files, but I also need to search for the word in Word documents. I tried using Microsoft.Office.Interop.Word; the code was as follows:

Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document docOpen = app.Documents.Open(flname);
StreamReader srObj = new StreamReader(flname);
string read = srObj.ReadToEnd();
if (read.Contains(txtWordInput.Text)) // searching for the input word in the file
{
      count1++;
      lbSearchList.Visible = true;
      lbSearchList.Items.Add(flname);
}
srObj.Close();
app.Documents.Close();

At run time, however, it raised an error saying that the .doc file was already open and therefore not accessible, even though I did not have the document open.

Then I tried working with just a StreamReader. It did read the file, but the data it returned was random symbols rather than what was actually written in the document. As a result, the if (read.Contains(txtWordInput.Text)) check could not find the word.

Please help me with code that successfully searches for a word in a Word document.

With that code, the error looks correct: you tried to open the document twice. First with the app.Documents.Open(flname) line, and then again right after by creating a StreamReader on the same file name. Also, a Word document is not a text file. A .docx file is actually a zip archive with other files inside it (the older .doc format is a binary container). So if you use a StreamReader to read the file as text, you get exactly what you got: a bunch of symbols.
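Because a .docx file is a zip archive, its text can in principle be read without starting Word at all. The following is a minimal sketch of that idea, not part of the original answer. It assumes a .docx (not .doc) file, reads only the main body part word/document.xml, and ignores headers, footers, and footnotes. The names DocxText, ReadBodyText, and ContainsText are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML namespace used by the main document part.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text: the <w:t> runs of each <w:p> paragraph joined
    // together, with one line per paragraph.
    public static string ReadBodyText(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("No word/document.xml part; is this a .docx file?");

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part);
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }

    // Opens the file read-only and checks whether the body text contains the term.
    public static bool ContainsText(string path, string term, bool caseSensitive = false)
    {
        StringComparison comparison = caseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        using (FileStream fs = File.OpenRead(path))
        {
            return ReadBodyText(fs).IndexOf(term, comparison) >= 0;
        }
    }
}
```

This avoids the dependency on an installed copy of Word, but it only understands the .docx layout; for .doc files the Interop approach is still needed.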

Use the method below to read the text of a Word file and search it for a specific string. Also make sure you have the correct using directive.

using System;
using Word = Microsoft.Office.Interop.Word;

public static Boolean CheckWordDocumentForString(String documentLocation, String stringToSearchFor, Boolean caseSensitive = true)
{
    // Start a new Word application instance for this search
    Word.Application winword = new Word.Application();
    try
    {
        // Use the application object to open our word document in ReadOnly mode
        Word.Document wordDoc = winword.Documents.Open(documentLocation, ReadOnly: true);

        // Search for our string in the document text
        StringComparison comparison = caseSensitive
            ? StringComparison.CurrentCulture
            : StringComparison.CurrentCultureIgnoreCase;
        Boolean result = wordDoc.Content.Text.IndexOf(stringToSearchFor, comparison) >= 0;

        // Close the document without saving, since we only read from it
        wordDoc.Close(SaveChanges: false);
        return result;
    }
    finally
    {
        // Always quit Word, even if opening or searching throws,
        // so that no WINWORD.EXE process is left running
        winword.Quit();
    }
}

To use the method, call it like any other static method:

MyClass.CheckWordDocumentForString(@"C:\Users\CoolDude\Documents\MyWordDoc.docx", "memory", false);

With your code, it would look more like this:

if (MyClass.CheckWordDocumentForString(flname, txtWordInput.Text, false))
{
    // Do something if it is found
}
else
{
    // Do something if it is not found
}
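Since the search module already handles .txt files, one way to fit Word documents in is to choose how each file is read from its extension. The sketch below is only an illustration under that assumption: FileSearch and FindMatches are hypothetical names, and readWordText stands for whatever routine you use to get a Word document's text (for example, a wrapper around the Interop code above).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSearch
{
    // Returns the paths whose text contains the term (case-insensitive).
    // readWordText is a caller-supplied delegate that returns the text of a
    // Word document; it is a placeholder, not an existing library call.
    public static List<string> FindMatches(
        IEnumerable<string> paths, string term, Func<string, string> readWordText)
    {
        var matches = new List<string>();
        foreach (string path in paths)
        {
            string ext = Path.GetExtension(path).ToLowerInvariant();
            string text;
            if (ext == ".doc" || ext == ".docx")
                text = readWordText(path);      // Word formats need a real parser
            else if (ext == ".txt")
                text = File.ReadAllText(path);  // plain text can be read directly
            else
                continue;                       // skip formats we cannot read

            if (text.IndexOf(term, StringComparison.CurrentCultureIgnoreCase) >= 0)
                matches.Add(path);
        }
        return matches;
    }
}
```

Each returned path can then be added to lbSearchList, as the question's code does for a single file.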

My two cents: srObj is irrelevant in this context. You create the docOpen and app objects but then bypass them, so they never get used. I had a brief look at the API, and there are members for getting the characters and the collection of words in a document. I think what you need to do is take the words (or the text) from your docOpen object and sift through them.

You could use docOpen.Words to get the collection of words, or docOpen.Content.Text to get all the text as a string.

As an example:

Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document docOpen = app.Documents.Open(flname, ReadOnly: true);
string read = docOpen.Content.Text; // all of the document's text as one string
if (read.Contains(txtWordInput.Text))
{
     count1++;
     lbSearchList.Visible = true;
     lbSearchList.Items.Add(flname);
}
docOpen.Close(SaveChanges: false);
app.Quit(); // otherwise the Word process keeps running in the background
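A plain Contains check also matches the search term inside longer words. If only whole words should count, one option is to apply a word-boundary regular expression once the document text is in a string. This is a small sketch rather than part of the answer above; WordMatch and CountWholeWord are hypothetical names, and it assumes the term begins and ends with a letter or digit so that \b behaves as expected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordMatch
{
    // Counts occurrences of term that are delimited by word boundaries.
    public static int CountWholeWord(string text, string term, bool caseSensitive = false)
    {
        if (string.IsNullOrEmpty(term))
            return 0;

        RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;

        // Escape the term so characters such as '.' or '+' are matched literally.
        string pattern = @"\b" + Regex.Escape(term) + @"\b";
        return Regex.Matches(text, pattern, options).Count;
    }
}
```

For example, WordMatch.CountWholeWord(read, txtWordInput.Text) > 0 could replace the Contains check when whole-word matching is wanted.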

I hope this helps.

I think you can use the Find function of the Interop library instead of a stream. You can use the following function to check whether the desired text exists in a Word document.

    protected bool FindTextInWord(object text, string flname)
    {
        object matchCase = false;
        object matchWholeWord = true;
        object matchWildCards = false;
        object matchSoundsLike = false;
        object matchAllWordForms = false;
        object forward = true;
        object format = false;
        object wrap = 1; // WdFindWrap.wdFindContinue

        Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
        bool val = false;
        try
        {
            // Open the document read-only, since we only search it
            app.Documents.Open(flname, ReadOnly: true);

            // The replace-related and language-specific arguments are left as Type.Missing
            val = app.Selection.Find.Execute(ref text, ref matchCase, ref matchWholeWord,
            ref matchWildCards, ref matchSoundsLike, ref matchAllWordForms, ref forward, ref wrap,
            ref format, Type.Missing, Type.Missing,
            Type.Missing, Type.Missing, Type.Missing, Type.Missing);
        }
        finally
        {
            // Close without saving and quit Word so no background process is left behind
            app.Documents.Close(SaveChanges: false);
            app.Quit();
        }
        return val;
    }

You can check the details of each parameter at the following link: http://msdn.microsoft.com/en-us/library/office/ff193977(v=office.15).aspx

You can call the function as follows:

FindTextInWord((object)"Proposal","your file name here");
