VB.Net: Searching Word Document By Line

I'm attempting to read through a Word document (800+ pages) line by line and, if a line contains certain text (in this case "Section"), simply print that line to the console.

Public Sub doIt()
    SearchFile("theFilePath", "Section")
    Console.WriteLine("SHit")
End Sub

Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim sr As StreamReader = New StreamReader(strFilePath)
    Dim strLine As String = String.Empty

    For Each line As String In sr.ReadLine
        If line.Contains(strSearchTerm) = True Then
            Console.WriteLine(line)
        End If
    Next

End Sub

It runs, but it doesn't print out anything. I know the word "Section" is in there multiple times as well.

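As already mentioned in the comments, you can't search a Word document the way you are currently doing it. A StreamReader reads the raw bytes of the file, and a Word file is not plain text (a .docx is a ZIP package of XML parts), so the text generally does not come back as readable lines. On top of that, For Each line As String In sr.ReadLine loops over the characters of a single line rather than over the lines of the file. You need to create a Word.Application object, as mentioned, and then load the document through it so you can search it.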
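
For comparison, here is a minimal sketch of the line-by-line loop the question was aiming for. It only works on plain-text files, not on .doc/.docx, and the module and function names (TextFileSearch, FindLinesInTextFile) are hypothetical, chosen just for illustration.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module TextFileSearch
    ' Hypothetical helper: returns every line of a plain-text file that
    ' contains strSearchTerm. Not suitable for .doc/.docx files.
    Public Function FindLinesInTextFile(ByVal strFilePath As String,
                                        ByVal strSearchTerm As String) As List(Of String)
        Dim matches As New List(Of String)

        ' File.ReadLines yields one line at a time. By contrast,
        ' "For Each x In sr.ReadLine" enumerates the characters of one line.
        For Each line As String In File.ReadLines(strFilePath)
            If line.Contains(strSearchTerm) Then
                matches.Add(line)
            End If
        Next

        Return matches
    End Function
End Module
```

Calling FindLinesInTextFile("notes.txt", "Section") on a text file would return the matching lines, which the caller can then print with Console.WriteLine.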

Here is a short example I wrote for you. Please note that you need to add a reference to Microsoft.Office.Interop.Word and then add the import statement to your class, for example Imports Microsoft.Office.Interop. The code grabs each paragraph and uses its range to look for the text you are searching for; if it is found, the paragraph is added to the list.

Note: Tried and tested - I had this in a button event, but put it wherever you need it.

Try
    Dim objWordApp As Word.Application = Nothing
    Dim objDoc As Word.Document = Nothing
    Dim TextToFind As String = "YOUR TEXT TO FIND"
    Dim TextRange As Word.Range = Nothing
    Dim StringLines As New List(Of String)

    'start Word in the background
    objWordApp = New Word.Application()

    If objWordApp IsNot Nothing Then
        objWordApp.Visible = False
        'FileName is the full path of the document to search
        objDoc = objWordApp.Documents.Open(FileName)
    End If

    If objDoc IsNot Nothing Then

        'loop through each paragraph in the document and get the range
        For Each p As Word.Paragraph In objDoc.Paragraphs
            TextRange = p.Range
            TextRange.Find.ClearFormatting()

            'Execute returns True when the text occurs inside this paragraph
            If TextRange.Find.Execute(TextToFind) Then
                StringLines.Add(p.Range.Text)
            End If
        Next

        If StringLines.Count > 0 Then
            MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
        End If

        objDoc.Close()
        objWordApp.Quit()

    End If

Catch ex As Exception
    'log or display ex here; an empty Catch hides failures
End Try
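
If an exception is thrown before objWordApp.Quit() is reached in the code above, the hidden Word process stays running. One way to guard against that is to move the cleanup into a Finally block. The following is only a sketch under a few assumptions: Word is installed, Microsoft.Office.Interop.Word is referenced, and the names WordSearch, FindParagraphs and docPath are hypothetical. It also uses a plain String.Contains check on the paragraph text instead of Find.Execute.

```vbnet
Imports System.Collections.Generic
Imports System.Runtime.InteropServices
Imports Microsoft.Office.Interop

Module WordSearch
    ' Hypothetical helper: returns the text of every paragraph that contains textToFind.
    Public Function FindParagraphs(ByVal docPath As String,
                                   ByVal textToFind As String) As List(Of String)
        Dim matches As New List(Of String)
        Dim objWordApp As Word.Application = Nothing
        Dim objDoc As Word.Document = Nothing

        Try
            objWordApp = New Word.Application()
            objWordApp.Visible = False
            ' Open read-only, since the document is only being searched.
            objDoc = objWordApp.Documents.Open(FileName:=CObj(docPath), ReadOnly:=True)

            For Each p As Word.Paragraph In objDoc.Paragraphs
                Dim paragraphText As String = p.Range.Text
                If paragraphText.Contains(textToFind) Then
                    matches.Add(paragraphText.Trim())
                End If
            Next
        Finally
            ' Runs whether or not an exception was thrown, so Word is not left open.
            If objDoc IsNot Nothing Then
                objDoc.Close(SaveChanges:=False)
                Marshal.ReleaseComObject(objDoc)
            End If
            If objWordApp IsNot Nothing Then
                objWordApp.Quit()
                Marshal.ReleaseComObject(objWordApp)
            End If
        End Try

        Return matches
    End Function
End Module
```

Any exception still propagates to the caller, which can decide whether to log it or show it.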

Update to use Sentences: this version goes through each paragraph, grabs each sentence, and checks whether the text exists in it. The benefit is that it's quicker, and only the matching sentence is added to the list rather than the whole paragraph. We get the paragraph first in order to get its sentences.

Try
    Dim objWordApp As Word.Application = Nothing
    Dim objDoc As Word.Document = Nothing
    Dim TextToFind As String = "YOUR TEXT TO FIND"
    Dim TextRange As Word.Range = Nothing
    Dim StringLines As New List(Of String)

    'start Word in the background
    objWordApp = New Word.Application()

    If objWordApp IsNot Nothing Then
        objWordApp.Visible = False
        'FileName is the full path of the document to search
        objDoc = objWordApp.Documents.Open(FileName)
    End If

    If objDoc IsNot Nothing Then

        For Each p As Word.Paragraph In objDoc.Paragraphs
            TextRange = p.Range

            'Sentences is 1-based; walk it forward so matches stay in document order
            For i As Integer = 1 To TextRange.Sentences.Count
                Dim sentence As String = TextRange.Sentences.Item(i).Text
                If sentence.Contains(TextToFind) Then
                    StringLines.Add(sentence.Trim())
                End If
            Next
        Next

        If StringLines.Count > 0 Then
            MessageBox.Show(String.Join(Environment.NewLine, StringLines.ToArray()))
        End If

        objDoc.Close()
        objWordApp.Quit()

    End If

Catch ex As Exception
    'log or display ex here; an empty Catch hides failures
End Try
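
Note that sentence.Contains(TextToFind) is case-sensitive, so searching for "section" would not match "Section". If that matters, a small helper along these lines could be used instead; the names SearchHelpers and ContainsIgnoreCase are hypothetical, and this is a sketch rather than part of the tested code above.

```vbnet
Imports System

Module SearchHelpers
    ' Hypothetical helper: True when text contains term, ignoring case.
    Public Function ContainsIgnoreCase(ByVal text As String, ByVal term As String) As Boolean
        If text Is Nothing OrElse term Is Nothing Then Return False
        ' IndexOf returns -1 when the term does not occur in the text.
        Return text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
    End Function
End Module
```

In the loop above, the check would then read If ContainsIgnoreCase(sentence, TextToFind) Then.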

Here's a sub that prints each line the search string is found on, rather than each paragraph. It mimics what the StreamReader in your example was meant to do: read and check the document one line at a time.

'Add a reference to Microsoft.Office.Interop.Word, then add:
'    Imports Microsoft.Office.Interop
'    Imports Microsoft.Office.Interop.Word
Public Sub SearchFile(ByVal strFilePath As String, ByVal strSearchTerm As String)
    Dim wordObject As Word.Application = New Word.Application
    wordObject.Visible = False
    Dim objWord As Word.Document = wordObject.Documents.Open(strFilePath)
    'start with the selection on the first character of the document
    objWord.Characters(1).Select()

    Dim bolEOF As Boolean = False
    Do Until bolEOF
        'extend the selection to the end of the current displayed line
        wordObject.Selection.MoveEnd(WdUnits.wdLine, 1)
        'case-insensitive comparison
        If wordObject.Selection.Text.ToUpper.Contains(strSearchTerm.ToUpper) Then
            'strip line-break characters before printing
            Console.WriteLine(wordObject.Selection.Text.Replace(vbCrLf, "").Replace(vbCr, "").Replace(vbLf, ""))
        End If
        'move to the start of the next line
        wordObject.Selection.Collapse(WdCollapseDirection.wdCollapseEnd)
        If wordObject.Selection.Bookmarks.Exists("\EndOfDoc") Then
            bolEOF = True
        End If
    Loop

    objWord.Close()
    wordObject.Quit()
    objWord = Nothing
    wordObject = Nothing
    'if this runs inside a form you want to dismiss afterwards, call Me.Close() here
End Sub

It is a slightly modified VB.Net implementation of nawfal's solution for parsing Word document lines.
