
Using regex to search in a Word file from an Excel VBA macro

I have a large number of txt files that I want to search for specific words. My approach is to use an Excel macro that opens each txt file in Word and then counts every occurrence of each word in a list, which I provide in the Excel file. The result is a table showing how often each word occurs in each document. I have managed to do this with the following code:

Sub CounterofWords()

    Application.ScreenUpdating = False

    Dim wdApp As Word.Application
    Set wdApp = CreateObject("Word.application")
    wdApp.Visible = False

    For d = 1 To 23

        Dim wdDoc As Word.Document

        FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_htm.txt"
        On Error GoTo txtdesign
        Set wdDoc = wdApp.Documents.Open(filename:=FName)

        i = 15

        Do While Cells(1, i) <> ""

            iCount = 0
            Application.ScreenUpdating = False

            With wdApp.Selection.Find
                .ClearFormatting
                .Text = Cells(1, i).Value
                Do While .Execute
                    iCount = iCount + 1
                    wdApp.Selection.MoveRight
                Loop
            End With
            Cells(d + 1, i).Value = iCount

            i = i + 1
        Loop

        wdDoc.Close savechanges:=False
        Set wdDoc = Nothing

    Next d

    wdApp.Quit
    Set wdApp = Nothing

    Application.ScreenUpdating = True

    Exit Sub

txtdesign:
    FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_txt.txt"
    Resume

End Sub

Here you can see the relevant part of my spreadsheet, where I ran the macro for the first 23 documents.

Everything works fine so far. Now I want to be able to search using regular expressions. I need this, for example, to exclude certain combinations of words from my search.

The problem seems to be that I cannot write something like

With wdApp.Selection.regex

Anyway, I don't know how to make regex work in a situation like this, and I would appreciate your help!

Word's Find object supports a limited form of pattern matching (wildcards), which is enabled with this flag:

wdApp.Selection.Find.MatchWildcards = True

Note: as written, your code will not give correct counts, because the search for each word starts where the previous search ended rather than at the start of the document. You need to move the selection back to the top of the document before searching for each word:

wdApp.Selection.HomeKey Unit:=wdStory
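
As a rough sketch of both points together, the counting could be done on a Range that covers the whole document, so that every search starts at the top without moving the selection. The function name CountWildcardMatches is hypothetical, and the sketch assumes the Word object library is referenced, as it already is in the question's code:

```vba
' Hypothetical helper: counts matches of a Word wildcard pattern in a document.
' Assumes a reference to the Microsoft Word object library (early binding).
Function CountWildcardMatches(ByVal doc As Word.Document, ByVal pattern As String) As Long
    Dim rng As Word.Range
    Dim n As Long

    ' doc.Content spans the whole document, so each call starts at the top
    Set rng = doc.Content

    With rng.Find
        .ClearFormatting
        .Text = pattern
        .MatchWildcards = True   ' Word wildcards, not full regular expressions
        .Forward = True
        .Wrap = wdFindStop       ' stop at the end instead of wrapping around
        Do While .Execute
            n = n + 1
            ' continue searching after the match that was just found
            rng.Collapse wdCollapseEnd
        Loop
    End With

    CountWildcardMatches = n
End Function
```

Inside the question's inner loop this could be used as Cells(d + 1, i).Value = CountWildcardMatches(wdDoc, Cells(1, i).Value). With wildcards enabled, a pattern such as <[Cc]at> matches the whole word "cat" or "Cat".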

If you need more complex pattern matching with regular expressions, you'll need a different approach: the RegExp class, which becomes available after you add a reference to "Microsoft VBScript Regular Expressions 5.5". There is a great explanation in the accepted answer to this Stack Overflow question: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops.

Here's an example using regex:

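' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim regEx As New RegExp
Dim Matches As MatchCollection
Dim docText As String

' Read the document text once instead of once per pattern
docText = wdDoc.Content.Text

With regEx
    .Global = True       ' find every match, not only the first one
    .IgnoreCase = True
End With

Do While Cells(1, i) <> ""
    ' Each header cell in row 1 holds one regular expression
    regEx.Pattern = Cells(1, i).Value
    Set Matches = regEx.Execute(docText)
    Cells(d + 1, i).Value = Matches.Count
    i = i + 1
Loop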
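
To illustrate how a regex can exclude certain word combinations, here is a minimal sketch. The function name CountRegexMatches, the sample text, and the sample pattern are all made up for illustration. The sketch uses late binding (CreateObject), so it does not need the library reference, and it relies on the negative lookahead (?!...) that the VBScript regex engine supports (lookbehind is not supported):

```vba
' Hypothetical helper: counts regex matches in a string.
' Late binding via CreateObject, so no library reference is required.
Function CountRegexMatches(ByVal sourceText As String, ByVal pattern As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Global = True        ' count all matches, not only the first
    re.IgnoreCase = True
    re.Pattern = pattern

    CountRegexMatches = re.Execute(sourceText).Count
End Function

Sub DemoCountRegexMatches()
    ' Counts the word "bank" unless it is directly followed by "robbery"
    Debug.Print CountRegexMatches("bank account, bank robbery, Bank", _
                                  "\bbank\b(?!\s+robbery)")   ' prints 2
End Sub
```

In the macro, the same helper could be called as CountRegexMatches(wdDoc.Content.Text, Cells(1, i).Value) for each pattern in row 1.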
