
Using regex to search in a Word file from an Excel VBA macro

I have a large number of txt files that I want to search for specific words. My approach is to use an Excel macro that opens each txt file in Word and then counts every occurrence of each word in a list that I provide in the Excel file. The result is a table showing how often each word occurs in each document. I have managed to do this with the following code:

Sub CounterofWords()

    Application.ScreenUpdating = False

    Dim wdApp As Word.Application
    Set wdApp = CreateObject("Word.application")
    wdApp.Visible = False

    ' One document per row; the file names come from column K (11)
    For d = 1 To 23

        Dim wdDoc As Word.Document

        FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_htm.txt"
        On Error GoTo txtdesign
        Set wdDoc = wdApp.Documents.Open(Filename:=FName)

        ' The search terms are in row 1, starting at column O (15)
        i = 15

        Do While Cells(1, i) <> ""

            iCount = 0
            Application.ScreenUpdating = False

            With wdApp.Selection.Find
                .ClearFormatting
                .Text = Cells(1, i).Value
                Do While .Execute
                    iCount = iCount + 1
                    wdApp.Selection.MoveRight
                Loop
            End With
            Cells(d + 1, i).Value = iCount

            i = i + 1
        Loop

        wdDoc.Close SaveChanges:=False
        Set wdDoc = Nothing

    Next d

    wdApp.Quit
    Set wdApp = Nothing

    Application.ScreenUpdating = True

    Exit Sub

txtdesign:
    ' Fall back to the "_txt.txt" file name and retry the Open call
    FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_txt.txt"
    Resume

End Sub

Here you can see the relevant part of my spreadsheet, where I ran the macro for the first 23 documents.

Everything works fine so far. Now I want to be able to search with regular expressions. I need this, for example, to exclude certain combinations of words from my search.

The problem seems to be that I cannot write something like:

With wdApp.Selection.regex

Anyway, I don't know how to make regex work in a situation like this, and I would appreciate your help!

Word's Find object supports a limited form of pattern matching (Word wildcards, not true regular expressions), which is enabled with this flag:

wdApp.Selection.Find.MatchWildcards = True

Note: your code will not give correct results as it stands, because the search for each word starts where the previous search left off in the document. You need to move back to the top of the document before each search:

wdApp.Selection.HomeKey Unit:=wdStory
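As an alternative to resetting the selection, a fresh Range over the whole document can be searched for each word, so every count starts from the top. The following is only a sketch under assumptions: CountMatches is a hypothetical helper name (not part of the original macro), the document is passed late-bound as Object, and the numeric constants stand in for wdFindStop and wdCollapseEnd. Wildcard patterns that can match the final paragraph mark may need extra care.

```vba
' Hypothetical helper: counts how often "term" occurs in a Word document.
' Each call starts from a new range covering the whole document.
Function CountMatches(ByVal doc As Object, ByVal term As String, _
                      Optional ByVal useWildcards As Boolean = False) As Long
    Dim rng As Object
    Dim n As Long

    Set rng = doc.Content              ' whole document, starting at the top
    With rng.Find
        .ClearFormatting
        .Text = term
        .MatchWildcards = useWildcards ' True enables Word wildcards
        .Forward = True
        .Wrap = 0                      ' 0 = wdFindStop: stop at the end, no wrap-around
        Do While .Execute
            n = n + 1
            rng.Collapse 0             ' 0 = wdCollapseEnd: continue after this match
        Loop
    End With

    CountMatches = n
End Function
```

Inside the inner loop of the macro it could then be called as Cells(d + 1, i).Value = CountMatches(wdDoc, Cells(1, i).Value).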

But if you need more complex pattern matching with regular expressions, you'll need a different approach: the RegExp class, which becomes available after adding a reference to "Microsoft VBScript Regular Expressions 5.5". There is a great explanation in the accepted answer to this SO question: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops.

Here's an example using regex:

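' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim regEx As New RegExp
Dim Matches As MatchCollection
Dim docText As String

docText = wdDoc.Content.Text            ' read the document text once per file

Do While Cells(1, i) <> ""
    With regEx
        .Global = True                  ' find all matches, not just the first
        .IgnoreCase = True
        .Pattern = Cells(1, i).Value    ' the header cell holds the pattern
    End With

    Set Matches = regEx.Execute(docText)
    Cells(d + 1, i).Value = Matches.Count
    i = i + 1
Loop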
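If you would rather not add the reference, the same object can be created with late binding. The sketch below wraps the counting in a small function and shows one way to exclude a word combination with a negative lookahead, which the VBScript regex engine supports (lookbehind is not supported). CountRegexMatches, the sample sentence and the pattern are made up for illustration.

```vba
' Hypothetical helper: counts regex matches in a string (late binding,
' so no reference to the VBScript Regular Expressions library is needed).
Function CountRegexMatches(ByVal sourceText As String, _
                           ByVal patternText As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Global = True        ' count every match, not just the first
    re.IgnoreCase = True
    re.Pattern = patternText

    CountRegexMatches = re.Execute(sourceText).Count
End Function

Sub DemoCountRegexMatches()
    ' Count "credit" only when it is NOT followed by "card".
    ' Matches "credit risk" and "Credit line", skips "credit card".
    Debug.Print CountRegexMatches("credit card, credit risk, Credit line", _
                                  "\bcredit\b(?!\s+card)")   ' prints 2
End Sub
```

In the macro, the pattern would come from the header cell and the text from wdDoc.Content.Text, as in the loop above.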
