Using regex to search in a Word file from an Excel VBA macro
I have a large number of txt files that I want to search for specific words. My approach is to use an Excel macro to open the txt files in Word and then count each occurrence of a list of words that I provide in the Excel file. It gives me a list of how often each word occurs in each document. I have managed to do so using the following code:
Sub CounterofWords()
    Application.ScreenUpdating = False
    Dim wdApp As Word.Application
    Set wdApp = CreateObject("Word.application")
    wdApp.Visible = False
    For d = 1 To 23
        Dim wdDoc As Word.Document
        FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_htm.txt"
        On Error GoTo txtdesign
        Set wdDoc = wdApp.Documents.Open(filename:=FName)
        i = 15
        Do While Cells(1, i) <> ""
            iCount = 0
            Application.ScreenUpdating = False
            With wdApp.Selection.Find
                .ClearFormatting
                .Text = Cells(1, i).Value
                Do While .Execute
                    iCount = iCount + 1
                    wdApp.Selection.MoveRight
                Loop
            End With
            Cells(d + 1, i).Value = iCount
            i = i + 1
        Loop
        wdDoc.Close savechanges:=False
        Set wdDoc = Nothing
    Next d
    wdApp.Quit
    Set wdApp = Nothing
    Application.ScreenUpdating = True
    Exit Sub
txtdesign:
    FName = "C:\Users\Andreas\Desktop\test\" & Cells(d + 1, 11) & "_txt.txt"
    Resume
End Sub
Here you can see the relevant part of my spreadsheet, where I ran the macro for the first 23 documents.
Everything works fine so far. Now I want to be able to search for regular expressions. I need this, for example, to exclude certain combinations of words from my search.
The problem seems to be that I cannot write something like
With wdApp.Selection.regex
Anyway, I don't know how to make regex work in a situation like this, and I appreciate your help!
Word's Find object supports only limited pattern matching (Word wildcards, not full regular expressions), which you enable with this flag:
wdApp.Selection.Find.MatchWildcards = True
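As a minimal sketch of how the wildcard flag could be used for counting, the helper below searches a Range instead of the Selection. The name CountWildcardMatches and its parameters are hypothetical, and the code assumes a reference to the Word object library (early binding), as the question's macro already uses.

```vba
' Hypothetical helper: counts Word-wildcard matches in a document.
' Assumes early binding to the Word object library, as in the question.
Function CountWildcardMatches(ByVal doc As Word.Document, ByVal wildcardText As String) As Long
    Dim rng As Word.Range
    Dim n As Long
    Set rng = doc.Content              ' a fresh range always starts at the top of the document
    With rng.Find
        .ClearFormatting
        .Text = wildcardText
        .MatchWildcards = True         ' wildcard searches are case-sensitive
        .Forward = True
        .Wrap = wdFindStop             ' stop at the end instead of wrapping around
        Do While .Execute
            n = n + 1
            rng.Collapse wdCollapseEnd ' continue searching after the current match
        Loop
    End With
    CountWildcardMatches = n
End Function
```

For example, Cells(d + 1, i).Value = CountWildcardMatches(wdDoc, "<[Cc]at>") would count "cat" or "Cat" as a whole word, since < and > mark the start and end of a word in Word's wildcard syntax.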
Note: your code will not give correct results as it stands, because the search for each word starts where the previous one left off in the document. You need to move back to the top of the document before each search:
wdApp.Selection.HomeKey Unit:=wdStory
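To make the placement concrete, here is a sketch of the question's inner loop with the reset added. It assumes wdApp, wdDoc, d, i and iCount are set up exactly as in the question's macro. The .Forward and .Wrap settings are an addition of mine, meant to keep the search from wrapping around at the end of the document.

```vba
' Sketch only: the inner loop from the question, restarting at the top for each word.
' Assumes wdApp, wdDoc, d, i and iCount exist as in the question's macro.
Do While Cells(1, i) <> ""
    iCount = 0
    wdApp.Selection.HomeKey Unit:=wdStory   ' jump to the start of the document
    With wdApp.Selection.Find
        .ClearFormatting
        .Text = Cells(1, i).Value
        .Forward = True
        .Wrap = wdFindStop                  ' added assumption: do not wrap around
        Do While .Execute
            iCount = iCount + 1
            wdApp.Selection.MoveRight
        Loop
    End With
    Cells(d + 1, i).Value = iCount
    i = i + 1
Loop
```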
But if you need more complex pattern matching using regular expressions, you'll need a different approach: the RegExp class, available after adding a reference to "Microsoft VBScript Regular Expressions 5.5". See the explanation in the accepted answer to this SO question: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops.
Here's an example using regex:
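Do While Cells(1, i) <> ""
    Application.ScreenUpdating = False
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim regEx As New RegExp
    Dim Matches As MatchCollection
    With regEx
        .Global = True          ' find all matches, not just the first
        .IgnoreCase = True
        .Pattern = Cells(1, i).Value
    End With
    ' Run the pattern against the full text of the Word document
    Set Matches = regEx.Execute(wdDoc.Content.Text)
    Cells(d + 1, i).Value = Matches.Count
    i = i + 1
Loop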
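For the stated goal of skipping certain word combinations, a negative lookahead is one option that the VBScript regex engine supports (it does not support lookbehind). The sketch below wraps the counting in a helper. The names CountRegexMatches and DemoCountRegexMatches, the sample text and the pattern are illustrative assumptions, and late binding via CreateObject is used so the snippet runs without the library reference.

```vba
' Hypothetical helper: counts regex matches in a string.
' Late binding, so no reference to the RegExp library is needed.
Function CountRegexMatches(ByVal sourceText As String, ByVal pattern As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' count every match
    re.IgnoreCase = True
    re.Pattern = pattern
    CountRegexMatches = re.Execute(sourceText).Count
End Function

Sub DemoCountRegexMatches()
    ' Whole word "data", except when it is followed by " base"
    Debug.Print CountRegexMatches("data set, data base, Data point", "\bdata\b(?! base)")
End Sub
```

In the loop above, the same helper could be called as Cells(d + 1, i).Value = CountRegexMatches(wdDoc.Content.Text, Cells(1, i).Value), with the lookahead pattern stored in the header cell.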