Using VBA Regex on the entire contents of a Word document

Overarching question: how do I access the entire text of an RTF file?

OK, so I have a bit of a problem here. I'm hoping what I want isn't totally crazy, but here goes.

I work with cars, and at the end of every day we compile an RTF of the vehicles we found with damage and send it off to someone. We must also keep an Excel file with these VINs and the corresponding damage. I've done some work on the VIN log part, using VBA to format certain values in certain ways. The RTF file is formatted like the sample below (these are not real VINs, but they match the regex for them):

1FTEX8EEG12356789 //Other random Information I do not need
    004121 2
    012051 3
    005091
1FTFW7D78KF123567 //Other Random Information I do not need
    042071
    010341 4
    010341 9
//ETC

Here's my question: I've figured out how to open the RTF file, but how do I get access to the entire document text at once, not just paragraph by paragraph? And does the RegExp object offer a way to capture the offset at which a match was found?

The reason I am trying to use RegEx is that there is a header which takes up roughly 10 "paragraphs" of space on every page (these documents can be 1 page or sometimes 10 or more). If anyone could point me to a quicker way to accomplish this, I would appreciate it.

Here is what I was thinking I would end up having to do, once I figure out how to RegEx search the whole document:

  1. Gather all RegExp matches for ([A-Z0-9]{17})
  2. Use the matches from Step 1 to find their locations in the document via InStr
  3. Loop through the matches from Step 1 and the positions from Step 2 to form something akin to the code below.

Code:

For i=1 To RegMatches.Count 
  start_pos = InStr(WordDocumentText,RegMatches.Item(i))
  For j=start_pos To InStr(WordDocumentText,RegMatches.Item(i+1))
    'Code to gather damages on VIN 'i'
  Next
Next

But this seems... redundant, and just a messy way to do it.

All I really need to know is how to get access to the entire text of the RTF file I am opening with VBA, and I can go from there. But if anyone has a better idea of how to proceed from here, I'd appreciate it.

I like to use MS Word behind the scenes to read an RTF file into Excel. Here is how to get access to the entire text of an RTF document.

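Sub readRTF()
    'The Word.* types below require a reference to the
    'Microsoft Word Object Library (Tools > References)
    Dim wrdApp As Word.Application
    Dim wrdDoc As Word.Document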
    Dim FileName As String
    Dim strFolder As String
    Dim strInput As String

    strFolder = Application.ActiveWorkbook.Path & "\"
    FileName = "VINreport.rtf"

    'open a Word instance
    Set wrdApp = CreateObject("Word.Application")
    wrdApp.Visible = False

    Set wrdDoc = wrdApp.Documents.Open(strFolder & FileName)

    'Read RTF file text into variable
    strInput = wrdDoc.Range.Text

    'Print All Text into Immediate Window
    Debug.Print strInput

    'Clean Up
    wrdDoc.Close 0
    Set wrdDoc = Nothing

    wrdApp.Quit
    Set wrdApp = Nothing
End Sub

My example RTF file was located in the same folder as the Excel file and was a straight cut and paste of the sample in your question.

Results: [screenshot not preserved]

Now you can run whatever Regex you need against the text in strInput. If you need help with the Regex part, check out this link for some useful tips on using Regex with Excel.
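
On the offset part of the question: each Match returned by the VBScript RegExp object has a FirstIndex property (0-based) and a Length property, so InStr is not needed to locate a match. The following is only a sketch, assuming the whole document text is passed in as strText and that the 17-character pattern from the question is specific enough that it does not also match header text. The procedure name ListVinBlocks is made up for illustration.

```vba
' Sketch: find every VIN in the text and print the text that follows it,
' up to (but not including) the next VIN.
Sub ListVinBlocks(ByVal strText As String)
    Dim re As Object          ' VBScript.RegExp, late bound
    Dim matches As Object     ' MatchCollection returned by Execute
    Dim i As Long
    Dim blockStart As Long    ' 1-based position of first char after the VIN
    Dim blockEnd As Long      ' 1-based position of last char before next VIN

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                    ' return every match, not just the first
    re.Pattern = "[A-Z0-9]{17}"         ' pattern taken from the question

    Set matches = re.Execute(strText)

    For i = 0 To matches.Count - 1      ' the match collection is 0-based
        ' FirstIndex is 0-based while Mid$ is 1-based, hence the + 1
        blockStart = matches(i).FirstIndex + matches(i).Length + 1

        If i < matches.Count - 1 Then
            blockEnd = matches(i + 1).FirstIndex
        Else
            blockEnd = Len(strText)     ' last VIN: run to the end of the text
        End If

        Debug.Print "VIN " & matches(i).Value & " at offset " & matches(i).FirstIndex
        Debug.Print Mid$(strText, blockStart, blockEnd - blockStart + 1)
    Next i
End Sub
```

Calling ListVinBlocks strInput before the clean-up section of readRTF would print each VIN followed by its block of damage lines in the Immediate window.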

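If the offsets themselves are not needed, another option is to split the text into lines and test each line on its own, which also skips the repeated page header as long as none of its lines look like a VIN or a damage code. This is a sketch under several assumptions: that the lines in the document end in paragraph marks (which Word exposes as vbCr in Range.Text), that damage lines are indented with spaces, and that they consist of six digits optionally followed by a space and one digit, as in the sample. ListDamagesByLine is a hypothetical name.

```vba
' Sketch: walk the text line by line, remember the most recent VIN,
' and print every damage line together with the VIN it belongs to.
Sub ListDamagesByLine(ByVal strText As String)
    Dim reVin As Object       ' matches a line that starts with a VIN
    Dim reDamage As Object    ' matches a damage line such as "004121 2"
    Dim textLines() As String
    Dim currentVin As String
    Dim ln As String
    Dim i As Long

    Set reVin = CreateObject("VBScript.RegExp")
    reVin.Pattern = "^[A-Z0-9]{17}"

    Set reDamage = CreateObject("VBScript.RegExp")
    reDamage.Pattern = "^\d{6}( \d)?$"

    ' Word returns paragraph marks as vbCr in Range.Text
    textLines = Split(strText, vbCr)

    For i = LBound(textLines) To UBound(textLines)
        ln = Trim$(textLines(i))            ' Trim$ removes spaces only, not tabs
        If reVin.Test(ln) Then
            currentVin = Left$(ln, 17)      ' keep the VIN, drop the rest of the line
        ElseIf Len(currentVin) > 0 And reDamage.Test(ln) Then
            Debug.Print currentVin, ln
        End If
    Next i
End Sub
```

From here each VIN and damage pair could be written to a worksheet row instead of the Immediate window.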