简体   繁体   中英

Using VBA Regex on the entire contents of a word document

Overarching Question: how do I access entire text of an RTF file

Ok, so I have a bit of a problem here, I'm hoping what I want isn't totally crazy, but here it goes.

I work with cars, and at the end of every day we compile an RTF of the vehicles we found with damages and send them off to someone. We must also keep an Excel file with these VIN numbers and the corresponding damages. I've done some work on the VIN Log part, using VBA to format certain values in certain ways. The RTF file is akin to the below formatting (these are not real VIN, but match the regex for them)

1FTEX8EEG12356789 //Other random Information I do not need
    004121 2
    012051 3
    005091
1FTFW7D78KF123567 //Other Random Information I do not need
    042071
    010341 4
    010341 9
//ETC

Here's my question: I've figured out how to open the RTF file, but how do I gain access to the entirety of the document text all at once, not just paragraph by paragraph, and does the RegExp object have available a way to capture the offset the string was found at?

The reason I am trying to use RegEx is because there's this header, which takes up roughly 10 "Paragraphs" worth of space every page (these documents can be 1 page or sometimes 10 or more). If anyone could point me into a quicker way to accomplish this, I would appreciate it.

What I was thinking I would end up having to do, once I figure out how to RegEx search the whole document was this

  1. Gather all RegExp matches for ([A-Z0-9]{17})
  2. Use the matches from Step 1 to find out the location in the document via InStr
  3. Use the value from Step 2 to loop through each set of matches from step 1, and indexes from step 2 in order to form something akin to the below code.

Code:

For i=1 To RegMatches.Count 
  start_pos = InStr(WordDocumentText,RegMatches.Item(i))
  For j=start_pos To InStr(WordDocumentText,RegMatches.Item(i+1))
    //Code to gather damages on VIN 'i'
  Next
Next

But these seems... redundant and just a sort of messy way to do so.

All I would really need to know are how to get access to the entirety of the text in the RTF file I am opening with VBA, and I can kind of go from there, but if anyone has an better idea on how to go from here with this, I'd appreciate it.

I like to use MSWord behind the scenes to read an RTF file into Excel. Here is how to get access to the entire text of an RTF document.

Sub readRTF()
    Dim wrdApp As Word.Application
    Dim wrdDoc As Word.Document
    Dim FileName As String
    Dim strFolder As String
    Dim strInput As String

    strFolder = Application.ActiveWorkbook.Path & "\"
    FileName = "VINreport.rtf"

    'open a Word instance
    Set wrdApp = CreateObject("Word.Application")
    wrdApp.Visible = False

    Set wrdDoc = wrdApp.Documents.Open(strFolder & FileName)

    'Read RTF file text into variable
    strInput = wrdDoc.Range.Text

    'Print All Text into Immediate Window
    Debug.Print strInput

    'Clean Up
    wrdDoc.Close 0
    Set wrdDoc = Nothing

    wrdApp.Quit
    Set wrdApp = Nothing
End Sub

My example RTF file was located in the same folder as the excel file and was a straight cut & paste from your example code above.

Results:

在此处输入图片说明


Now you can run whatever Regex you need against the text in strInput . If you need help with the Regex part, check out this link for some useful tips using Regex with Excel.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM