简体   繁体   中英

How to extract unknown amount of email addresses / all email addresses from Word document using VBA

It's a follow up for this question and a print screen embedded there with document view still applies. The code is run from Excel VBA editor.

There is unknown number of email addresses in the Word document and:

  1. I need to extract all of them,

  2. concatenate into one string that contains all of the email addresses, separated using ", "

  3. and fill the string into Excel cell Activesheet.Range("C31")

Currently I have a code that finds the @ sign and builds email address around that. This is how it looks like:

Sub FindEmail036()         '[0-9;A-z;,._-]{1;}\@[0-9;A-z;._-]{1;}
                           '[0-9;A-z;,._-]{1;}\@[0-9;A-z;._-]{1;}
Dim WordApp As Word.Application
Dim WordDoc As Word.Document
Dim ExcelApp As Excel.Application
Dim rng As Word.Range
Dim emailAdr As String
Dim ws As Worksheet
Dim iCount As Integer

Set WordApp = GetObject(, "Word.Application")
Set ExcelApp = GetObject(, "Excel.Application")
Set WordDoc = WordApp.ActiveDocument
Set rng = WordApp.ActiveDocument.Content
Set ws = ExcelApp.ActiveSheet

ExcelApp.Application.Visible = True

    With rng.Find
        .Text = "@"
        .Wrap = wdFindContinue
        .Forward = True
        .MatchWildcards = False
        .Execute

        Debug.Print rng.Text
        If .Found = True Then
            rng.MoveStartUntil Cset:=" ", Count:=wdBackward
            Debug.Print rng.Text
            rng.MoveEndUntil Cset:=","
            Debug.Print rng.Text
            'rng.MoveEndUntil Cset:=" ", Count:=wdBackward
        End If
    End With     'how to create loop that will extract all the email addresses in the document??
    ws.Range("C31").Value = rng
End Sub

This code only extracts the first email address and is not looking for next email addresses. I know this because I'm debugging using F8 and Immediate window and I can see, this code is just finishing search after it finds @ and constructs first complete email address.

I guess some loop is necessary but I don't know how to do write it.

I've also found this source but I don't understand much from it. https://wordmvp.com/FAQs/MacrosVBA/NoTimesTextInDoc.htm

I recommend using regular expressions.

Check Reference: Microsoft VBscript Regular Expressions XX

Sub FindEmail()
    Dim WordApp As Word.Application
    Dim WordDoc As Word.Document
    Dim ExcelApp As Excel.Application

    Dim StrInput As String, sPattern As String
    Dim oEmail As MatchCollection
    Dim Ws As Worksheet
    Dim vR()
    Dim n As Long, i As Long

    Set WordApp = GetObject(, "Word.Application")
    Set ExcelApp = GetObject(, "Excel.Application")
    Set WordDoc = WordApp.ActiveDocument

    StrInput = WordDoc.Content
    Set Ws = ExcelApp.ActiveSheet

    sPattern = "([A-z0-9.]{1,})(@)([A-z0-9]{0,})(.)([A-z0-9]{1,})"

    Set oEmail = GetRegEx(StrInput, sPattern)
    For i = 0 To oEmail.Count - 1
        n = n + 1
        ReDim Preserve vR(1 To n)
        vR(n) = oEmail.Item(i)
    Next
    'Ws.Range("c31").Resize(n) = WorksheetFunction.Transpose(vR)
    Ws.Range("c31") = Join(vR, ", ") '<~~ single string
End Sub
Function GetRegEx(StrInput As String, strPattern As String) As Object
    Dim RegEx As New RegExp
    Set RegEx = New RegExp
    With RegEx
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = strPattern
    End With
    If RegEx.Test(StrInput) Then
        Set GetRegEx = RegEx.Execute(StrInput)
    End If
End Function

Your word document has multiple lines, so I set mutiline = true in the regex setting. The regular expression therefore stores all of its contents in matchcollection. Put this stored item in a dynamic array and do the next thing. You can store an array in multiple cells, or create a single character using the join function.

在此处输入图片说明

It's finishing because the way Range.Find works is that it sets the range equal to what it finds. So it finds the @ , sets the range equal to it, and now there are no more @ in the range. You need another range to manipulate, because manipulating your search range will only screw up your results.

You can loop with a Do While .Found = True (my preferred method). Make sure that you set .Wrap = wdFindStop or you will have an infinite loop.

I'd put the results in a dictionary.

Dim eAddresses As Object: Set eAddresses = CreateObject("Scripting.Dictionary")

Dim rng As Range
Set rng = ActiveDocument.Content
Dim srchRng As Range

Dim addressNum As Long
addressNum = 1

With rng.Find
        .Text = "@"
        .Wrap = wdFindStop
        .Forward = True
        .MatchWildcards = False
        .Execute
        Debug.Print rng.Text
        Do While .Found

            Set srchRng = rng.Duplicate
            srchRng.MoveStartUntil Cset:=" ", Count:=wdBackward
            Debug.Print srchRng.Text
            srchRng.MoveEndUntil Cset:=","

            If Not eAddresses.Exists(srchRng.Text) Then
                eAddresses.Add srchRng.Text, addressNum
                addressNum = addressNum + 1
            End If
            .Execute
        Loop
    End With

End Sub

As a side note, when you push these to production, I'd definitely pull out all the Debug.Print statements. It makes for a cluttered immediate window, especially if you plan on printing useful metrics and/or errors to the immediate window (which I recommend).

Other responders have identified the cause of your problem so I won't reiterate that. However, your requirement is a common pattern in VBA/Word, namely find something then do something as a consequence of the find (other than a replace). I generally wrap this pattern in a function or sub depending upon what action is required once the find text has been found..

If you haven't used a scripting.dictionary before than I would use early binding (as in the code below) so that you get access to intellisense for the methods and properties. This means using Tools.Reference to add the Microsoft Scripting.Runtime library to the VBIDE.

You'll see that we recalculate the end of the document each time we run through the While loop. This is good practise because we don't know in advance the impact that the find actions will have on the length of the document.

The DoEvents in the While loop ensures that you can quickly break out of the loop if things go wrong.

The function below uses a Word wildcard search to search for email addresses. The find is precise so there is no need to adjust the ends of the found range to get only the email address.

If the action in the found do loop was complicated then I would break this out to a separate function passing the found range to the function as .Duplicate. In this particular case that would also mean that I would move the scripting dictionary from a local variable to a module scope variable

Public Function GetEmailAddressesAsString(ByVal ipDoc As Word.Document) As String

    Const EmailAddress As String = "<[0-9A-Za-z._]{1,}\@[0-9A-Za-z.\_]{1,}>"

    With ipDoc.StoryRanges(wdMainTextStory)

        With .Find

            .ClearFormatting
            .Wrap = wdFindStop
            .MatchWildcards = True
            .text = EmailAddress

        End With

        Dim myAddresses As Scripting.Dictionary
        Set myAddresses = New Scripting.Dictionary

        Do While .Find.Execute

            DoEvents
            myAddresses.Add myAddresses.Count, .text
            .MoveStart Count:=.Characters.Count + 1
            .End = ipDoc.StoryRanges(wdMainTextStory).End

        Loop

    End With

    GetEmailAddressesAsString = Join(myAddresses.Items, ",")

End Function

I already effectively answered this in your other thread:

Sub Demo()
Dim wdApp As Word.Application, StrOut As String
Set wdApp = GetObject(, "Word.Application")
With wdApp.ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "<[0-9A-ÿ.\-]{1,}\@[0-9A-ÿ\-.]{1,}([^13 -/\:-\@\\-`\{-¿])"
    .Replacement.Text = ""
    .Forward = True
    .Format = False
    .Wrap = wdFindStop
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    StrOut = StrOut & Trim(.Text) & " "
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
End With
StrOut = Replace(Trim(StrOut), " ", ", ")
ActiveSheet.Range("C31").Value = StrOut
End Sub

Note how little differs between this code and the code I posted in your other thread.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM