
Extracting data from website using VBA

I want to extract the project status of a project, which is shown on a website. The HTML below is an example of how the page is structured. I want to extract the text Start, which is the text between <td> and </td>. The HTML and my code follow.

 <div id="ProjectStatus">
 <tr>
 <th>
 <span id="ProjectStatus_Label1" title="De status van het project">Projectstatus</span>
 </th>
 <td>Start</td>
 </tr>

Below you'll find the code that I have at this moment. This code only gives me the string "Projectstatus", which is not what I want. How can I extract the word "Start"?

Private Sub btnClick()

Dim ieApp As InternetExplorer
Set ieApp = New InternetExplorer

With ieApp
 .Navigate "url"
 .Visible = True
End With

Do While ieApp.Busy
    DoEvents
Loop 

Set getStatus = ieApp.Document.getElementById("ProjectStatus_Label1")

strStatus = getStatus.innerText

MsgBox (strStatus) 'gives me the text "Projectstatus", but I need the text "Start"

ieApp.Quit
Set ieApp = Nothing

End Sub

Achieving this, starting from the ProjectStatus_Label1 span, will require some DOM navigation.

Use the following:

Do While ieApp.Busy
    DoEvents
Loop
Set labelSpan = ieApp.Document.getElementById("ProjectStatus_Label1")
Set tableHeader = labelSpan.parentElement 'the <th> that contains the span
Set tableRow = tableHeader.parentElement  'the <tr> that contains the <th>
For Each child In tableRow.Children
    If child.tagName = "TD" Then 'This is the element you're looking for
         Debug.Print child.innerText
         Exit For
    End If
Next

Of course, I highly recommend you revise this code to use explicit declarations and Option Explicit, but you haven't in your question so I won't in my answer.
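For reference, here is a minimal sketch of what a version with Option Explicit and explicit declarations might look like. It is late-bound (everything is declared As Object), so it assumes no project references are set. The procedure name ShowProjectStatus, the constant IE_READYSTATE_COMPLETE, the ReadyState check in the wait loop, and the Nothing/Length checks are my additions, not part of the question. The "url" placeholder is kept from the question.

```vba
Option Explicit

' Hypothetical procedure name; the question's handler was btnClick.
Private Sub ShowProjectStatus()
    ' 4 is the value of READYSTATE_COMPLETE; declared locally because
    ' late binding does not expose the enum.
    Const IE_READYSTATE_COMPLETE As Long = 4

    Dim ieApp As Object
    Dim statusDiv As Object
    Dim cells As Object
    Dim strStatus As String

    Set ieApp = CreateObject("InternetExplorer.Application")
    ieApp.Visible = True
    ieApp.Navigate "url" ' placeholder, as in the question

    ' Wait until the browser is idle and the document has finished loading.
    Do While ieApp.Busy Or ieApp.ReadyState <> IE_READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Start from the ProjectStatus div and take its first <td>.
    Set statusDiv = ieApp.Document.getElementById("ProjectStatus")
    If statusDiv Is Nothing Then
        MsgBox "Element with id ProjectStatus was not found."
    Else
        Set cells = statusDiv.getElementsByTagName("td")
        If cells.Length > 0 Then
            strStatus = cells.Item(0).innerText
            MsgBox strStatus
        Else
            MsgBox "No td element found inside ProjectStatus."
        End If
    End If

    ieApp.Quit
    Set ieApp = Nothing
End Sub
```

With Option Explicit at the top of the module, a misspelled variable name becomes a compile error instead of a silently created empty Variant.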

Also, I've used a number of intermediate variables (labelSpan, tableHeader) for demonstrative purposes. You can use Set tableRow = ieApp.Document.getElementById("ProjectStatus_Label1").parentElement.parentElement and remove those intermediate assignments.

Or you can use the code-golfy, harder-to-understand approach, starting from the ProjectStatus div:

Debug.Print ieApp.Document.getElementById("ProjectStatus").GetElementsByTagName("td")(0).innerText

