简体   繁体   中英

Excel VBA - Web Scraping - Inner Text of HTML Table Cell

I am trying to build a macro to web scrape the status of a Cargo Shipment based on the shipment number. I am using the XML-HTTP method but I am new to VBA web scraping. I have tried to get the value by using the GetValuebyID,Tag, Class with no success.

The highlighted line is the one I need the value extracted from. [Need to Extract the 10 of 10 Delivered Value][1]

This is how far I have gotten with the code.

Sub FlightStat()

Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim AllTables As IHTMLElementCollection
Dim MainTable As IHTMLTable


XMLReq.Open "GET", "https://www.unitedcargo.com/OurNetwork/TrackingCargo1512/Tracking.jsp?id=10205436&pfx=016", False

XMLReq.send

If XMLReq.Status <> 200 Then
    MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
    Exit Sub
End If

HTMLDoc.body.innerHTML = XMLReq.responseText

Set AllTables = HTMLDoc.getElementsByTagID("dispTable0")

  

End Sub

I would be grateful if someone could help me get the "10 of 10 Delivered" value extracted [1]: https://i.stack.imgur.com/xcOAZ.png

Ok, like I wrote in my comment. You can scrape the status with the IE.

Please note: The following code has no timeout built in if the dynamic content cannot be loaded. There is also no check whether the number passed in the URL is correct.

Sub FlightStat()

Dim url As String
Dim ie As Object
Dim nodeTable As Object

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.unitedcargo.com/OurNetwork/TrackingCargo1512/Tracking.jsp?id=10205436&pfx=016"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We can do that in a loop to match the point the information is available
  Do
    On Error Resume Next
    Set nodeTable = ie.document.getElementByID("dispTable0")
    On Error GoTo 0
  Loop Until Not nodeTable Is Nothing
  
  'Get the status from the table
  MsgBox Trim(nodeTable.getElementsByTagName("li")(2).innertext)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
  Set nodeTable = Nothing
End Sub

You can absolutely do it with xmlhttp. You just need the right endpoint to query. As it returns json you really should use a json parser, or Instr/InstrRev (for small simple string extraction). However, as I didn't want to import an external dependency (other than ticking the add reference in VBE), and the response format is standard, I went with regex. The 10 of 10 is calculated as the number of items delivered over the number of items received (start and end pieces ), along with the first statusDescription ; the latest tracking info always comes first in the string.

This will be much quicker than using a browser.

Option Explicit

Public Sub FlightStat()

    Dim XMLReq As New MSXML2.XMLHTTP60, re As VBScript_RegExp_55.RegExp 'required reference Microsoft VBScript Regular Expressions
    
    Set re = New VBScript_RegExp_55.RegExp
    
    With XMLReq
    
        .Open "GET", "https://www.unitedcargo.com/TrackingServlet?BranchCode=&CompanyName=Test&DocumentNumbers=016-10205436&UserName=&_=" & toUnix(Now()), False
        .send

        If .status <> 200 Then
            MsgBox "Problem" & vbNewLine & .status & " - " & .statusText
            Exit Sub
        End If
    
        Dim s As String, output As String, matches As VBScript_RegExp_55.MatchCollection
        
        s = .responseText
        
    End With

    With re
    
        .Pattern = """Pieces"":""(.*?)"""
        .Global = True
        
        Set matches = .Execute(s)
        
        Dim status As String
        
        .Pattern = "StatusDescription"":""(.*?)"""
        .Global = False
        status = .Execute(s)(0).SubMatches(0)
        output = matches.Item(0).SubMatches(0) & " of " & matches.Item(matches.Count - 1).SubMatches(0) & Chr$(32) & status
        
        Debug.Print output
        
    End With
End Sub

Public Function toUnix(ByVal dt As Variant) As Long
    '@TimWilliams https://stackoverflow.com/a/12326121
    toUnix = DateDiff("s", "1/1/1970", dt)
End Function

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM