简体   繁体   中英

Excel VBA - Web Scraping - Get value in HTML Table cell

I am trying to create a macro that scrapes a cargo tracking website. But I have to create 4 such macros as each airline has a different website.

I am new to VBA and web scraping.

I have put together a code that works for 1 website. But when I tried to replicate it for another one, I am stuck in the loop. I think it maybe how I am referring to the element, but like I said, I am new to VBA and have no clue about HTML.

I am trying to get the "notified" value in the highlighted line from the image.

IMAGE:"notified" text to be extracted Below is the code I have written so far that gets stuck in the loop. Any help with this would be appreciated.

Sub FlightStat_AF()

Dim url As String
Dim ie As Object
Dim nodeTable As Object

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We can do that in a loop to match the point the information is available
  Do
    On Error Resume Next
    Set nodeTable = ie.document.getElementByClassName("block-whisper")
    On Error GoTo 0
  Loop Until Not nodeTable Is Nothing
  
  'Get the status from the table
  MsgBox Trim(nodeTable.getElementsByClassName("fs-12 body-font-bold").innerText)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
  Set nodeTable = Nothing
End Sub

Some basics:
For simple accesses, like the present ones, you can use the get methods of the DOM (Document Object Model). But there is an important difference between getElementByID() and getElementsByClassName() / getElementsByTagName() .

getElementByID() searches for the unique ID of a html tag. This is written as the ID attribute to html tags. If the html standard is kept by the page, there is only one element with this unique ID. That's the reason why the method begins with getElement .

If the ID is not found when using the method, VBA throws a runtime error. Therefore the call is encapsulated in the loop from the other answer from me, into switching off and on again the error handling. But in the page from this question there is no ID for the html area in question.

Instead, the required element can be accessed directly. You tried the access with getElementsByClassName() . That's right. But here comes the difference to getElementByID() .

getElementsByClassName() and getElementsByTagName() begin with getElements . Thats plural because there can be as many elements with the same class or tag name as you want. This both methods create a html node collection. All html elements with the asked class or tag name will be listet in those collections.

All elements have an index, just like an array. The indexes start at 0. To access a particular element, the desired index must be specified. The two class names fs-12 body-font-bold (class names are seperated by spaces, you can also build a node collection by using only one class name) deliver 2 html elements to the node collection. You want the second one so you must use the index 1.

This is the VBA code for the asked page by using the IE:

Sub FlightStat_AF()

Dim url As String
Dim ie As Object

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We do that with a fix manual break of a few seconds
  'because the whole page will be "reload"
  'The last three values are hours, minutes, seconds
  Application.Wait (Now + TimeSerial(0, 0, 3))
  
  'Get the status from the table
  MsgBox Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
End Sub

Edit: Sub as function

This sub to test the function:

Sub testFunction()
  Dim flightStatAfResult As String
  flightStatAfResult = FlightStat_AF("057-92366691")
  MsgBox flightStatAfResult
End Sub

This is the sub as function:

Function FlightStat_AF(cargoNo As String) As String

Dim url As String
Dim ie As Object
Dim result As String

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.afklcargo.com/mycargo/shipment/detail/" & cargoNo

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We do that with a fix manual break of a few seconds
  'because the whole page will be "reload"
  'The last three values are hours, minutes, seconds
  Application.Wait (Now + TimeSerial(0, 0, 3))
  
  'Get the status from the table
  result = Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
  
  'Return value of the function
  FlightStat_AF = result
End Function

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM