简体   繁体   中英

HTML table data can not be extracted

I am trying to pull some data from a HTML table, however I keep getting an ERROR Message 13 "Type Mismatch" . It may be because i am using the wrong html tag. I have been struggling on this for a few days now, as I can not work out the correct process.

This is the HTML that I am using

网页代码

The Code

Dim htmlTable As Object
Dim collTD As Collection
Dim oNode As Object
Dim IE As Object

Set IE = CreateObject("InternetExplorer.application")
With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"

' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

Set htmlTable = IE.document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")

For Each oNode In collTD
    .Range("A" & RowCount) = oNode.innerHTML
     RowCount = RowCount + 1
    
Next oNode

End With

代码错误

UPDATE AS OF TODAY MON 12TH OCTOBER 2020

I have made a few changes to the code, as I wish to navigate the paginate table and extract the data off per page. I know the Pagination code that I added works as I am currently using it on another code, my problem is that it is not navigating the pages, possible due to wrong class or ID.

WorkSheet工作表图片

HTML for NEXT page

下一页 HTML

New Code, Updated

Dim htmlTable As Object Dim collTD As Object Dim oNode As Object Dim IE As Object Dim RowCount As Long Dim currentColumn As Long ''''' NEW ''''''' Dim HTMLdoc As Object Dim nextPageElements As Object Dim pageNumber As Long '''' START ROW 5 '''''' RowCount = 5 currentColumn = 1 Set IE = CreateObject("Inte.netExplorer.application") With IE.Visible = True.navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/" ' Wait for the page to fully load; you can't do anything if the page is not fully loaded Do While.readyState <> 4: DoEvents: Loop Application.Wait (Now + TimeSerial(0, 0, 2)) StartForLoop_Restart: Set htmlTable =.document.getElementById("table_1") Set collTD = htmlTable.getElementsByTagName("td") For Each oNode In collTD If currentColumn Mod 11 = 0 Then RowCount = RowCount + 1 currentColumn = 1 End If Cells(RowCount, currentColumn) = oNode.innerHTML 'Change to inner HTML to pull of url currentColumn = currentColumn + 1 Next oNode ''''' ######### NEXT PAGE AND DELAYS ###### added by me Do DoEvents ''''' Searches Number of Pages entered in Sheet3 A2 If pageNumber >= Replace(Worksheets("Sheet3").Range("A2").Value, " ", "+") Then Exit Do On Error Resume Next ''' ################### CHANGE THE CLASS OF THE NEXT PAGE ########################## ERROR IS HERE #### Set nextPageElements = HTMLdoc.getElementsByClassName("paginate_button next")(0) 'Set nextPageElements = HTMLdoc.getElementsID("table_1_next ")(0) If nextPageElements Is Nothing Then Exit Do ''' ################### CHANGE THE CLASS OF THE NEXT PAGE ########################## ERROR IS ABOVE #### ''''' Scrolls Down the Browser IE.document.parentWindow.Scroll 0&, 99999 '''''Random delay from Max number entered in Application.Wait Now + TimeSerial(0, 0, Application.RandBetween(1, Worksheets("Sheet3").Range("B2").Value)) nextPageElements.Click 'next web page Do While IE.Busy Or IE.readyState <> 4 DoEvents Loop ''''Second Random delay from Max number entered Application.Wait Now + TimeSerial(0, 0, Application.RandBetween(1, Worksheets("Sheet3").Range("C2").Value)) Set HTMLdoc = IE.document pageNumber = pageNumber + 1 GoTo StartForLoop_Restart Loop End With IE.Quit Set IE = Nothing Set HTMLdoc = Nothing Set nextPageElements = Nothing MsgBox "All Done"

Also need

Currently I don't thing the next page results are copied and pasted under the previous page results. It should be something like this. Color is only for your reff.

  • Green = Page1 Results
  • Red = Page2 Results
  • Blue = Page3 Results and so on as shown below.

结果布局

You should properly declare your objects as IHTML elements in order to manipulate them and use their methods.

First add a reference to Microsoft HTML Object Library

Your code becomes

Sub test()

Dim htmlTable As IHTMLElement
Dim collTD As IHTMLElementCollection
Dim oNode As IHTMLElement
Dim IE As Object
Dim RowCount As Integer

Set IE = CreateObject("InternetExplorer.application")
With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"

' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

Set htmlTable = IE.document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")

RowCount = 1

For Each oNode In collTD
     ThisWorkbook.Worksheets(1).Range("A" & RowCount) = oNode.innerHTML
     RowCount = RowCount + 1
    
Next oNode

End With


End Sub

With late binding you don't need a reference.

Sub Test()

Dim htmlTable As Object
Dim collTD As Object
Dim oNode As Object
Dim IE As Object
Dim RowCount As Long
Dim currentColumn As Long

  RowCount = 1
  currentColumn = 1
  
  Set IE = CreateObject("InternetExplorer.application")
  With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
    
    ' Wait for the page to fully load; you can't do anything if the page is not fully loaded
    Do While .readyState <> 4: DoEvents: Loop
    Application.Wait (Now + TimeSerial(0, 0, 2))
    
    Set htmlTable = .document.getElementById("table_1")
    Set collTD = htmlTable.getElementsByTagName("td")
    
    For Each oNode In collTD
      If currentColumn Mod 11 = 0 Then
        RowCount = RowCount + 1
        currentColumn = 1
      End If
      
      Cells(RowCount, currentColumn) = oNode.innertext
      currentColumn = currentColumn + 1
    Next oNode
  End With
End Sub

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM