Using XMLHTTP object to parse some websites in VBA
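I am trying to grab the "Key people" field from a Wikipedia page, https://en.wikipedia.org/wiki/Abbott_Laboratories, and copy the value into my Excel spreadsheet.
I managed to do this with XMLHTTP, an approach I like for its speed. The code below works.
However, the code is not flexible enough, because the structure of a wiki page can vary. For example, it does not work on this page: https://en.wikipedia.org/wiki/3M
This is because the tr/td structure is not exactly the same ("Key people" is no longer the 8th TR on the 3M page).
How can I improve my code?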
Public Sub parsehtml()
    ' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
    Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
    Dim i As Integer
    ' Fetch the page synchronously.
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://en.wikipedia.org/wiki/Abbott_Laboratories", False
    http.send
    html.body.innerHTML = http.responseText
    ' Hard-coded position: the row at index 8 holds "Key people" on this page only.
    Set topic = html.getElementsByTagName("tr")(8)
    Set titleElem = topic.getElementsByTagName("td")(0)
    ThisWorkbook.Sheets(1).Cells(1, 1).Value = titleElem.innerText
End Sub
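If the table row for "Key people" is not at a fixed position, why not loop over the table rows looking for "Key people"?
I tested the following modification and found that it works.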
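In the declarations section (replacing the existing declaration of topics):
Dim topics As HTMLTable, Rw As HTMLTableRow
Then, at the end:
html.body.innerHTML = http.responseText
' The infobox is the first element carrying both classes "infobox" and "vcard".
Set topics = html.getElementsByClassName("infobox vcard")(0)
For Each Rw In topics.Rows
    ' Match on the row label instead of the row position.
    If Rw.Cells(0).innerText = "Key people" Then
        ThisWorkbook.Sheets(1).Cells(1, 1).Value = Rw.Cells(1).innerText
        Exit For
    End If
Next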
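The same row-label loop can be wrapped in a reusable function so it works for any URL and any infobox label. This is only a minimal sketch: the name GetInfoboxValue and its parameters are hypothetical and not part of the answer above, and it assumes the page has an "infobox vcard" table and that a reference to the Microsoft HTML Object Library is set. The length checks are additions to avoid errors on pages without an infobox or on rows with a single cell.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
' GetInfoboxValue is a hypothetical helper name, not part of the original answer.
Public Function GetInfoboxValue(ByVal url As String, ByVal label As String) As String
    Dim html As New HTMLDocument
    Dim tables As Object, infobox As HTMLTable, rw As HTMLTableRow

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        If .Status <> 200 Then Exit Function    ' returns "" on an HTTP failure
        html.body.innerHTML = .responseText
    End With

    Set tables = html.getElementsByClassName("infobox vcard")
    If tables.Length = 0 Then Exit Function     ' no infobox on this page

    Set infobox = tables(0)
    For Each rw In infobox.Rows
        ' Only rows with both a label cell and a value cell are candidates.
        If rw.Cells.Length >= 2 Then
            If Trim$(rw.Cells(0).innerText) = label Then
                GetInfoboxValue = rw.Cells(1).innerText
                Exit Function
            End If
        End If
    Next rw
End Function
```

It could then be called as ThisWorkbook.Sheets(1).Cells(1, 1).Value = GetInfoboxValue("https://en.wikipedia.org/wiki/3M", "Key people"), which leaves the cell empty when the label is not found.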
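There is a better, faster way, at least for the given URLs: match on the element's class name and index into the returned nodeList. Fewer items are returned, the path to the element is shorter, and matching on a class name is faster than matching on an element type.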
Option Explicit
Public Sub GetKeyPeople()
    ' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
    Dim html As HTMLDocument, urls(), i As Long, keyPeople
    Set html = New HTMLDocument
    urls = Array("https://en.wikipedia.org/wiki/Abbott_Laboratories", "https://en.wikipedia.org/wiki/3M")
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            .Open "GET", urls(i), False
            .send
            html.body.innerHTML = .responseText
            ' item(1) is the second element with class "agent" (the list is zero-based).
            keyPeople = html.querySelectorAll(".agent").item(1).innerText
            ' One result per row, starting at row 1 of column A.
            ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = keyPeople
        Next
    End With
End Sub
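Because indexing the nodeList at a fixed position only holds "for the given URLs", a page with fewer than two .agent elements would raise an error at .item(1). A small guard, sketched below, avoids that. The helper name NthMatchText is hypothetical and not part of the answer above. It assumes the same early-bound HTMLDocument as the code above.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
' NthMatchText is a hypothetical helper name, not part of the original answer.
Public Function NthMatchText(ByVal html As HTMLDocument, ByVal selector As String, ByVal index As Long) As String
    Dim nodes As Object
    Set nodes = html.querySelectorAll(selector)
    ' Only index into the list when it has enough items; otherwise return "".
    If index >= 0 And index < nodes.Length Then
        NthMatchText = nodes.item(index).innerText
    End If
End Function
```

Inside the loop, the lookup would then read keyPeople = NthMatchText(html, ".agent", 1), which writes an empty string to the sheet instead of stopping on an error when a page has a different layout.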