[英]VBA Scrape Content Generated by JavaScript
我希望從使用 VBA 的 Excel 工作表中定義的 URL 獲取推薦的客戶定價信息。 這些值位於 Excel 中的 Cells(i,11) 中,它們都指向https://ark.intel.com上的特定頁面。 值從第 5 行開始。
例如,如果我想找到英特爾至強 8268 的價格,我會導航到https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268 -processor-35-75m-cache-2-90-ghz.html 。 如果查看源代碼,很明顯此內容是用 JavaScript 生成的,因此我改為在 Firefox 網絡瀏覽器上使用“檢查元素”選項。
從這里,我可以向下導航並在標簽中找到我要查找的內容。 見下圖:
我無法捕獲該值並將其寫入 excel 列,即列 E。以下是我所做的一次嘗試:
Sub ProcessorPricing()
Dim URL As String, lastRow As Long
Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object
lastRow = Range("A" & Rows.Count).End(xlUp).row
Dim cookie As String
Dim result_cookie As String
For i = 5 To lastRow
If Cells(i, 1) <> "" Then
URL = Cells(i, 11)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", URL, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set HTML = CreateObject("htmlfile")
HTML.body.innerHTML = XMLHTTP.responseText
Set objResult = html.getElementsByID("bladeInside")
Set Price = objResult.getElementsByTagName("span")(0)
Cells(i, 5) = Price.Value
DoEvents
End If
Next
End Sub
任何幫助將不勝感激。
PS - 我也試過在https://www.myonlinetraininghub.com/web-scraping-with-vba找到的代碼也無濟於事
更新:
能夠在您的幫助下完成所有工作。 謝謝伯特蘭·馬特爾和斯塔夫羅斯·喬恩。
這是整個腳本:
Sub UpdateProcessorInfo()
'requirements: JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
'requirements: Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
'requirements: Add a refernce to Microsoft WinHTTP Services 5.1. (Tools -> References -> Check Microsoft WinHTTP Services 5.1)
Dim Connection As WorkbookConnection
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
Dim cookie As String
Dim result_cookie As String
Dim req As New WinHttpRequest
Dim ids As String
Dim responseJSON As Object
For Each Connection In ThisWorkbook.Connections
Connection.Refresh
Next Connection
Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues
lastRow = Range("A" & Rows.Count).End(xlUp).row
Range("k5:k300").Clear
For i = 5 To lastRow
If Cells(i, 1) <> "" Then
url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set objResultDiv = html.getElementById("rso")
Set link = objResultDiv.getElementsByTagName("a")(0)
Cells(i, 11) = link
DoEvents
End If
Next
lastRow = Range("A" & Rows.Count).End(xlUp).row
For i = 5 To lastRow
ids = Cells(i, 13)
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
If Cells(i, 1) <> "" Then
With req
.Open "GET", url, False
.send
Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
On Error Resume Next
'Debug.Print responseJSON(1)("displayPrice")
Cells(i, 14) = responseJSON(1)("displayPrice")
End If
Next
結束子
正如@Bertrand Martel 指出的那樣,您可以使用一個非常方便的 API 來獲取所需的信息。
為了進一步詳細說明他的回答,並且由於您無法從 JSON 響應中提取價格,這是我的兩分錢。
您需要將此JSON 解析器添加到您的項目中。 按照鏈接中的安裝說明進行操作。
響應的結構如下所示:
所以這一切都歸結為:
Option Explicit
Sub intel()
Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
Dim url As String, ids As String
Dim responseJSON As Object
ids = "192481"
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
With req
.Open "GET", url, False
.send
Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub
正如您所注意到的,數據並未嵌入到 html 中,而是使用外部 JSON API 通過 Javascript 加載:
此 URL 是使用產品 ID 192481
和 siteName ark
構建的。 刪除 mmids 只返回應該足夠的產品(除非您需要 orderingCode ?):
https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark
這個想法是您從原始 URL 中提取產品 ID:
https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html.
並改為調用此 API
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.