VBA Scrape Content Generated by JavaScript
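I want to use VBA to fetch the recommended customer price from URLs defined in an Excel worksheet. The URLs are in Cells(i, 11) and all point to specific pages on https://ark.intel.com. The values start at row 5.
For example, to find the price of the Intel Xeon 8268, I would navigate to https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html. Looking at the page source, it is clear that this content is generated with JavaScript, so I used the "Inspect Element" option in Firefox instead.
From there, I can drill down and find the value I'm looking for inside a tag.
[Image: Firefox Inspect Element view showing the tag that holds the price]
I have not been able to capture that value and write it to an Excel column, namely column E. Here is one attempt I made: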
Sub ProcessorPricing()
    Dim URL As String, lastRow As Long
    Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object
    lastRow = Range("A" & Rows.Count).End(xlUp).row
    Dim cookie As String
    Dim result_cookie As String
    For i = 5 To lastRow
        If Cells(i, 1) <> "" Then
            URL = Cells(i, 11)
            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", URL, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send
            Set HTML = CreateObject("htmlfile")
            HTML.body.innerHTML = XMLHTTP.responseText
            Set objResult = html.getElementsByID("bladeInside")
            Set Price = objResult.getElementsByTagName("span")(0)
            Cells(i, 5) = Price.Value
            DoEvents
        End If
    Next
End Sub
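Any help would be greatly appreciated.
PS - I also tried the code found at https://www.myonlinetraininghub.com/web-scraping-with-vba, to no avail.
UPDATE:
I was able to get everything working with your help. Thank you, Bertrand Martel and Stavros Jon.
Here is the whole script: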
Sub UpdateProcessorInfo()
    'requirements: JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
    'requirements: Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
    'requirements: Add a reference to Microsoft WinHTTP Services 5.1. (Tools -> References -> Check Microsoft WinHTTP Services 5.1)
    Dim Connection As WorkbookConnection
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
    Dim cookie As String
    Dim result_cookie As String
    Dim req As New WinHttpRequest
    Dim ids As String
    Dim responseJSON As Object
    Dim i As Long

    'Refresh every data connection in the workbook
    For Each Connection In ThisWorkbook.Connections
        Connection.Refresh
    Next Connection

    'Copy the processor names into the comparison sheet, starting at row 5
    Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
    Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues
    lastRow = Range("A" & Rows.Count).End(xlUp).row
    Range("k5:k300").Clear

    'Pass 1: search Google for each processor's ark.intel.com page and store the first link in column K
    For i = 5 To lastRow
        If Cells(i, 1) <> "" Then
            url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", url, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send
            Set html = CreateObject("htmlfile")
            html.body.innerHTML = XMLHTTP.responseText
            Set objResultDiv = html.getElementById("rso")
            Set link = objResultDiv.getElementsByTagName("a")(0)
            Cells(i, 11) = link
            DoEvents
        End If
    Next

    'Pass 2: read the product ID from column M, call the price API, and write displayPrice to column N
    lastRow = Range("A" & Rows.Count).End(xlUp).row
    For i = 5 To lastRow
        ids = Cells(i, 13)
        url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
        If Cells(i, 1) <> "" Then
            With req
                .Open "GET", url, False
                .send
                Set responseJSON = JsonConverter.ParseJson(.responseText)
            End With
            On Error Resume Next
            'Debug.Print responseJSON(1)("displayPrice")
            Cells(i, 14) = responseJSON(1)("displayPrice")
        End If
    Next
End Sub
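As @Bertrand Martel pointed out, there is a very convenient API you can use to get the information you need.
To elaborate on his answer, and since you were unable to extract the price from the JSON response, here are my two cents.
You will need to add this JSON parser (VBA-JSON, https://github.com/VBA-tools/VBA-JSON) to your project. Follow the installation instructions in the link.
The structure of the response looks like this: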
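The access pattern used in this answer, responseJSON(1)("displayPrice"), implies a top-level JSON array whose first element is an object with a displayPrice key. A minimal sketch of how VBA-JSON exposes that shape follows. The sample payload and the Sub name are made up for illustration; only the displayPrice key comes from the answer's code, and the real response may carry more fields.

```vba
' Hypothetical demo: parses a made-up payload shaped like the access pattern
' responseJSON(1)("displayPrice") implies. Requires JsonConverter.bas (VBA-JSON).
Sub ResponseShapeSketch()
    Dim sample As String
    Dim parsed As Object

    ' Made-up value; only the key name is taken from the answer
    sample = "[{""displayPrice"": ""$0.00""}]"

    Set parsed = JsonConverter.ParseJson(sample)

    ' A JSON array becomes a 1-based Collection
    Debug.Print TypeName(parsed), parsed.Count
    ' A JSON object becomes a Dictionary keyed by property name
    Debug.Print parsed(1)("displayPrice")
End Sub
```

This is also why the index in responseJSON(1) starts at 1 rather than 0.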
So it all boils down to:
Option Explicit

Sub intel()
    Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
    Dim url As String, ids As String
    Dim responseJSON As Object
    ids = "192481"
    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
    With req
        .Open "GET", url, False
        .send
        Set responseJSON = JsonConverter.ParseJson(.responseText)
    End With
    Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub
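When the same request runs in a loop over many rows, it may help to guard against failed requests and empty results instead of relying on On Error Resume Next. The following is only a sketch under assumptions: GetDisplayPrice is a hypothetical name, the endpoint is assumed to still respond as described in this thread, and the late-bound WinHttp object is a substitution so that no library reference is required.

```vba
' Hypothetical wrapper around the request shown above.
' Returns the displayPrice text, or "" when the request fails or nothing usable comes back.
' Note: JsonConverter.ParseJson raises an error if the body is not valid JSON.
Function GetDisplayPrice(ByVal productId As String) As String
    Dim req As Object, parsed As Object
    Dim url As String

    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & _
          productId & "&siteName=ark"

    ' Late binding, so the WinHTTP reference does not have to be ticked
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")
    req.Open "GET", url, False
    req.send

    ' Anything other than HTTP 200 is treated as "no price"
    If req.Status <> 200 Then Exit Function

    Set parsed = JsonConverter.ParseJson(req.responseText)

    ' Expect a non-empty array whose first item has a displayPrice key
    If TypeName(parsed) <> "Collection" Then Exit Function
    If parsed.Count = 0 Then Exit Function
    If Not parsed(1).Exists("displayPrice") Then Exit Function

    ' Appending "" turns a JSON null into an empty string
    GetDisplayPrice = parsed(1)("displayPrice") & ""
End Function
```

A call such as Cells(i, 14) = GetDisplayPrice(Cells(i, 13)) would then leave the cell blank for rows where no price is returned.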
As you noticed, the data is not embedded in the HTML; it is loaded by JavaScript from an external JSON API. The API URL is built from the product ID 192481 and the siteName ark. Dropping the mmids parameter returns only the product, which should be enough (unless you need the orderingCode?):
https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark
The idea is to extract the product ID from the original URL:
https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html
and call this API instead.
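The extraction step could be sketched in VBA along these lines. ExtractProductId is a hypothetical helper name, and the sketch assumes the ID is always the path segment immediately after /products/, as in the URL above.

```vba
' Hypothetical helper: returns the path segment that follows "/products/"
' in an ARK product URL, or "" if the marker is not present.
Function ExtractProductId(ByVal arkUrl As String) As String
    Const MARKER As String = "/products/"
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, arkUrl, MARKER, vbTextCompare)
    If startPos = 0 Then Exit Function

    ' Move past the marker to the first character of the ID
    startPos = startPos + Len(MARKER)

    ' The ID ends at the next slash, or at the end of the string
    endPos = InStr(startPos, arkUrl, "/")
    If endPos = 0 Then endPos = Len(arkUrl) + 1

    ExtractProductId = Mid$(arkUrl, startPos, endPos - startPos)
End Function
```

For the Xeon 8268 URL from the question, ExtractProductId returns 192481, which can then be passed as the ids parameter of the API URL.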