VBA Scrape Content Generated by JavaScript
I am looking to use VBA to grab the Recommended Customer Pricing information from URLs defined in an Excel worksheet. The URLs are in Cells(i, 11), and each one points to a specific product page on https://ark.intel.com. The values begin on row 5.
For example, if I want to find the price for the Intel Xeon 8268, I would navigate to https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html. When viewing the page source, it is obvious this content is generated with JavaScript, so I instead use the "Inspect Element" option in the Firefox web browser.
From here, I can navigate down and find what I am looking for in the tag. See image below:
I am unable to capture that value and write it to an Excel column, which would be Column E. Below is one attempt I have made:
Sub ProcessorPricing()
Dim URL As String, lastRow As Long, i As Long
Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object
lastRow = Range("A" & Rows.Count).End(xlUp).row
Dim cookie As String
Dim result_cookie As String
For i = 5 To lastRow
If Cells(i, 1) <> "" Then
URL = Cells(i, 11)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", URL, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set HTML = CreateObject("htmlfile")
HTML.body.innerHTML = XMLHTTP.responseText
' The price is filled in by JavaScript, so it is not present in XMLHTTP.responseText
Set objResult = HTML.getElementById("bladeInside")
Set Price = objResult.getElementsByTagName("span")(0)
Cells(i, 5) = Price.innerText
DoEvents
End If
Next
End Sub
Any help would be greatly appreciated.
PS - I have also tried the code found at https://www.myonlinetraininghub.com/web-scraping-with-vba, to no avail.
UPDATE:
I was able to get everything working with your help. Thank you, Bertrand Martel and Stavros Jon.
Here is the entire script:
Sub UpdateProcessorInfo()
'requirements: JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
'requirements: Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
'requirements: Add a reference to Microsoft WinHTTP Services 5.1. (Tools -> References -> Check Microsoft WinHTTP Services 5.1)
Dim Connection As WorkbookConnection
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
Dim cookie As String
Dim result_cookie As String
Dim req As New WinHttpRequest
Dim ids As String
Dim i As Long
Dim responseJSON As Object
For Each Connection In ThisWorkbook.Connections
Connection.Refresh
Next Connection
Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues
lastRow = Range("A" & Rows.Count).End(xlUp).row
Range("k5:k300").Clear
For i = 5 To lastRow
If Cells(i, 1) <> "" Then
url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.responseText
Set objResultDiv = html.getElementById("rso")
Set link = objResultDiv.getElementsByTagName("a")(0)
Cells(i, 11) = link
DoEvents
End If
Next
lastRow = Range("A" & Rows.Count).End(xlUp).row
For i = 5 To lastRow
ids = Cells(i, 13)
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
If Cells(i, 1) <> "" Then
With req
.Open "GET", url, False
.send
Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
On Error Resume Next
'Debug.Print responseJSON(1)("displayPrice")
Cells(i, 14) = responseJSON(1)("displayPrice")
End If
Next
End Sub
As @Bertrand Martel pointed out, there's a very convenient API you can use to grab the info you need.
To further elaborate on his answer, and since you're having trouble extracting the price from the JSON response, here's my two cents.
You'll need to add this JSON parser (VBA-JSON, https://github.com/VBA-tools/VBA-JSON) to your project. Follow the installation instructions in the link.
The response's structure looks like this:
So it all comes down to this:
Option Explicit
Sub intel()
Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
Dim url As String, ids As String
Dim responseJSON As Object
ids = "192481"
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
With req
.Open "GET", url, False
.send
Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub
As you have noticed, the data is not embedded in the HTML but loaded via JavaScript from an external JSON API:
https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&mmids=999C0G&siteName=ark
This URL is constructed using the product ID 192481 and the siteName ark. Dropping the mmids returns only the product, which should be sufficient (unless you need the orderingCode?):
https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark
The idea is that you extract the product ID from your original URL:
https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html
and call this API instead.
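Putting those two steps together (pull the product ID out of each URL in column K, then query the JSON endpoint), a minimal sketch might look like the following. It assumes the VBA-JSON `JsonConverter` module is already imported, that the endpoint still returns a JSON array whose first element has a `displayPrice` field as in the answer above, and that prices should go to column E as the question asks. `ArkProductId` and `FillArkPrices` are hypothetical names; late binding via `CreateObject("MSXML2.ServerXMLHTTP.6.0")` is used so no WinHTTP reference is needed.

```vba
Option Explicit

' Hypothetical helper: returns the numeric product ID that follows "/products/"
' in an ark.intel.com product URL, or "" if that segment is not found.
Function ArkProductId(ByVal productUrl As String) As String
    Dim marker As String, startPos As Long, endPos As Long
    marker = "/products/"
    startPos = InStr(1, productUrl, marker, vbTextCompare)
    If startPos = 0 Then Exit Function              ' not an ARK product URL
    startPos = startPos + Len(marker)
    endPos = InStr(startPos, productUrl, "/")       ' slash right after the ID
    If endPos = 0 Then Exit Function
    ArkProductId = Mid$(productUrl, startPos, endPos - startPos)
End Function

' Hypothetical sub: assumes URLs in column K (11) from row 5 down,
' writes the price to column E (5) of the active sheet.
Sub FillArkPrices()
    Dim req As Object, responseJSON As Object
    Dim i As Long, lastRow As Long, ids As String
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    lastRow = Cells(Rows.Count, 11).End(xlUp).Row
    For i = 5 To lastRow
        ids = ArkProductId(CStr(Cells(i, 11).Value))
        If ids <> "" Then
            req.Open "GET", "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" _
                & ids & "&siteName=ark", False
            req.send
            If req.Status = 200 Then
                ' ParseJson turns a JSON array into a Collection of Dictionaries
                Set responseJSON = JsonConverter.ParseJson(req.responseText)
                If responseJSON.Count > 0 Then Cells(i, 5).Value = responseJSON(1)("displayPrice")
            End If
        End If
    Next i
End Sub
```

Running `FillArkPrices` on the sheet that holds the URLs fills column E; rows whose URL has no `/products/<id>/` segment, or whose request does not return HTTP 200, are skipped.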