VBA Scrape Content Generated by JavaScript

I want to use VBA to grab the Recommended Customer Price from URLs stored in an Excel worksheet. The URLs are in Cells(i, 11), starting on row 5, and each one points to a specific product page on https://ark.intel.com.

For example, to find the price of the Intel Xeon 8268 I would navigate to https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html. Viewing the page source makes it obvious that this content is generated with JavaScript, so I used the "Inspect Element" option in Firefox instead.

From here, I can navigate down and find what I am looking for in the tag. See image below:

[Image: Recommended Customer Price for the Intel Xeon 8268]

I am unable to capture that value and write it to an Excel column (column E). Below is one attempt I have made:

Sub ProcessorPricing()
    Dim URL As String, lastRow As Long
    Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object

    lastRow = Range("A" & Rows.Count).End(xlUp).row

    Dim cookie As String
    Dim result_cookie As String

    For i = 5 To lastRow

        If Cells(i, 1) <> "" Then

            URL = Cells(i, 11)

            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", URL, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send

            Set HTML = CreateObject("htmlfile")
            HTML.body.innerHTML = XMLHTTP.responseText

            Set objResult = html.getElementsByID("bladeInside")
            Set Price = objResult.getElementsByTagName("span")(0)

            Cells(i, 5) = Price.Value
            DoEvents
        End If
    Next
End Sub

Any help would be greatly appreciated.

PS - I have also tried the code found at https://www.myonlinetraininghub.com/web-scraping-with-vba, to no avail.

UPDATE:

I was able to get everything working with your help. Thank you, Bertrand Martel and Stavros Jon.

Here is the entire script:

Sub UpdateProcessorInfo()
'requirements:  JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
'requirements:  Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
'requirements:  Add a reference to Microsoft WinHTTP Services 5.1.  (Tools -> References -> Check Microsoft WinHTTP Services 5.1)

Dim Connection As WorkbookConnection
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
Dim cookie As String
Dim result_cookie As String
Dim req As New WinHttpRequest
Dim ids As String
Dim responseJSON As Object

For Each Connection In ThisWorkbook.Connections
    Connection.Refresh
Next Connection

Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues

lastRow = Range("A" & Rows.Count).End(xlUp).row

Range("k5:k300").Clear

For i = 5 To lastRow

    If Cells(i, 1) <> "" Then

        url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        Set objResultDiv = html.getElementById("rso")
        Set link = objResultDiv.getElementsByTagName("a")(0)

        Cells(i, 11) = link
        DoEvents
    End If

Next

lastRow = Range("A" & Rows.Count).End(xlUp).row

For i = 5 To lastRow

    ids = Cells(i, 13)
    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"

    If Cells(i, 1) <> "" Then

        With req
            .Open "GET", url, False
            .send
            Set responseJSON = JsonConverter.ParseJson(.responseText)
        End With

        On Error Resume Next

        'Debug.Print responseJSON(1)("displayPrice")
        Cells(i, 14) = responseJSON(1)("displayPrice")

    End If
Next

End Sub

As @Bertrand Martel pointed out, there's a very convenient API you can use to grab the info you need.

To further elaborate on his answer, and since you're having trouble extracting the price from the JSON response, here's my two cents.

You'll need to add the VBA-JSON parser (https://github.com/VBA-tools/VBA-JSON) to your project. Follow the installation instructions in the link.

The response's structure looks like this:

[Image: screenshot of the JSON response structure]

So it all comes down to this:

Option Explicit

Sub intel()
Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
Dim url As String, ids As String
Dim responseJSON As Object
ids = "192481"
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
With req
    .Open "GET", url, False
    .send
    Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub
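
If the request fails or the response has a different shape, the one-liner above raises a runtime error. Below is a minimal, more defensive sketch of the same idea. It assumes the response is a JSON array of objects with a displayPrice key, which is inferred only from the responseJSON(1)("displayPrice") access above, and that the endpoint still answers as described. PriceFromJson and FetchPrice are hypothetical helper names, not part of VBA-JSON or any Intel API.

```vba
Option Explicit

' Requires JsonConverter.bas (VBA-JSON) and, on Windows, a reference to
' Microsoft Scripting Runtime, as noted in the script above.

' Returns the displayPrice of the first item in the JSON text,
' or an empty string if the text cannot be parsed or has another shape.
Function PriceFromJson(ByVal jsonText As String) As String
    Dim parsed As Object

    On Error GoTo Fail
    Set parsed = JsonConverter.ParseJson(jsonText)

    ' VBA-JSON returns a Collection for a JSON array.
    If TypeName(parsed) <> "Collection" Then Exit Function
    If parsed.Count = 0 Then Exit Function

    ' Each array item is expected to be a Dictionary (assumed shape).
    If parsed(1).Exists("displayPrice") Then
        PriceFromJson = CStr(parsed(1)("displayPrice"))
    End If
    Exit Function
Fail:
    PriceFromJson = ""
End Function

' Calls the price endpoint for one product ID and returns the price text,
' or an empty string when the HTTP status is not 200.
Function FetchPrice(ByVal productId As String) As String
    Dim req As Object
    Dim url As String

    ' Late binding, so no WinHTTP reference is needed for this sketch.
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")
    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & productId & "&siteName=ark"

    req.Open "GET", url, False
    req.send

    If req.Status = 200 Then FetchPrice = PriceFromJson(req.responseText)
End Function
```

In a worksheet loop this could replace the On Error Resume Next approach, for example Cells(i, 14) = FetchPrice(CStr(Cells(i, 13))), leaving the cell empty when no price comes back.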

As you have noticed, the data is not embedded in the HTML but loaded via JavaScript from an external JSON API:

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&mmids=999C0G&siteName=ark

This URL is constructed using the product ID 192481 and the siteName ark. Dropping the mmids parameter returns only the product, which should be sufficient (unless you need the orderingCode?):

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark

The idea is that you extract the product ID from your original URL:

https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html

and call this API instead.
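
The ID extraction step could be sketched in VBA as follows. This is a minimal example that assumes every product URL contains a "/products/" segment immediately followed by the ID, as in the URL pattern above. ExtractProductId is a hypothetical helper name introduced for illustration.

```vba
Option Explicit

' Returns the path segment that follows "/products/" in an ARK product URL,
' or an empty string when the URL does not contain that marker.
Function ExtractProductId(ByVal productUrl As String) As String
    Const MARKER As String = "/products/"
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, productUrl, MARKER, vbTextCompare)
    If startPos = 0 Then Exit Function              ' marker missing: return ""

    startPos = startPos + Len(MARKER)               ' first character of the ID
    endPos = InStr(startPos, productUrl, "/")       ' slash that ends the ID
    If endPos = 0 Then endPos = Len(productUrl) + 1 ' ID runs to the end of the URL

    ExtractProductId = Mid$(productUrl, startPos, endPos - startPos)
End Function

Sub DemoExtractProductId()
    ' Prints the ID to the Immediate window.
    Debug.Print ExtractProductId("https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html")
End Sub
```

The returned string can then be passed as the ids parameter of the API URL above.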
