繁体   English   中英

VBA 抓取 JavaScript 生成的内容

[英]VBA Scrape Content Generated by JavaScript

我希望从使用 VBA 的 Excel 工作表中定义的 URL 获取推荐的客户定价信息。 这些值位于 Excel 中的 Cells(i,11) 中,它们都指向https://ark.intel.com上的特定页面。 值从第 5 行开始。

例如,如果我想找到英特尔至强 8268 的价格,我会导航到https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268 -processor-35-75m-cache-2-90-ghz.html 如果查看源代码,很明显此内容是用 JavaScript 生成的,因此我改为在 Firefox 网络浏览器上使用“检查元素”选项。

从这里,我可以向下导航并在标签中找到我要查找的内容。 见下图:

英特尔至强 8268 的推荐客户价格

我无法捕获该值并将其写入 excel 列,即列 E。以下是我所做的一次尝试:

Sub ProcessorPricing()
    Dim URL As String, lastRow As Long
    Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object

    lastRow = Range("A" & Rows.Count).End(xlUp).row

    Dim cookie As String
    Dim result_cookie As String

    For i = 5 To lastRow

        If Cells(i, 1) <> "" Then

            URL = Cells(i, 11)

            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", URL, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send

            Set HTML = CreateObject("htmlfile")
            HTML.body.innerHTML = XMLHTTP.responseText

            Set objResult = html.getElementsByID("bladeInside")
            Set Price = objResult.getElementsByTagName("span")(0)

            Cells(i, 5) = Price.Value
            DoEvents
        End If
    Next
End Sub

任何帮助将不胜感激。

PS - 我也试过在https://www.myonlinetraininghub.com/web-scraping-with-vba找到的代码也无济于事

更新:

能够在您的帮助下完成所有工作。 谢谢伯特兰·马特尔和斯塔夫罗斯·乔恩。

这是整个脚本:

Sub UpdateProcessorInfo()
'requirements:  JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
'requirements:  Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
'requirements:  Add a refernce to Microsoft WinHTTP Services 5.1.  (Tools -> References -> Check Microsoft WinHTTP Services 5.1)

Dim Connection As WorkbookConnection
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
Dim cookie As String
Dim result_cookie As String
Dim req As New WinHttpRequest
Dim ids As String
Dim responseJSON As Object

For Each Connection In ThisWorkbook.Connections
    Connection.Refresh
Next Connection

Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues

lastRow = Range("A" & Rows.Count).End(xlUp).row

Range("k5:k300").Clear

For i = 5 To lastRow

    If Cells(i, 1) <> "" Then

        url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        Set objResultDiv = html.getElementById("rso")
        Set link = objResultDiv.getElementsByTagName("a")(0)

        Cells(i, 11) = link
        DoEvents
    End If

Next

lastRow = Range("A" & Rows.Count).End(xlUp).row

For i = 5 To lastRow

    ids = Cells(i, 13)
    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"

    If Cells(i, 1) <> "" Then

        With req
            .Open "GET", url, False
            .send
            Set responseJSON = JsonConverter.ParseJson(.responseText)
        End With

        On Error Resume Next

        'Debug.Print responseJSON(1)("displayPrice")
        Cells(i, 14) = responseJSON(1)("displayPrice")

    End If
Next

结束子

正如@Bertrand Martel 指出的那样,您可以使用一个非常方便的 API 来获取所需的信息。

为了进一步详细说明他的回答,并且由于您无法从 JSON 响应中提取价格,这是我的两分钱。

您需要将此JSON 解析器添加到您的项目中。 按照链接中的安装说明进行操作。

响应的结构如下所示:

在此处输入图片说明

所以这一切都归结为:

Option Explicit

Sub intel()
Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
Dim url As String, ids As String
Dim responseJSON As Object
ids = "192481"
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
With req
    .Open "GET", url, False
    .send
    Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub

正如您所注意到的,数据并未嵌入到 html 中,而是使用外部 JSON API 通过 Javascript 加载:

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&mmids=999C0G&siteName=ark

此 URL 是使用产品 ID 192481和 siteName ark构建的。 删除 mmids 只返回应该足够的产品(除非您需要 orderingCode ?):

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark

这个想法是您从原始 URL 中提取产品 ID:

https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html.

并改为调用此 API

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM