簡體   English   中英

VBA 抓取 JavaScript 生成的內容

[英]VBA Scrape Content Generated by JavaScript

我希望從使用 VBA 的 Excel 工作表中定義的 URL 獲取推薦的客戶定價信息。 這些值位於 Excel 中的 Cells(i,11) 中,它們都指向https://ark.intel.com上的特定頁面。 值從第 5 行開始。

例如,如果我想找到英特爾至強 8268 的價格,我會導航到https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268 -processor-35-75m-cache-2-90-ghz.html 如果查看源代碼,很明顯此內容是用 JavaScript 生成的,因此我改為在 Firefox 網絡瀏覽器上使用“檢查元素”選項。

從這里,我可以向下導航並在標簽中找到我要查找的內容。 見下圖:

英特爾至強 8268 的推薦客戶價格

我無法捕獲該值並將其寫入 excel 列,即列 E。以下是我所做的一次嘗試:

Sub ProcessorPricing()
    Dim URL As String, lastRow As Long
    Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object

    lastRow = Range("A" & Rows.Count).End(xlUp).row

    Dim cookie As String
    Dim result_cookie As String

    For i = 5 To lastRow

        If Cells(i, 1) <> "" Then

            URL = Cells(i, 11)

            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", URL, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send

            Set HTML = CreateObject("htmlfile")
            HTML.body.innerHTML = XMLHTTP.responseText

            Set objResult = html.getElementsByID("bladeInside")
            Set Price = objResult.getElementsByTagName("span")(0)

            Cells(i, 5) = Price.Value
            DoEvents
        End If
    Next
End Sub

任何幫助將不勝感激。

PS - 我也試過在https://www.myonlinetraininghub.com/web-scraping-with-vba找到的代碼也無濟於事

更新:

能夠在您的幫助下完成所有工作。 謝謝伯特蘭·馬特爾和斯塔夫羅斯·喬恩。

這是整個腳本:

Sub UpdateProcessorInfo()
'requirements:  JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
'requirements:  Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
'requirements:  Add a refernce to Microsoft WinHTTP Services 5.1.  (Tools -> References -> Check Microsoft WinHTTP Services 5.1)

Dim Connection As WorkbookConnection
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
Dim cookie As String
Dim result_cookie As String
Dim req As New WinHttpRequest
Dim ids As String
Dim responseJSON As Object

For Each Connection In ThisWorkbook.Connections
    Connection.Refresh
Next Connection

Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues

lastRow = Range("A" & Rows.Count).End(xlUp).row

Range("k5:k300").Clear

For i = 5 To lastRow

    If Cells(i, 1) <> "" Then

        url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        Set objResultDiv = html.getElementById("rso")
        Set link = objResultDiv.getElementsByTagName("a")(0)

        Cells(i, 11) = link
        DoEvents
    End If

Next

lastRow = Range("A" & Rows.Count).End(xlUp).row

For i = 5 To lastRow

    ids = Cells(i, 13)
    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"

    If Cells(i, 1) <> "" Then

        With req
            .Open "GET", url, False
            .send
            Set responseJSON = JsonConverter.ParseJson(.responseText)
        End With

        On Error Resume Next

        'Debug.Print responseJSON(1)("displayPrice")
        Cells(i, 14) = responseJSON(1)("displayPrice")

    End If
Next

結束子

正如@Bertrand Martel 指出的那樣,您可以使用一個非常方便的 API 來獲取所需的信息。

為了進一步詳細說明他的回答,並且由於您無法從 JSON 響應中提取價格,這是我的兩分錢。

您需要將此JSON 解析器添加到您的項目中。 按照鏈接中的安裝說明進行操作。

響應的結構如下所示:

在此處輸入圖片說明

所以這一切都歸結為:

Option Explicit

Sub intel()
Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
Dim url As String, ids As String
Dim responseJSON As Object
ids = "192481"
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
With req
    .Open "GET", url, False
    .send
    Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub

正如您所注意到的,數據並未嵌入到 html 中,而是使用外部 JSON API 通過 Javascript 加載:

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&mmids=999C0G&siteName=ark

此 URL 是使用產品 ID 192481和 siteName ark構建的。 刪除 mmids 只返回應該足夠的產品(除非您需要 orderingCode ?):

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark

這個想法是您從原始 URL 中提取產品 ID:

https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html.

並改為調用此 API

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM