
VBA - dealing with JavaScript content in XMLHTTP GET request

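I want to extract content from a web page. However, the response text I receive contains JavaScript, so I can't process it the way I could a page opened in a browser.

Can this method be used to get the HTML content, or will only browser emulation help? Or is there perhaps a different way to receive this content?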

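' Requires references to a Microsoft XML library (MSXML2)
' and to the Microsoft HTML Object Library
Dim oXMLHTTP As New MSXML2.XMLHTTP
Dim htmlObj As New HTMLDocument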

With oXMLHTTP
    .Open "GET", "http://www.manta.com/ic/mtqyfk0/ca/riverbend-holdings-inc", False
    .send

    If .ReadyState = 4 And .Status = 200 Then            
        htmlObj.body.innerHTML = .responseText
        'do things
    End If

End With

Response text:

<!DOCTYPE html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_blocked.html?Ref=/ic/mtq599v/ca/45th-street-limited-partnership&amp;distil_RID=2115B138-A1BF-11E6-A957-C0595F6B962F&amp;distil_TID=20161103121454" />
<script type="text/javascript">
    (function(window){
        try {
            if (typeof sessionStorage !== 'undefined'){
                sessionStorage.setItem('distil_referrer', document.referrer);
            }
        } catch (e){}
    })(window);
</script>
<script type="text/javascript" src="/ser-yrbwqfedrrwwvctvyavy.js" defer></script><style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#verxvaxcuczwcwecuxsx{display:none!important}</style></head>
<body>
<div id="distil_ident_block">&nbsp;</div>
</body>
</html>

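No, because the JavaScript is actually part of the HTML, inside <script> tags. XMLHTTP returns the markup as text and does not run those scripts, so you have to post-process the response to remove the tags yourself.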
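Before post-processing, it may be worth checking that the response is the page you expected. The response text in the question has a body containing only a `distil_ident_block` div, plus a meta refresh to `/distil_r_blocked.html`, which looks like a placeholder rather than the listing itself. A minimal sketch of such a check follows; `LooksLikeBlockPage` is a hypothetical helper name, and the two marker strings are simply taken from the response shown above, so they are assumptions about what this site returns.

```vba
' Hypothetical helper: returns True when the response text contains the
' markers seen in the question's response (assumed to indicate a
' placeholder page rather than the real content).
Function LooksLikeBlockPage(strHtml As String) As Boolean

    LooksLikeBlockPage = _
        (InStr(1, strHtml, "distil_ident_block", vbTextCompare) > 0) Or _
        (InStr(1, strHtml, "distil_r_blocked", vbTextCompare) > 0)

End Function
```

It could be called as `If LooksLikeBlockPage(.responseText) Then Exit Sub` inside the `With` block, before any parsing is attempted.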

Once you have the page's HTML, you can use a function to remove the <script> nodes from the DOM:

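Function RemoveScriptTags(objHTML As HTMLDocument) As String

    Dim objScripts As Object
    Dim objNode As Object
    Dim i As Long

    Set objScripts = objHTML.getElementsByTagName("script")

    'iterate backwards: the collection is live and shrinks as nodes are removed
    For i = objScripts.Length - 1 To 0 Step -1
        Set objNode = objScripts.Item(i)
        objNode.parentNode.removeChild objNode
    Next i

    'return the markup of the document without its script elements
    RemoveScriptTags = objHTML.documentElement.outerHTML

End Function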

This can be included in your sample code as follows:

Option Explicit

Sub Test()

    Dim objXMLHTTP As New MSXML2.XMLHTTP
    Dim objHTML As Object
    Dim strUrl As String
    Dim strHtmlNoScriptTags As String

    strUrl = "http://www.manta.com/ic/mtqyfk0/ca/riverbend-holdings-inc"

    With objXMLHTTP
        .Open "GET", strUrl, False
        .send

        If .ReadyState = 4 And .Status = 200 Then
            Set objHTML = CreateObject("htmlfile")
            objHTML.Open
            objHTML.write objXMLHTTP.responseText
            objHTML.Close

            'do things
            strHtmlNoScriptTags = RemoveScriptTags(objHTML)
            Debug.Print strHtmlNoScriptTags

            'update html document with script-less document
            Set objHTML = CreateObject("htmlfile")
            objHTML.Open
            objHTML.write strHtmlNoScriptTags
            objHTML.Close

            'you can now operate on DOM of objHTML

        End If

    End With

End Sub
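As an alternative to parsing the document twice, the <script> blocks could be stripped from the response text before it is written to the document at all. The sketch below is not part of the answer above; it swaps in the `VBScript.RegExp` object, and `StripScriptBlocks` is a hypothetical helper name. It assumes every script block has a closing `</script>` tag, and a regular expression is only a rough tool for HTML, so treat it as a starting point.

```vba
' Hypothetical helper (an assumption, not from the answer above):
' removes <script>...</script> blocks from raw HTML text.
Function StripScriptBlocks(strHtml As String) As String

    Dim objRegex As Object

    Set objRegex = CreateObject("VBScript.RegExp")
    With objRegex
        .Global = True        'replace every match, not just the first
        .IgnoreCase = True    'match <SCRIPT> as well as <script>
        '[\s\S]*? matches across line breaks, as few characters as possible
        .Pattern = "<script\b[^>]*>[\s\S]*?</script\s*>"
    End With

    StripScriptBlocks = objRegex.Replace(strHtml, "")

End Function
```

With this in place, the first `objHTML.write` call in `Test` could become `objHTML.write StripScriptBlocks(objXMLHTTP.responseText)`, and the second `htmlfile` document would no longer be needed.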

