
Download data from a webpage using Excel VBA

In this post I try to reformulate my previous questions. I am trying to extract a table from a webpage into Excel. The problem is that the page is generated with JavaScript, so when I look at the page source I cannot find where the table is defined. What I am trying to do is:

  1. Load the page
  2. From the first menu on the left, select the option "Calcio" and then "ITA Serie A" (sorry, the webpage is in Italian)
  3. Extract the data in the table using "standard" functions such as getElementsByTagName or getElementById

So far I have been able to load the page and run a script for step 2 (and the page is correctly updated). The problem is that when I look at the page source I cannot find the table I am interested in (the one with the header "ESITO FINALE 1X2"), so I do not know how to proceed to import the table into Excel.

The starting page URL is: "https://www.sisal.it/scommesse-matchpoint".

My goal is to import the table data into Excel, so if there is a completely different approach to solving the problem I am open to it. Thanks!

Sub Control_Sisal()

  Dim htmlPage          As HTMLDocument   ' needs a reference to the Microsoft HTML Object Library
  Dim strUrl            As String
  Const Title As String = "scommesse"

  strUrl = "https://www.sisal.it/scommesse-matchpoint"

  Call Navigate_Sisal(strUrl, htmlPage, Title)

End Sub

Sub Navigate_Sisal(strUrl As String, htmlPage As htmlDocument, Title As String)

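  Dim IE            As Object
  Dim strScript     As String
  ' IE is late bound, so the READYSTATE_COMPLETE enum may be undefined; declare it explicitly
  Const READYSTATE_COMPLETE As Long = 4

  Set IE = CreateObject("InternetExplorer.Application")
  IE.Visible = True
  IE.navigate strUrl

  ' Wait until the initial page load has finished
  Do
     DoEvents
  Loop Until Not IE.Busy And IE.ReadyState = READYSTATE_COMPLETE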
  '
  ' Run the scripts to get the data
  '
  strScript = "getAlberaturaAntepostManager().clickManifestazione(1, 21)"
  IE.Document.parentWindow.execScript strScript, "jscript"

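  ' Note: a script-driven update usually leaves ReadyState unchanged, so this
  ' loop may exit before the generated table is present in the DOM
  Do
    DoEvents
  Loop Until IE.ReadyState = READYSTATE_COMPLETE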

  Set htmlPage = IE.Document

End Sub

This answer is the one given by @TimWilliams in the comments above. I am posting it here to close the question and to help beginners like me.

The document object represents the full page at that point in time, including any content generated by script (assuming it has finished rendering). It is not necessarily the same as what "view source" displays. With the browser's Developer Tools (F12) you can inspect the final generated HTML and find the elements you need (in my case, the script-generated table).
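As a rough illustration of reading the table from the live document rather than from "view source", here is a minimal sketch. It rests on assumptions I have not verified against the site: that the data sits in a real <table> element whose text includes "ESITO FINALE 1X2", and that it is not inside a frame. FindTableByText, WaitForTable, CopyTableToSheet and Import_Sisal_Table are hypothetical helper names, not part of the original code.

```vba
Option Explicit

' Hypothetical helper: first <table> in the live DOM whose text contains
' headerText, or Nothing if no such table exists (yet).
Function FindTableByText(doc As Object, headerText As String) As Object
    Dim tbl As Object
    For Each tbl In doc.getElementsByTagName("table")
        If InStr(1, tbl.innerText, headerText, vbTextCompare) > 0 Then
            Set FindTableByText = tbl
            Exit Function
        End If
    Next tbl
End Function

' Hypothetical helper: poll the DOM until the table appears or the
' timeout (in seconds) expires. Returns Nothing on timeout.
Function WaitForTable(doc As Object, headerText As String, timeoutSec As Double) As Object
    Dim found As Object
    Dim t0 As Date
    t0 = Now
    Do
        DoEvents
        Set found = FindTableByText(doc, headerText)
        If Not found Is Nothing Then Exit Do
    Loop While (Now - t0) * 86400 < timeoutSec
    Set WaitForTable = found
End Function

' Hypothetical helper: write every cell of the HTML table to the
' worksheet, starting at A1.
Sub CopyTableToSheet(tbl As Object, ws As Worksheet)
    Dim tr As Object, td As Object
    Dim r As Long, c As Long
    For Each tr In tbl.Rows
        r = r + 1
        c = 0
        For Each td In tr.Cells
            c = c + 1
            ws.Cells(r, c).Value = td.innerText
        Next td
    Next tr
End Sub

' Assumed usage: htmlPage is the document returned by Navigate_Sisal.
Sub Import_Sisal_Table(htmlPage As Object)
    Dim tbl As Object
    Set tbl = WaitForTable(htmlPage, "ESITO FINALE 1X2", 10)
    If tbl Is Nothing Then
        MsgBox "Table not found; check the element structure with F12."
    Else
        CopyTableToSheet tbl, ActiveSheet
    End If
End Sub
```

Polling for the element is used here because the script in step 2 updates the page without a new navigation, so waiting on ReadyState alone may not be enough. If F12 shows that the header lives outside the table, or that the data is built from div elements, the lookup in FindTableByText would need to be adapted.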
