Extracting Data From a Webpage That Isn't in the Source Code

I'd like to write an Excel macro that pulls data from the webpage below:

http://www.richmond.com/data-center/salaries-virginia-state-employees-2013/?appSession=673718284851033&RecordID=101177&PageID=3&PrevPageID=2&cpipage=1&CPIsortType=&CPIorderBy=&cbCurrentRecordPosition=1

The problem is that the employee data isn't in the page source, so when I use the code below (where NextPage is set to the URL above), the responseText doesn't include the data I'm looking for.

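Dim htm As Object
Set htm = CreateObject("htmlfile")          ' late-bound MSHTML document used to parse the response
With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", NextPage, False            ' synchronous GET of the raw HTML
    .Send
    htm.body.innerHTML = .responseText      ' XMLHTTP returns only the server's markup; the page's JavaScript never runs
End With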

I could very well be wrong, but I believe the data is contained within the page's DOM. Can someone help me understand how I can download the contents of this page as displayed (i.e., after the JavaScript modifications have been applied) using VBScript?
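One way to confirm that the values really are missing from the raw HTML is to fetch the page the same way and search `responseText` for a piece of text you can see in the browser. This is a diagnostic sketch in Excel VBA with late binding, not a fix; `RawSourceContains` and `CheckRawSource` are hypothetical names, and the marker is whatever text is visible on the rendered page.

```vba
' Hypothetical helper: True if the raw HTML contains the marker text (case-insensitive).
Function RawSourceContains(ByVal html As String, ByVal marker As String) As Boolean
    RawSourceContains = (InStr(1, html, marker, vbTextCompare) > 0)
End Function

' Hypothetical diagnostic: fetch the page without running its JavaScript
' and report whether the marker text appears in the server's response.
Sub CheckRawSource(ByVal pageUrl As String, ByVal marker As String)
    Dim html As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", pageUrl, False     ' synchronous request for the raw HTML only
        .Send
        html = .responseText
    End With
    If RawSourceContains(html, marker) Then
        Debug.Print "Found in the page source; parsing responseText should work."
    Else
        Debug.Print "Not in the page source; it is probably added after load (e.g. by JavaScript)."
    End If
End Sub
```

For example, `CheckRawSource NextPage, "some text visible on the page"` prints the result to the Immediate window.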

Using the InternetExplorer.Application COM object should give you access to the actual DOM tree, i.e., the page after its scripts have run:

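url = "http://www.richmond.com/..."

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True

ie.Navigate url

' Wait until IE reports READYSTATE_COMPLETE (4).
' WScript.Sleep is only available when running under Windows Script Host (cscript/wscript).
Do
  WScript.Sleep 100
Loop Until ie.ReadyState = 4

' ie.Document is the live DOM, including changes made by the page's JavaScript.
Set elem = ie.Document.getElementById("...")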
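`WScript.Sleep` exists only when the script runs under Windows Script Host, so the loop above will fail inside an Excel macro. A minimal VBA adaptation might look like the sketch below. It keeps the answer's `...` placeholders for the URL and element ID; the `GetRenderedText` name and the timeout parameter are assumptions added for illustration.

```vba
' Hypothetical VBA helper: return the innerText of an element from the rendered DOM,
' or "" if the page does not finish loading in time or the element is not found.
Function GetRenderedText(ByVal pageUrl As String, ByVal elementId As String, _
                         ByVal timeoutSeconds As Double) As String
    Dim ie As Object, elem As Object
    Dim started As Double

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False                  ' set to True to watch the page load
    ie.Navigate pageUrl

    ' Poll until IE is idle and reports READYSTATE_COMPLETE (4).
    ' DoEvents replaces WScript.Sleep and keeps Excel responsive.
    ' Timer resets at midnight; that edge case is ignored in this sketch.
    started = Timer
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
        If Timer - started > timeoutSeconds Then
            ie.Quit
            Exit Function               ' timed out: the function returns ""
        End If
    Loop

    ' ie.Document is the live DOM, so it reflects changes made by the page's scripts.
    Set elem = ie.Document.getElementById(elementId)
    If Not elem Is Nothing Then GetRenderedText = elem.innerText

    ie.Quit
End Function
```

Usage, after replacing the placeholders with the real URL and element ID: `Debug.Print GetRenderedText("http://www.richmond.com/...", "...", 30)`. If the page fetches its data asynchronously after the document itself is complete, the element may not exist yet when the loop exits, in which case you would need to keep polling until `getElementById` returns something.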

If that doesn't work, you may have to resort to something like PhantomJS.
