
Ruby Open-URI with Dynamic Website

I'm trying to use open-uri to fetch the HTML of a website. The problem is that the page needs a couple of seconds to load before it contains the final markup. What I have right now is:

require 'open-uri'

# Kernel#open no longer accepts URLs as of Ruby 3.0; call URI.open instead.
html = URI.open('http://hiddencode.me/dribbbucket/embed.html?key=MY_API_KEY&bucket=56024-Glassboard&delay=5000')
response = html.read
puts response

If I run this right now, I get:

<div id="slam-dunk">
    <div id="loading">Loading..</div>
</div>

However, the site needs to finish loading before I read it in order to get the correct response. Any ideas how to do this in Ruby? I can also use a solution in another language if Ruby is not your expertise!

As an example, I recently used watir-webdriver to accomplish a similar task. It lets you query the DOM after the JavaScript has executed and pull out anything you want. If you'd like it to run headless, in my case I used the headless gem.
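A minimal sketch of that approach, under a few assumptions: it uses the current watir gem (watir-webdriver was merged into it), a locally installed Chrome started with its own --headless flag rather than the headless gem, and it treats the page as ready once the "loading" placeholder from the question is no longer in the DOM. The helper names still_loading? and rendered_html are made up for illustration.

```ruby
# Hypothetical helper: true while the markup still contains the
# "Loading.." placeholder shown in the question.
def still_loading?(html)
  html.include?('id="loading"')
end

# Hypothetical helper: open the page in a real browser, wait for the
# JavaScript to replace the placeholder, and return the rendered HTML.
def rendered_html(url, timeout: 30)
  require 'watir' # loaded lazily; install with `gem install watir`

  # Assumption: Chrome and a matching driver are available on this machine.
  browser = Watir::Browser.new(:chrome, options: { args: ['--headless'] })
  begin
    browser.goto(url)
    # Poll the live DOM until the placeholder is gone or the timeout expires.
    Watir::Wait.until(timeout: timeout) { !still_loading?(browser.html) }
    browser.html
  ensure
    browser.close
  end
end

# Usage (starts a browser, so it is left commented out):
# puts rendered_html('http://hiddencode.me/dribbbucket/embed.html?key=MY_API_KEY&bucket=56024-Glassboard&delay=5000')
```

Waiting on a condition rather than sleeping for a fixed number of seconds means the script returns as soon as the content is there and fails loudly if it never arrives.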

If you'd like to stick with open-uri, you'll have to use something like HttpFox to watch which AJAX requests the JavaScript makes (many other tools can do this as well). Start HttpFox before you visit the URL. Wait until the information you're trying to scrape appears, then stop HttpFox and go through the requests, checking each response for the data you're after. Once you identify the right request, you may be able to make it directly with open-uri. While this is the simplest approach, it is not guaranteed to work, because web applications vary widely in how they interact with servers and manipulate the DOM.
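Once the request has been identified, calling it directly might look like the sketch below. Everything specific here is an assumption: DATA_URL is a placeholder for whatever URL the inspector shows, the Accept header is just a guess at what the endpoint expects, and the response may or may not be JSON, so parse_response falls back to the raw text.

```ruby
require 'open-uri'
require 'json'

# Hypothetical endpoint: replace with the request URL found in the inspector.
DATA_URL = 'https://api.example.com/replace-with-the-real-request'

# Parse the body as JSON if possible; otherwise hand back the raw text
# (the endpoint might return an HTML fragment or JSONP instead).
def parse_response(body)
  JSON.parse(body)
rescue JSON::ParserError
  body
end

# Fetch the data the page's JavaScript would have requested.
def fetch_data(url)
  # String keys in the options hash are sent as HTTP request headers.
  body = URI.open(url, 'Accept' => 'application/json').read
  parse_response(body)
end

# Usage (performs a network request, so it is left commented out):
# data = fetch_data(DATA_URL)
# puts data.inspect
```

If the request needs cookies, an API key, or a particular Referer header, copy those from the captured request as well; the inspector shows them alongside the URL.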


 