How to Programmatically Take Snapshots of Crawled Webpages (in Ruby)?

What is the best solution to programmatically take a snapshot of a webpage?

The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to manually go to each one. I would also like to be able to take jpg/png snapshots of websites that might be completely Flash/Flex, so I'd have to somehow wait until the page has loaded before taking the snapshot.

It would be nice if there were no limit to the number of thumbnails I could generate (within reason, say 1000 per day).

Any ideas how to do this in Ruby? Seems pretty tough.

Browsers to do this in: Safari or Firefox, preferably Safari.

Thanks so much.

This really depends on your operating system. What you need is a way to hook into a web browser and save the rendered page as an image.

If you are on a Mac, I would imagine your best bet would be to use MacRuby (or RubyCocoa, although I believe this is going to be deprecated in the near future) and then use the WebKit framework to load the page and render it as an image.

This is definitely possible; for inspiration you may wish to look at the Paparazzi! and webkit2png projects.
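If you would rather not write the WebKit glue yourself, one rough way to start is to shell out to webkit2png from Ruby. The sketch below is only an illustration: the helper names `webkit2png_command` and `snapshot_all` are made up, it assumes `webkit2png` is on your PATH (macOS only), and the flags (`-T`, `--delay`, `-D`, `-o`) are as I remember them from the tool's help text, so check `webkit2png --help` first. The delay is there to give Flash/Flex pages some time to finish drawing.

```ruby
require "fileutils"

# Build the argument list for one webkit2png run.
# ASSUMPTION: flag names are recalled from webkit2png's help output;
# verify them against your installed version.
def webkit2png_command(url, dir:, name:, delay: 5)
  [
    "webkit2png",
    "-T",                   # produce only the thumbnail image
    "--delay", delay.to_s,  # seconds to wait after the page reports loaded
    "-D", dir,              # output directory
    "-o", name,             # base name for the output file
    url
  ]
end

# Snapshot every URL in turn and return the URLs whose run failed.
def snapshot_all(urls, dir: "snapshots")
  FileUtils.mkdir_p(dir)
  failed = []
  urls.each_with_index do |url, i|
    cmd = webkit2png_command(url, dir: dir, name: format("site-%04d", i))
    # system returns true only when the command exits with status 0
    failed << url unless system(*cmd)
  end
  failed
end
```

Something like `snapshot_all(["http://example.com/"])` could then be run from cron every few months. Passing the command as an array keeps URLs away from the shell, so odd characters in them are not interpreted.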

Another option, which isn't dependent on the OS, might be to use the BrowserShots API.

There is no built-in library in Ruby for rendering a web page.

As viewed by... IE? Firefox? Opera? One of the myriad WebKit engines?

If only it were possible to automate http://browsershots.org :)

Use selenium-rc; it comes with a snapshot feature.
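To make that a bit more concrete, here is a minimal sketch. Note that it uses the `selenium-webdriver` gem (the successor to Selenium RC) rather than the RC client itself, and it assumes Firefox plus its driver are installed. The helper names `take_snapshot` and `snapshot_with_firefox` are hypothetical, and the fixed `sleep` is just a crude stand-in for "wait until the Flash/Flex content has loaded".

```ruby
# Load a page, give plugins time to render, then save a PNG.
# `driver` is any object responding to #get and #save_screenshot,
# which Selenium::WebDriver drivers do.
def take_snapshot(driver, url, path, settle: 5)
  driver.get(url)
  sleep(settle)                 # crude wait for late-rendering content
  driver.save_screenshot(path)
  path
end

# Snapshot a list of URLs in a single Firefox session and return the paths.
# ASSUMPTION: the selenium-webdriver gem and a Firefox driver are installed;
# the gem is required lazily so take_snapshot works with any driver object.
def snapshot_with_firefox(urls, dir: "snapshots")
  require "selenium-webdriver"
  require "fileutils"
  FileUtils.mkdir_p(dir)
  driver = Selenium::WebDriver.for(:firefox)
  urls.each_with_index.map do |url, i|
    take_snapshot(driver, url, File.join(dir, format("site-%04d.png", i)))
  end
ensure
  driver&.quit                  # close the browser even if a page fails
end
```

Reusing one browser session for the whole list keeps a run of a thousand pages a day manageable. Scaling the PNGs down to thumbnails would be a separate step with an image library.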

With JRuby, you can use SWT's browser library.
