
Create a time-out handler for web scraping with RSelenium

I'm creating a scraper with RSelenium and PhantomJS. Sometimes a query to a website takes too long and never finishes, so I'm writing a time-out handler.

library(RSelenium)
library(R.utils)

# Start PhantomJS and connect to it with a custom user agent
pJS <- phantom(pjs_cmd = "C:\\software\\phantomjs-2.0.0-windows\\bin\\phantomjs.exe")
UA <- "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0"
eCap <- list(phantomjs.page.settings.userAgent = UA)
remDr <- remoteDriver(browserName = "phantomjs", extraCapabilities = eCap)
remDr$open(silent = TRUE)

# Flag that is set to 1 when navigation exceeds the time limit
time_out <- 0
tryCatch({
  withTimeout({
    remDr$navigate("http://stackoverflow.com/questions/14399205/in-r-how-to-make-the-variables-inside-a-function-available-to-the-lower-level-f")
  }, envir = globalenv(), timeout = 1.08)
}, TimeoutException = function(ex) {
  time_out <<- 1
})

But I get the error:

Undefined error in RCurl call.Error in queryRD(paste0(serverURL, "/session/", sessionInfo$id, "/url"), :

Anyway, if I look inside remDr, the page has loaded:

remDr$getTitle()[[1]]
[1] "In R, how to make the variables inside a function available to the lower level function inside this function?(with, attach, environment) - Stack Overflow"

So it worked! But why do I get the error?

Please update Java, and check your Selenium version to make sure the webdriver is running with the newest version. That solves the issue.
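Separately from the version fix, the time-out pattern from the question can be tried without a browser. The following is only a minimal sketch under stated assumptions: `slow_task()` and `run_with_timeout()` are hypothetical helpers that are not part of RSelenium or R.utils, and `slow_task()` merely stands in for a slow call such as `remDr$navigate(...)`. It adds a generic `error` handler next to the `TimeoutException` handler, because the error reported in the question was evidently not caught by the `TimeoutException` handler alone.

```r
library(R.utils)

# Hypothetical stand-in for a slow call such as remDr$navigate(url).
slow_task <- function(seconds) {
  Sys.sleep(seconds)
  "done"
}

# Hypothetical helper: run task() under a time limit and report what happened.
# timed_out is TRUE on a time-out, FALSE on success, NA on any other error.
run_with_timeout <- function(task, timeout) {
  env <- environment()  # evaluate the expression where `task` is visible
  tryCatch(
    list(timed_out = FALSE,
         value = withTimeout(task(), envir = env, timeout = timeout)),
    TimeoutException = function(ex) list(timed_out = TRUE, value = NULL),
    error = function(e) list(timed_out = NA, value = conditionMessage(e))
  )
}

fast <- run_with_timeout(function() slow_task(0.1), timeout = 2)
slow <- run_with_timeout(function() slow_task(3), timeout = 0.5)
failed <- run_with_timeout(function() stop("some other failure"), timeout = 2)

print(fast$timed_out)
print(slow$timed_out)
print(failed$value)
```

Returning a small list instead of assigning to a global with `<<-` keeps the time-out state next to the result, so the caller can tell a time-out apart from other failures. Note that R only checks time limits where a user interrupt could occur, so a call that blocks inside compiled code may not stop exactly at the limit.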

