Create a timeout handler for web scraping with RSelenium
I'm building a scraper with RSelenium and PhantomJS. Sometimes my program takes too long querying a website and never finishes, so I'm writing a timeout handler.
library(RSelenium)
library(R.utils)  # provides withTimeout()
# Start PhantomJS and open a session with a custom user agent
pJS <- phantom(pjs_cmd = "C:\\software\\phantomjs-2.0.0-windows\\bin\\phantomjs.exe")
UA <- 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0'
eCap <- list(phantomjs.page.settings.userAgent = UA)
remDr <- remoteDriver(browserName = "phantomjs", extraCapabilities = eCap)
remDr$open(silent = TRUE)
# Flag set to 1 if the page load exceeds the time limit
time_out <- 0
# Only TimeoutException is handled here; any other error propagates
tryCatch({
  withTimeout({
    remDr$navigate("http://stackoverflow.com/questions/14399205/in-r-how-to-make-the-variables-inside-a-function-available-to-the-lower-level-f")
  }, envir = globalenv(), timeout = 1.08)
}, TimeoutException = function(ex) {
  time_out <<- 1
})
But I get the error:
Undefined error in RCurl call.
Error in queryRD(paste0(serverURL, "/session/", sessionInfo$id, "/url"), :
Anyway, if I look inside remDr:
remDr$getTitle()[[1]]
[1] "In R, how to make the variables inside a function available to the lower level function inside this function?(with, attach, environment) - Stack Overflow"
So it worked! But why do I get the error?
Please update Java and check that the Selenium server your webdriver runs on is the newest version. That solves the issue.
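Independently of the version fix, note that the handler in the question only catches TimeoutException, so any other error raised while the request is interrupted escapes tryCatch(). Below is a minimal sketch of a more tolerant handler. It uses base R's setTimeLimit() (the mechanism R.utils::withTimeout() is built on) and a hypothetical helper named run_with_time_limit(); it assumes that treating any error during navigation as a failed load is acceptable for your scraper. Time limits are only checked where R polls for interrupts, so inside compiled code such as an RCurl request the limit may fire late, and it is plausible (not confirmed) that the interrupted request then surfaces as its own error like the one above.

```r
# Hypothetical helper (not part of RSelenium or R.utils): evaluate `expr`
# under an elapsed-time limit and return a status list instead of
# letting any error escape.
run_with_time_limit <- function(expr, seconds) {
  # Message R uses when the elapsed limit fires, in the current locale.
  timeout_msg <- gettext("reached elapsed time limit", domain = "R")
  # Clear the limit when we leave, whether `expr` succeeded or failed.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  setTimeLimit(cpu = Inf, elapsed = seconds, transient = TRUE)
  tryCatch(
    # `expr` is a lazy argument, so it is only evaluated here, inside
    # tryCatch() and after the limit has been set.
    list(ok = TRUE, timed_out = FALSE, value = expr, error = NULL),
    # Catch every error, not only the time-limit one: a request cut
    # short inside compiled code may surface as some other error.
    error = function(e) {
      msg <- conditionMessage(e)
      list(ok = FALSE,
           timed_out = grepl(timeout_msg, msg, fixed = TRUE),
           value = NULL,
           error = msg)
    }
  )
}

# Stand-in for a slow page load: about 5 seconds of short sleeps.
slow_task <- function() {
  for (i in 1:50) Sys.sleep(0.1)
  "done"
}

res <- run_with_time_limit(slow_task(), seconds = 1)
res$timed_out   # TRUE if the limit fired during the sleeps

# With RSelenium this might look like (untested assumption):
# res <- run_with_time_limit(remDr$navigate(url), seconds = 10)
# if (!res$ok) time_out <- 1
```

Returning a status list rather than setting a global flag with `<<-` keeps the result of each page load next to the reason it failed, which makes it easier to retry or skip a URL in a scraping loop.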