
Problems with RSelenium

Good evening everyone,

I have been trying to run an old script of mine that uses RSelenium. Because of some changes, it no longer works. The original code was:

require(RSelenium)
require(rvest)
RSelenium::checkForServer()
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()

remDr$navigate(linkPlayersPage)
doc <- remDr$getPageSource()
doc <- read_html(doc[[1]])
path <- "//table[@class='playersquickfindtable']/tbody/tr/td/form/table/tbody/tr/td/div/img"
quickFind <- doc %>% html_nodes(xpath=path) %>% xml_attr("alt")
remDr$close()

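Following advice I found here, I changed it to the following:

library(RSelenium)
library(rvest)

# Start a Selenium server and a browser session; rsDriver() returns both
driver <- rsDriver()
remDr <- driver[["client"]]

# linkPlayersPage holds the URL of one player's page (set elsewhere in the script)
remDr$navigate(linkPlayersPage)
doc <- remDr$getPageSource()
doc <- read_html(doc[[1]])
path <- "//table[@class='playersquickfindtable']/tbody/tr/td/form/table/tbody/tr/td/div/img"
quickFind <- doc %>% html_nodes(xpath = path) %>% html_attr("alt")

# Close the browser, then stop the server process started by rsDriver()
remDr$close()
driver[["server"]]$stop()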

The problem is that this does not work reliably. It sometimes works, but it is very slow and, more importantly, the script often stops partway through (I loop over a bit more than 11,000 addresses). Sometimes waiting a while and rerunning from where it stopped works, and sometimes it does not work at all, although I know it should. I get the following errors (sorry, they are a mix of English and French, but the few French words should be easy to translate):

Error in if (!is.null(YD) && grepl("Draft", YD)) { : valeur manquante là où TRUE / FALSE est requis (in English: "missing value where TRUE/FALSE needed")

checking geckodriver versions:
BEGIN: PREDOWNLOAD
Error in open.connection(con, "rb") : HTTP error 403.

Sometimes I get other errors, but these are the most common. I really have no idea what causes them or how to solve the issue.
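A note on the first error: R's `if` fails with this message when its condition evaluates to NA. One plausible route, assuming YD is scraped from the page and comes back empty when the page did not load, is that grepl() returns logical(0), and `TRUE && logical(0)` is NA. Where YD comes from is not shown in the post, so this is only a guess. The sketch below uses a hypothetical helper, is_draft(), that returns FALSE for such inputs instead of failing.

```r
# Hypothetical helper (not part of the original script): TRUE only when `yd`
# is a single, non-missing string containing "Draft". NULL, character(0) and
# NA all give FALSE, so the result is always safe to use inside if().
is_draft <- function(yd) {
  length(yd) == 1L && !is.na(yd) && grepl("Draft", yd, fixed = TRUE)
}

is_draft("2000 NFL Draft")  # a normal scraped value
is_draft(character(0))      # nothing matched on the page
is_draft(NA_character_)     # the attribute was missing
is_draft(NULL)
```

In the loop, `if (is_draft(YD)) { ... }` would then skip pages that returned nothing rather than stopping the whole run.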

Today I had a new error:

checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
Error in subprocess::spawn_process(tfile, ...) : could not create a pipe: system error message could not be fetched

I have the feeling that all of these are related to rsDriver().
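The "checking geckodriver versions" lines are printed while rsDriver() checks for driver updates at start-up. If rsDriver() is called on every iteration of the loop (an assumption, since the loop itself is not shown), one thing worth trying is to start the session once and reuse it, with `check = FALSE` to skip the version check. This is a sketch, not a confirmed fix: start_session() and stop_session() are hypothetical names, and `check = FALSE` presumes the drivers were already downloaded by an earlier successful run.

```r
# Sketch: start the Selenium server and browser once, skipping the driver
# version check (check = FALSE), and return both handles for later use.
start_session <- function(browser = "firefox") {
  driver <- RSelenium::rsDriver(browser = browser, check = FALSE, verbose = FALSE)
  list(server = driver[["server"]], client = driver[["client"]])
}

# Sketch: close the browser and stop the server once the whole loop is done.
stop_session <- function(session) {
  session$client$close()
  session$server$stop()
  invisible(NULL)
}

# Usage (commented out because it launches a real browser):
# session <- start_session()
# for (link in allLinks) {          # allLinks: hypothetical vector of URLs
#   session$client$navigate(link)
#   # ... scrape the page ...
# }
# stop_session(session)
```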

The answers I read say that the best thing to do is to use Docker rather than rsDriver(). Before yesterday I had no idea what Docker was, and I could not find anything that clearly explains what it does or how to use it with R and RSelenium (for example, the question "RSelenium through docker"). I tried the links given there, but the pages would not load.

Could anyone help me fix this? A fix for my rsDriver() problem that makes it work reliably would be fine for me. Thank you very much. For information, I am on openSUSE (I do not know whether that behaves differently from Windows or macOS).

The list I am running through contains the players' webpages on the NFL website. An example is http://www.nfl.com/players/profile?id=00-0019290

In the end, I solved the issue using Docker. The RSelenium Docker vignette at https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-docker.html gives good information on how to proceed. The script now works and does not stop: I tested it on about 100 pages and then launched the full loop this morning. It is still running and will probably take more than a day to finish.

I could not solve the issue with rsDriver(); it would always stop with the error

Error in subprocess::spawn_process(tfile, ...) : could not create a pipe: system error message could not be fetched

I have no idea why. Even though I solved my problem and can run my script again, I would appreciate it if anyone could explain why rsDriver() was not doing the job.
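For anyone taking the same Docker route, here is a minimal sketch of the R side. It assumes a Selenium container is already running and published on localhost port 4445. That host and port are assumptions and must match your own `docker run -p` mapping. connect_docker() and fetch_with_retry() are hypothetical helper names; the retry wrapper is an extra precaution for a long loop, not something the vignette prescribes.

```r
# Connect to a Selenium server running in a Docker container.
# Host and port are assumptions; use the values from your docker setup.
connect_docker <- function(host = "localhost", port = 4445L, browser = "firefox") {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = host,
    port = port,
    browserName = browser
  )
  remDr$open(silent = TRUE)
  remDr
}

# Call fetch(url) up to max_tries times, sleeping `wait` seconds between
# failed attempts. Returns the first successful result, or NULL if every
# attempt raised an error.
fetch_with_retry <- function(fetch, url, max_tries = 3L, wait = 5) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < max_tries) Sys.sleep(wait)
  }
  NULL
}

# Usage (commented out because it needs the running container):
# remDr <- connect_docker()
# get_source <- function(url) {
#   remDr$navigate(url)
#   remDr$getPageSource()[[1]]
# }
# html <- fetch_with_retry(get_source, "http://www.nfl.com/players/profile?id=00-0019290")
# remDr$close()
```

Returning NULL on failure lets the loop record the address and move on, so a single bad page does not stop an 11,000-page run.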
