TryCatch in R - loop webscraping

I'm using RSelenium to scrape several URLs and extract price information over the years. My problem is that some of the URLs may not exist (I generated them from the years I need), so I need to skip those and move on to the next URL without stopping the loop.

I think that tryCatch() would help but I don't know exactly how to use it:

base = "https://www.cochrane.org"
codes_test = list("03040300")
month_ = c("01", "02", "03", "04","05","06","07","08", "09", "10", "11","12")
year_ = c(2008:2019)
html <- apply(expand.grid(base, codes_test, month_, year_), 
              MARGIN = 1, 
              FUN = function(x)paste(x, collapse = "/"))


remDr$navigate("https://www.cochrane.org/0304070017/10/2017")
webElement <- remDr$findElement(value = '//*[@id="acessoAutomatico"]/a')
webElement$clickElement() 

l <-length(html) 

for(j in seq(html)){ 
  sigtap <- foreach(i=1:l) %dopar% {

    tryCatch(stop("no"), error = function(e) cat("Error: ",e$message, "\n")) 
    remDr$navigate(html[i])

    names <- remDr$findElements(value = ' //*[@id="content"]/fieldset[4]/fieldset/table/tbody/tr[2]/td[1]/label | //*[@id="content"]/fieldset[4]/fieldset/table/tbody/tr[1]/td[3]/label | //*[@id="content"]/fieldset[4]/fieldset/table/tbody/tr[2]/td[3]/label | //*[@id="content"]/fieldset[4]/fieldset/table/tbody/tr[3]/td[3]/label ' )

    infos <- remDr$findElements(value = '//*[@id="valorSA_Total"] | //*[@id="valorSH"] | //*[@id="valorSP"] | //*[@id="totalInternacao"]')

  identificadores <- unlist(lapply(names, function(x) {x$getElementText()}))
  informacoes <- unlist(lapply(infos, function(x) {x$getElementText()}))
  bind_test[[i]] <- data.frame(identificadores , informacoes)

      }}

write.csv(bind_test[[i]], file = paste(bind_test, '.csv', sep = '_'))

Thank you all for any help!

Assuming that remDr$navigate(html[i]) is the call that throws the error you want to catch, try the following inside your loop:

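 success <- tryCatch({
   remDr$navigate(html[i])
   TRUE                            # navigation succeeded
   },
   warning = function(w) { FALSE },  # treat a warning as a failure
   error = function(e) { FALSE })    # treat an error as a failure

 # skip the rest of this iteration if the page could not be loaded
 if (!success) next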
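
To show the same skip-on-error pattern end to end without a Selenium session, here is a minimal sketch. The function scrape_one() and the example.org URLs are hypothetical stand-ins for the navigate-and-extract step, not part of RSelenium; scrape_one() is made to fail for one URL so the handler has something to catch.

```r
# Hypothetical stand-in for "navigate to the page and extract a data frame".
# It fails for URLs ending in 2009 to simulate a page that does not exist.
scrape_one <- function(url) {
  if (grepl("2009$", url)) stop("page not found: ", url)
  data.frame(url = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# Illustrative URLs only (assumption: same code/month/year layout as above).
urls <- paste("https://example.org/03040300/01", 2008:2011, sep = "/")

# Pre-allocate one slot per URL; slots for skipped URLs stay NULL.
results <- vector("list", length(urls))

for (i in seq_along(urls)) {
  page <- tryCatch(
    scrape_one(urls[i]),
    error = function(e) {
      message("Skipping ", urls[i], ": ", conditionMessage(e))
      NULL  # signal failure to the caller
    }
  )
  if (is.null(page)) next  # go straight to the next URL
  results[[i]] <- page
}

# Drop the skipped slots and stack the remaining data frames.
combined <- do.call(rbind, Filter(Negate(is.null), results))
print(combined)
```

Here the tryCatch() wraps the whole per-URL step, so a failure anywhere in it (navigation or element lookup) is handled the same way. Note that next only works inside a real loop such as for; as far as I know, the body of foreach(...) %dopar% is evaluated as an expression, so there you would return a value such as NULL from the body instead of calling next.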

