
How should I avoid 404 errors when scraping with R?

I'm accessing web pages by looping over a couple of variables that I insert into the URL.

There will be occasional 404 errors.

How do I insert some sort of catch for these pages to avoid breaking the code? I currently use the XML package, but could of course load others if appropriate.

TIA
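One way to do what the question asks, sketched here with base R's `tryCatch` and the XML package; the `safe_parse` helper, the injectable `reader` argument, and the URL template in the commented loop are all hypothetical names for illustration:

```r
# safe_parse: fetch and parse one page, returning NA instead of stopping
# when the request fails (e.g. a 404). 'reader' defaults to base R's
# readLines and is injectable so the error path can be exercised offline.
safe_parse <- function(url, reader = function(u) readLines(u, warn = FALSE)) {
  tryCatch(
    XML::htmlParse(paste(reader(url), collapse = "\n"), asText = TRUE),
    error = function(e) NA  # a 404 raises an error on the connection
  )
}

# Hypothetical loop over two variables inserted into the URL:
# for (a in vars1) {
#   for (b in vars2) {
#     page <- safe_parse(sprintf("http://example.com/%s/%d", a, b))
#     if (!identical(page, NA)) {
#       # ... extract data from 'page' with XML ...
#     }
#   }
# }
```

Pages that 404 simply come back as `NA`, so the loop keeps running instead of aborting.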

Most of the time I use RCurl::url.exists(). If you have a list or a data frame containing all the URLs, you can try this (map() is from purrr):

map(p, ~ifelse(RCurl::url.exists(.), ., NA))

HTH!
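Expanding the one-liner above into a reusable form, a sketch that assumes purrr and RCurl are installed; the `check_urls` helper and its injectable `exists_fn` argument (which lets the logic run offline) are hypothetical:

```r
library(purrr)

# check_urls: map each URL to itself if the server responds, NA otherwise.
# 'exists_fn' defaults to RCurl::url.exists but can be swapped out for testing.
check_urls <- function(urls, exists_fn = RCurl::url.exists) {
  map_chr(urls, ~ ifelse(exists_fn(.x), .x, NA_character_))
}

# Hypothetical usage: generate the URLs, drop the dead ones, then parse.
# p    <- sprintf("http://example.com/page/%d", 1:5)
# live <- check_urls(p)
# live <- live[!is.na(live)]   # only these go on to XML parsing
```

Note that url.exists() issues an extra request per page just to probe it, so on large URL lists it can be slower than fetching once and catching the error.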
