
Webscraping in R: Why does my loop return NA?

I've asked this question before here, but the other thread has gone quiet and I'm getting desperate.

I'm trying to scrape a webpage using rvest etc. Most of it works, but now I need R to loop through a list of links, and all it gives me is NA.

This is my code:

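install.packages("rvest")  # only needed once
library(rvest)             # html_nodes(), html_text(), and the %>% pipe
library(xml2)              # read_xml()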

site20min <- read_xml("https://api.20min.ch/rss/view/1")

urls <- site20min %>% html_nodes('link') %>% html_text()

I need the next line because the first two links the API returns point back to the homepage:

urls <- urls[-c(1:2)]

If I print urls now, it shows a list of 109 links:

urls

Now here is my loop. I need it to give me the first link in urls so that I can pass it to read_html().

I'm looking for something like: "https://beta.20min.ch/story/so-sieht-die-coronavirus-kampagne-des-bundes-aus-255254143692?legacy=true".

I use break so that it only shows me the first link, but all I get is NA:

for(i in i:length(urls)) {
  link <- urls[i]
  break
} 
link
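The NA can be reproduced offline. This minimal sketch assumes, hypothetically, that i still holds a value from an earlier loop, and the URLs are made-up placeholders for the scraped links. Because i:length(urls) is evaluated once, starting from that old value, the first iteration indexes past the end of the vector. (If i did not exist at all, R would stop with the error "object 'i' not found" instead of returning NA.)

```r
# Made-up stand-in for the scraped links (hypothetical values)
urls <- paste0("https://example.com/story/", 1:5)

# Simulate a counter left over from an earlier loop over a longer vector
i <- 7

for (i in i:length(urls)) {  # 7:5 is evaluated once and counts down: 7, 6, 5
  link <- urls[i]            # urls[7] is past the end, so this is NA
  break                      # stop after the first iteration
}
print(link)  # [1] NA
print(i)     # [1] 7
```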

If I can get this far, I think I can handle the rest with rvest, but I've been trying for hours and I'm just not getting anywhere.
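For the rest of the job, one way to visit every link is a loop over seq_along(urls) that fetches each page and stores one result per URL. This is a sketch under assumptions, not the 20min page structure: the helper name scrape_titles(), the choice of the <title> element, and the fetch and delay arguments are all hypothetical. By default fetch uses rvest's read_html() and html_element() (rvest 1.0 or later); it can be replaced with a stub to test the loop without a network connection.

```r
# Hypothetical helper: visit each URL and return one string per page.
# `fetch` takes a URL and returns a single string; the default reads the
# page with rvest and extracts the <title> text (adjust the selector to
# whatever elements you actually need).
scrape_titles <- function(urls,
                          fetch = function(u) {
                            page <- rvest::read_html(u)
                            rvest::html_text(rvest::html_element(page, "title"))
                          },
                          delay = 1) {
  results <- character(length(urls))     # preallocate one slot per URL
  for (i in seq_along(urls)) {           # 1, 2, ..., length(urls); no iterations if empty
    results[i] <- tryCatch(
      fetch(urls[i]),
      error = function(e) NA_character_  # record a failed page as NA and keep going
    )
    Sys.sleep(delay)                     # pause between requests to be polite to the server
  }
  results
}
```

With the urls vector from the question, calling scrape_titles(urls) would fetch the stories one by one, pausing delay seconds between requests.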

Thanks for your help.

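Can you try this instead?

for(i in 1:length(urls)) {
  link <- urls[i]
  break
}
link

Your loop header uses i:length(urls), so the sequence starts at whatever value i already held before the loop ran, most likely one left over from an earlier loop. If that value is larger than length(urls), the first index is out of range and urls[i] returns NA. Starting the sequence at 1 fixes that.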
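A small refinement: 1:length(urls) misbehaves if urls ever comes back empty (for example, if the feed returns no link nodes), because 1:0 is c(1, 0), so the loop would run for i = 1 and then i = 0. seq_along(urls) yields an empty sequence in that case, so the loop body is skipped. If you only need the first link, urls[1] gives it without a loop. The empty vector below is a hypothetical stand-in for such a feed.

```r
urls <- character(0)     # hypothetical: a feed that returned no links

length(1:length(urls))   # 2: the loop would run for i = 1 and i = 0
length(seq_along(urls))  # 0: the loop body is skipped

visited <- 0
for (i in seq_along(urls)) {
  visited <- visited + 1  # never reached for an empty vector
}
print(visited)           # [1] 0
```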
