I'm trying to replicate a tutorial on rvest. However, I'm already running into issues at the start. This is the code I'm using:
library(rvest)
# Specifying the url of the website to be scraped
url <- 'https://www.nytimes.com/section/politics'
# Reading the HTML code from the website and selecting the headlines
webpage <- read_html(url)
headline_data <- html_nodes(webpage,'.story-link a, .story-body a')
When I look at headline_data, I get an empty nodeset:
{xml_nodeset (0)}
But in the tutorial it returns a nodeset of length 48:
{xml_nodeset (48)}
Any reason for the discrepancy?
As mentioned in the comments, the page no longer contains any elements matching the classes you are searching for; the NYT has changed its markup since the tutorial was written, so the tutorial's selectors now match nothing.
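You can see the same behavior offline with a tiny stand-in page (the HTML below is hypothetical, just to illustrate how a stale selector yields an empty nodeset):

```r
library(rvest)

# A minimal stand-in for the NYT page
html <- minimal_html('<h2 class="headline"><a href="/a">Story A</a></h2>')

# A selector that matches nothing returns {xml_nodeset (0)}, as in the question
html_nodes(html, ".story-link a, .story-body a")

# A selector matching the actual markup returns the text
html_text(html_nodes(html, "h2.headline a"))
```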
To begin, based on the page's current tags, you can get the headlines with:
library(rvest)
library(dplyr)
url <- 'https://www.nytimes.com/section/politics'
url %>%
read_html() %>%
html_nodes("h2.css-l2vidh a") %>%
html_text()
#[1] "Trump’s Secrecy Fight Escalates as Judge Rules for Congress in Early Test"
#[2] "A Would-Be Trump Aide’s Demands: A Jet on Call, a Future Cabinet Post and More"
#[3] "He’s One of the Biggest Backers of Trump’s Push to Protect American Steel. And He’s Canadian."
#[4] "Accountants Must Turn Over Trump’s Financial Records, Lower-Court Judge Rules"
and to get the individual URLs of those headlines you could do:
url %>%
read_html() %>%
html_nodes("h2.css-l2vidh a") %>%
html_attr("href") %>%
paste0("https://www.nytimes.com", .)
#[1] "https://www.nytimes.com/2019/05/20/us/politics/mcgahn-trump-congress.html"
#[2] "https://www.nytimes.com/2019/05/20/us/politics/kris-kobach-trump.html"
#[3] "https://www.nytimes.com/2019/05/20/us/politics/hes-one-of-the-biggest-backers-of-trumps-push-to-protect-american-steel-and-hes-canadian.html"
#[4] "https://www.nytimes.com/2019/05/20/us/politics/trump-financial-records.html"
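If you want both together, you can collect headlines and links into a data frame in one pass. The sketch below uses a tiny stand-in page via `minimal_html()` so it runs without hitting the live site; the `h2.css-l2vidh` class is the one from the live page above and will change whenever the NYT updates its markup:

```r
library(rvest)
library(dplyr)

# Hypothetical stand-in for read_html('https://www.nytimes.com/section/politics')
page <- minimal_html('
  <h2 class="css-l2vidh"><a href="/2019/05/20/us/politics/a.html">Story A</a></h2>
  <h2 class="css-l2vidh"><a href="/2019/05/20/us/politics/b.html">Story B</a></h2>')

nodes <- html_nodes(page, "h2.css-l2vidh a")
tibble(
  headline = html_text(nodes),
  url      = paste0("https://www.nytimes.com", html_attr(nodes, "href"))
)
```

Extracting both attributes from the same nodeset avoids parsing the page twice and keeps each headline aligned with its URL.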