I'm trying to webscrape data from a website into R

I'm not sure what I'm missing in my code. I'm trying to webscrape data from https://www.espn.com/nfl/standings/_/season/2010 into a tibble in R. My code so far is the following:

library(tidyverse)
library(rvest)

# url I want the data from. 
NFL_2010.url <- "https://www.espn.com/nfl/standings/_/season/2010"
# Use webscraping to import the data from the url into R
NFL_2010 <- NFL_2010.url %>%
  read_html(NFL_2010) %>%
  #There is more than 1 table, so I'm trying to use html_nodes 
  html_nodes("table") %>%
  html_table () %>%
  #convert data to a tibble
  as_tibble()

What am I missing here?

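Two things go wrong in your code. First, read_html(NFL_2010) passes NFL_2010, an object that does not exist yet, as an extra argument; the URL already arrives through the pipe, so a plain read_html() is enough. Second, html_table() applied to several table nodes returns a list of data frames rather than a single table, so the result can't be piped straight into as_tibble() as one table. Scraping this page returns a list of 4 pieces, two for each conference. So you have to join the pieces together and then convert them into 2 tibbles. For example: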

library(tidyverse)
library(rvest)

NFL_2010.url <- "https://www.espn.com/nfl/standings/_/season/2010"

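NFL_2010 <- NFL_2010.url %>%
  # read_html() receives the URL from the pipe; no extra argument is needed
  read_html() %>%
  html_nodes("table") %>%
  # returns a list with one data frame per <table> node
  html_table()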

# American Football Conference
NFL_2010_AFC <- bind_cols(NFL_2010[[1]], NFL_2010[[2]]) %>%
  as_tibble()

# National Football Conference
NFL_2010_NFC <- bind_cols(NFL_2010[[3]], NFL_2010[[4]]) %>%
  as_tibble()

The two tibbles still need some data cleaning after that.
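
If you would rather not hard-code the indices 1 to 4, the pairing step can be written as a small helper. This is only a sketch: bind_table_pairs is a hypothetical name (it is not part of rvest or dplyr), and the toy data frames below stand in for the scraped pieces; their column names and values are made up and do not come from the ESPN page.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: bind consecutive pairs of data frames column-wise,
# so elements 1 + 2 form the first table, 3 + 4 the second, and so on.
bind_table_pairs <- function(tables) {
  # Require a non-empty list with an even number of pieces
  stopifnot(length(tables) >= 2, length(tables) %% 2 == 0)
  left  <- tables[seq(1, length(tables), by = 2)]
  right <- tables[seq(2, length(tables), by = 2)]
  map2(left, right, ~ as_tibble(bind_cols(.x, .y)))
}

# Toy stand-ins for the scraped pieces (assumed shape, invented values)
pieces <- list(
  data.frame(team = c("A", "B")),
  data.frame(W = c(10, 6), L = c(6, 10)),
  data.frame(team = c("C", "D")),
  data.frame(W = c(12, 4), L = c(4, 12))
)

standings <- bind_table_pairs(pieces)
names(standings) <- c("AFC", "NFC")
```

With the scraped list from the code above, bind_table_pairs(NFL_2010) should give the same two conference tables, assuming the page still returns the four pieces in that order.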
