
Web Scraping a table into R

I'm new to web scraping, and I'm sure there's an obvious answer I'm missing here, but I've gone through every post I can find on using rvest, XML, xml2, etc. to read a table from the web into R, and I've had no success.

An example of the table I'm looking to scrape can be found here: https://www.eliteprospects.com/iframe_player_stats.php?player=364033

I've tried

library(rvest)

EXAMPLE <- read_html("http://www.eliteprospects.com/iframe_player_stats.php?player=364033")
EXAMPLE


URL <- 'http://www.eliteprospects.com/iframe_player_stats.php?player=364033'
table <- URL %>%
  read_html() %>%
  html_nodes("table")

But I'm unsure what to do with these results to get them into a data frame, or anything else usable.

You need to extract the correct nodes with html_nodes, and then convert them into a data.frame. The code below is one example of how to go about this. I find SelectorGadget very useful for finding the right CSS selectors.

library(tidyverse)
library(rvest)

# read the html
html <- read_html('http://www.eliteprospects.com/iframe_player_stats.php?player=364033')

# function to read columns
read_col <- function(x){
  col <- html %>%  
    # CSS nodes to select by using selector gadget
    html_nodes(paste0("td:nth-child(", x, ")")) %>% 
    html_text()
  return(col)
}

# apply the function
col_list <- lapply(c(1:8, 10:15), read_col)

# collapse into matrix
mat <- do.call(cbind, col_list)

# put data into a data frame, dropping the first row (it holds the column headers)
df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)

# assign names
names(df) <- mat[1, ] 

df
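
As a shorter alternative, rvest also provides html_table(), which converts the table nodes you already selected with html_nodes("table") into data frames in one step. The sketch below uses a small made-up HTML snippet (the column names and values are hypothetical placeholders, not data from the real page) so that it runs without a network connection. I have not checked how cleanly the actual eliteprospects table parses this way, so treat it as a starting point rather than a drop-in replacement.

```r
library(rvest)

# Hypothetical inline HTML standing in for the real page (assumption:
# the real page contains at least one ordinary <table> element)
page <- read_html('
<table>
  <tr><th>Season</th><th>Team</th><th>GP</th></tr>
  <tr><td>2016-17</td><td>Team A</td><td>40</td></tr>
  <tr><td>2017-18</td><td>Team B</td><td>52</td></tr>
</table>')

# html_table() returns a list with one element per <table> node found
tables <- html_table(html_nodes(page, "table"))

# take the first table and make sure it is a plain data.frame
stats <- as.data.frame(tables[[1]])

stats
```

For the real page you would replace the inline snippet with read_html(URL) and pick the list element that holds the stats table. If html_table() struggles with the page's layout, the column-by-column approach above is the fallback.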

