
Web Scraping a table into R

I'm new to web scraping, and I'm sure there's a very obvious answer I'm missing here, but I've exhausted every post I can find on using rvest, XML, xml2, etc. to read a table from the web into R, and I've had no success.

An example of the table I'm looking to scrape can be found here: https://www.eliteprospects.com/iframe_player_stats.php?player=364033

I've tried:

library(rvest)

EXAMPLE <- read_html("http://www.eliteprospects.com/iframe_player_stats.php?player=364033")
EXAMPLE


URL <- 'http://www.eliteprospects.com/iframe_player_stats.php?player=364033'
table <- URL %>%  
read_html %>% 
html_nodes("table") 

But I am unsure what to do with these results to get them into a data frame, or anything usable.

You need to extract the correct html_nodes, and then convert them into a data.frame. The code below is an example of how to go about doing something like this. I find SelectorGadget very useful for finding the right CSS selectors.

library(tidyverse)
library(rvest)

# read the html
html <- read_html('http://www.eliteprospects.com/iframe_player_stats.php?player=364033')

# function to read columns
read_col <- function(x){
  col <- html %>%  
    # CSS nodes to select by using selector gadget
    html_nodes(paste0("td:nth-child(", x, ")")) %>% 
    html_text()
  return(col)
}

# apply the function
col_list <- lapply(c(1:8, 10:15), read_col)

# collapse into matrix
mat <- do.call(cbind, col_list)

# put data into a dataframe, dropping the header row
df <- data.frame(mat[2:nrow(mat), ])

# assign names
names(df) <- mat[1, ] 

df
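As an aside, rvest also provides html_table(), which parses a table node straight into a data frame, so the column-by-column assembly above can often be skipped. A minimal sketch, using a small inline table built with minimal_html() (hypothetical data) instead of the live URL, so it runs without network access:

```r
library(rvest)

# a small stand-in for the stats table on the page (hypothetical data)
page <- minimal_html('
  <table>
    <tr><th>Season</th><th>Team</th><th>GP</th></tr>
    <tr><td>2016-17</td><td>Example HC</td><td>42</td></tr>
  </table>')

# html_table() reads a <table> node into a data frame,
# using the <th> row as column names and converting types
df <- page %>%
  html_node("table") %>%
  html_table()

df
```

Against the real page, the same pattern would be read_html(URL) %>% html_node("table") %>% html_table(), assuming the stats are in an actual HTML table rather than styled divs.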
