
R: looping through a list of links

I have some code that scrapes data off this link (http://stats.ncaa.org/team/stats?org_id=575&sport_year_ctl_id=12280) and runs some calculations.

What I want to do is cycle through every team, collecting the data and running the manipulations for each one. I have a dataframe containing a link for every team, like the one above.

Pseudo code: for (link in teamlist) { scrape, manipulate, put into a table }

However, I can't figure out how to loop through the links.

I've tried doing URL = teamlist$link[i], but I get an error when using readHTMLTable(). I have no trouble manually pasting each team's individual URL into the script; it fails only when I try to pull the URL from a table.
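The shape of the loop I'm after is roughly the sketch below. Note that teamlist and its link column are assumptions from my description above, and one likely cause of the error is that the column is a factor rather than a character vector (the old stringsAsFactors = TRUE default), which readLines() won't accept:

```r
# Sketch only: teamlist$link is assumed to hold one URL per team.
# Convert from factor to character first, in case the dataframe was
# built with stringsAsFactors = TRUE.
teamlist$link <- as.character(teamlist$link)

all_tables <- vector("list", length(teamlist$link))
for (i in seq_along(teamlist$link)) {
  URL <- teamlist$link[i]
  tx <- readLines(URL)
  # ... the same gsub() clean-up as in the current code below ...
  all_tables[[i]] <- readHTMLTable(tx, asText = TRUE, header = TRUE,
                                   which = 2, stringsAsFactors = FALSE)
}
```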

Current code:

library(XML)
library(gsubfn)

URL <- 'http://stats.ncaa.org/team/stats?org_id=575&sport_year_ctl_id=12280'
tx <- readLines(URL)
# The page's misplaced </tbody> and <tfoot> tags confuse readHTMLTable(),
# so strip them and close the body after the footer rows instead
tx2 <- gsub("</tbody>", "", tx)
tx2 <- gsub("<tfoot>", "", tx2)
tx2 <- gsub("</tfoot>", "</tbody>", tx2)
Player_Stats <- readHTMLTable(tx2, asText = TRUE, header = TRUE, which = 2, stringsAsFactors = FALSE)

Thanks.

I agree with @ialm that you should check out the rvest package, which makes it very fun and straightforward to loop through links. I will create some example code here using similar subject matter for you to check out.

Here I am generating the list of links that I will iterate through:

rm(list = ls())
library(rvest)
mainweb <- "http://www.basketball-reference.com/"

# Grab the href attribute of each link in the active-franchises table
urls <- read_html("http://www.basketball-reference.com/teams") %>%
  html_nodes("#active a") %>%
  html_attr("href")

Now that the list of links is complete, I iterate through each link and pull a table from each page:

teamdata <- list()
j <- 1
for (i in urls) {
  bball <- read_html(paste0(mainweb, i))
  # Each team page's table id is the three-letter code from the URL,
  # e.g. "/teams/ATL/" -> "#ATL"
  teamdata[[j]] <- bball %>%
    html_nodes(paste0("#", gsub("/teams/([A-Z]+)/$", "\\1", i, perl = TRUE))) %>%
    html_table() %>%
    .[[1]]
  j <- j + 1
}

Please see the code below, which basically builds off your code and loops through two different team pages, as identified by the vector team_codes. The tables are returned in a list, where each list element corresponds to one team's table. However, the tables look like they will need more cleaning.

library(XML)
library(gsubfn)

Player_Stats <- list()
j <- 1
team_codes <- c(575, 580)
for (code in team_codes) {

  # Build each team's URL from its org_id
  URL <- paste0('http://stats.ncaa.org/team/stats?org_id=', code, '&sport_year_ctl_id=12280')
  tx <- readLines(URL)
  tx2 <- gsub("</tbody>", "", tx)
  tx2 <- gsub("<tfoot>", "", tx2)
  tx2 <- gsub("</tfoot>", "</tbody>", tx2)
  Player_Stats[[j]] <- readHTMLTable(tx2, asText = TRUE, header = TRUE, which = 2, stringsAsFactors = FALSE)
  j <- j + 1

}
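If a single combined table is more convenient than a list, the per-team tables can be stacked afterwards. A sketch, assuming the loop above has run and every team's table came back with identical column names:

```r
# Tag each table with its team code, then stack the list into one data frame.
# Assumes all tables share the same columns; ragged tables will make
# rbind() fail and need aligning first.
names(Player_Stats) <- team_codes
combined <- do.call(rbind, lapply(names(Player_Stats), function(code) {
  cbind(org_id = code, Player_Stats[[code]], stringsAsFactors = FALSE)
}))
```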
