How to loop through multiple URLs in R and save in data frame
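I can't work out how to loop through multiple URLs and save the results in a single data frame. The code below retrieves only one URL at a time and stores it in a data frame.
The only part of the URL that changes is the number at the end, which represents the date. I am trying to scrape all the data from, for example, 20190901 to 20190915 and store it in the same data frame.
Here is the code: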
library(rvest)
library(dplyr)
# Specifying URL
url <- 'https://classic.sportsbookreview.com/betting-odds/mlb-baseball/?date=20190901'
# Reading the HTML code from website
oddspage <- read_html(url)
# Using CSS selectors to scrape away teams
awayHtml <- html_nodes(oddspage,'.eventLine-value:nth-child(1) a')
#Using CSS selectors to scrape scores
awayScoreHtml <- html_nodes(oddspage,'.first.total')
awayScore <- html_text(awayScoreHtml)
awayScore <- as.numeric(awayScore)
homeScoreHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .total')
homeScore <- html_text(homeScoreHtml)
homeScore <- as.numeric(homeScore)
# Converting away data to text
away <- html_text(awayHtml)
# Using CSS selectors to scrape home teams
homeHtml <- html_nodes(oddspage,'.eventLine-value+ .eventLine-value a')
# Converting home data to text
home <- html_text(homeHtml)
# Using CSS selectors to scrape Away Odds
awayPinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book.eventLine-book-value:nth-child(1) b')
awayBookmakerHtml <- html_nodes(oddspage,'.eventLine-book:nth-child(12) .eventLine-book-value:nth-child(1) b')
# Converting Away Odds to Text
awayPinnacle <- html_text(awayPinnacleHtml)
awayBookmaker <- html_text(awayBookmakerHtml)
# Converting Away Odds to numeric
awayPinnacle <- as.numeric(awayPinnacle)
awayBookmaker <- as.numeric(awayBookmaker)
# Using CSS selectors to scrape Pinnacle Home Odds
homePinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value+ .eventLine-book-value b')
homeBookmakerHtml <- html_nodes(oddspage,'.eventLine-book:nth-child(12) .eventLine-book-value:nth-child(2) b')
# Converting Home Odds to Text
homePinnacle <- html_text(homePinnacleHtml)
homeBookmaker <- html_text(homeBookmakerHtml)
# Converting Home Odds to Numeric
homePinnacle <- as.numeric(homePinnacle)
homeBookmaker <- as.numeric(homeBookmaker)
# Create Data Frame
df <- data.frame(away,home,awayScore,homeScore,awayPinnacle,homePinnacle,awayBookmaker,homeBookmaker)
View(df)
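I am very new to coding and have not been able to apply any of the techniques used in similar questions.
Put all of the code in a function and generate the URL dynamically from the date: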
get_data <- function(date) {
  url <- paste0('https://classic.sportsbookreview.com/betting-odds/mlb-baseball/?date=', date)
  # ...rest of the code as it is
  # ...
  # End with df instead of View(df) so the function returns the data frame
  df
}
Use sprintf to create a vector of dates:
date_vec <- sprintf('201909%02d', 1:15)
date_vec
# [1] "20190901" "20190902" "20190903" "20190904" "20190905" "20190906"
# [7] "20190907" "20190908" "20190909" "20190910" "20190911" "20190912"
#[13] "20190913" "20190914" "20190915"
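The sprintf approach works as long as every date falls in the same month. For a range that crosses a month boundary, one option is to build a real date sequence in base R and format it afterwards. This is a minimal sketch; the start and end dates and the name date_vec_span are illustrative and not from the question.

```r
# Build YYYYMMDD strings from a real date sequence (base R only).
# The start and end dates are illustrative assumptions.
start <- as.Date("2019-08-30")
end   <- as.Date("2019-09-02")

# seq() steps one calendar day at a time, so month ends are handled for us
date_vec_span <- format(seq(start, end, by = "day"), "%Y%m%d")
date_vec_span
# [1] "20190830" "20190831" "20190901" "20190902"
```

The resulting character vector can be passed to get_data in the same way as date_vec.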
Use lapply to extract the data for each date and combine the results:
all_data <- do.call(rbind, lapply(date_vec, get_data))
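To see what the combining step does without hitting the website, here is a small offline sketch. fake_get_data is a hypothetical stand-in for get_data that returns a tiny data frame per date; the team labels are made up. It also adds a date column, which may be worth doing in the real function since the original data frame does not record which day each row came from.

```r
# Hypothetical stand-in for get_data(): returns a small data frame per date
# without touching the network, so the combining step can be checked offline.
fake_get_data <- function(date) {
  data.frame(date = date,
             away = c("A", "B"),
             home = c("C", "D"),
             stringsAsFactors = FALSE)
}

demo_dates <- c("20190901", "20190902")
pieces   <- lapply(demo_dates, fake_get_data)  # list with one data frame per date
demo_all <- do.call(rbind, pieces)             # stack them into one data frame
nrow(demo_all)
# [1] 4
```

rbind requires every piece to have the same column names, so get_data should return the same columns for every date.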
You can also use map_df from purrr:
all_data <- purrr::map_df(date_vec, get_data)
However, you may need to add some checks in the function for pages that do not return any values for particular fields.
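One possible shape for such checks, as a sketch only: pad short vectors with NA so data.frame() does not fail on unequal lengths, and wrap each call in tryCatch so a page that errors is skipped instead of stopping the whole loop. pad_to, safe_get and flaky are hypothetical names used for illustration; flaky simulates a scraper offline and is not the real get_data.

```r
# Pad a vector with NA so that it has n entries.
# pad_to() is a hypothetical helper, not part of rvest or dplyr.
pad_to <- function(x, n) {
  length(x) <- n  # extending the length fills the new slots with NA
  x
}

# Call fetch(date); on error, report it and return NULL instead of stopping.
safe_get <- function(date, fetch) {
  tryCatch(fetch(date), error = function(e) {
    message("Skipping ", date, ": ", conditionMessage(e))
    NULL
  })
}

# Simulated scraper: one date fails, and one score is missing on each page.
flaky <- function(date) {
  if (date == "20190902") stop("no games listed")
  away      <- c("A", "B")
  awayScore <- c(3)
  n <- max(length(away), length(awayScore))
  data.frame(date = date,
             away = pad_to(away, n),
             awayScore = pad_to(awayScore, n),
             stringsAsFactors = FALSE)
}

results <- lapply(c("20190901", "20190902", "20190903"), safe_get, fetch = flaky)
checked <- do.call(rbind, Filter(Negate(is.null), results))  # drop failed pages
```

Padding only keeps the column lengths equal. If a value is missing in the middle of a page, the remaining values shift up a row, so the CSS selectors may still need adjusting to keep rows aligned.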