Downloading multiple files with purrr

Question

I am trying to download all of the Excel files from: https://www.grants.gov.au/reports/gaweeklyexport

With the code below, every link (77 in total) produces an error similar to this one:

[[1]]$error
<simpleError in download.file(.x, .y, mode = "wb"): scheme not supported in URL '/Reports/GaWeeklyExportDownload?GaWeeklyExportUuid=0db183a2-11c6-42f8-bf52-379aafe0d21b'>

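The error appears when I try to iterate over the whole list, but when I call download.file on a single list item it works fine.

I'd be grateful if someone could tell me what I'm doing wrong or suggest a better approach.

The code that produces the error: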


library(tidyverse)
library(rvest)

# Reading links to the Excel files to be downloaded
url <- "https://www.grants.gov.au/reports/gaweeklyexport"

webpage <- read_html(url)

# The list of links to the Excel files
links <- html_attr(html_nodes(webpage, '.u'), "href")

# Creating names for the files to supply to the download.file function
wb_names = str_c(1:77, ".xlsx")

# Defining a function that uses purrr's safely() so it doesn't fail if there is a dead link
safe_download <- safely(~ download.file(.x , .y, mode = "wb"))

# Mapping the function over the links and file names returns an error for every link
map2(links, wb_names, safe_download)
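
For reference: because safely() wraps each call, map2() does not stop at the first failure. It returns a list in which every element holds a result and an error, one of which is NULL; that is where the [[1]]$error above comes from. A minimal sketch of pulling the failed URLs out of such a list, assuming a hypothetical stand-in fake_fetch() in place of download.file() so that it runs offline:

```r
library(purrr)

# Hypothetical stand-in for download.file(): it rejects URLs without "https://",
# loosely mimicking the "scheme not supported" error, so no network is needed
fake_fetch <- function(url, dest) {
  if (!startsWith(url, "https://")) stop("scheme not supported in URL '", url, "'")
  dest
}
safe_fetch <- safely(fake_fetch)

urls  <- c("https://example.org/a.xlsx", "/Reports/b.xlsx")
dests <- c("1.xlsx", "2.xlsx")

# Each element is list(result = ..., error = ...); exactly one of the two is NULL
results <- map2(urls, dests, safe_fetch)

# URLs whose call recorded an error
failed <- urls[map_lgl(results, ~ !is.null(.x$error))]
print(failed)

# Their error messages, e.g. for logging
msgs <- map_chr(keep(results, ~ !is.null(.x$error)), ~ conditionMessage(.x$error))
print(msgs)
```

With the real download code, the same two lines applied to the output of map2(links, wb_names, safe_download) list the links that could not be fetched.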

Answer 1

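The href values scraped from the page are site-relative paths (they start with /Reports/), so they have no scheme and download.file() cannot fetch them. You need to prepend the site's base URL, https://www.grants.gov.au, to get absolute URLs that can be used to download the files.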

library(rvest)
library(purrr)

url <- "https://www.grants.gov.au/reports/gaweeklyexport"

webpage <- read_html(url)
# The list of links to the Excel files. The hrefs start with "/Reports/...",
# so prepend the scheme and host (no trailing slash, to avoid "//")
links <- paste0("https://www.grants.gov.au", html_attr(html_nodes(webpage, ".u"), "href"))

safe_download <- safely(~ download.file(.x, .y, mode = "wb"))

# Creating one file name per link instead of hard-coding the count (77)
wb_names <- paste0(seq_along(links), ".xlsx")
map2(links, wb_names, safe_download)
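
A slightly more robust variant, sketched under the assumption that the xml2 package is available (it is installed as an rvest dependency): xml2::url_absolute() resolves each href against the page URL the way a browser does, so it does not depend on getting the prefix's slashes right and leaves links that are already absolute unchanged. The second href below is a hypothetical absolute link added only to show that behaviour.

```r
library(xml2)  # installed with rvest; provides url_absolute()

page_url <- "https://www.grants.gov.au/reports/gaweeklyexport"

# Example hrefs: the first is the relative path from the error message above,
# the second a hypothetical link that is already absolute
hrefs <- c(
  "/Reports/GaWeeklyExportDownload?GaWeeklyExportUuid=0db183a2-11c6-42f8-bf52-379aafe0d21b",
  "https://example.org/file.xlsx"
)

# Resolve each href against the page URL; absolute hrefs pass through as-is
links <- url_absolute(hrefs, page_url)
print(links)
```

On the live page this would replace the paste0() line: links <- url_absolute(html_attr(html_nodes(webpage, ".u"), "href"), url).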

Solution 1 (accepted, 2020-11-14 09:30:49)