I need to download a series of Excel files from URLs that all look as follows:
http://example.com/orResultsED.cfm?MODE=exED&ED=01&EventId=31
http://example.com/orResultsED.cfm?MODE=exED&ED=02&EventId=31
...
http://example.com/orResultsED.cfm?MODE=exED&ED=87&EventId=31
I've got some of the building blocks inside the loop, such as:
for (i in 1:87) {
  url <- paste0("http://example.com/orResultsED.cfm?MODE=exED&ED=", i, "&EventId=31")
  file <- paste0("Data/myExcel_", i, ".xlsx")
  if (!file.exists(file)) download.file(url, file)
}
My problem:
I don't know how to make the sequence (seq) prepend the 0 to single-digit numbers (I tried sprintf with no luck).
Update: @akrun's solution works well. But it turns out that not all my Excel files have the same number of columns:
map(files, ~read.xlsx(.x,
                      colNames = FALSE,
                      sheet = 1,
                      startRow = 4)) %>%
  bind_rows
Error in bind_rows_(x, .id) :
Column `X1` can't be converted from numeric to character
I think this error actually points to the unequal number of columns. I tried adding fill = NA (when testing map_df()), but it didn't help.
We can create the zero-padded number with sprintf:
paste0("http://example.com/orResultsED.cfm?MODE=exED&ED=", sprintf("%02d", 1), "&EventId=31")
#[1] "http://example.com/orResultsED.cfm?MODE=exED&ED=01&EventId=31"
In the loop:
for (i in 1:87) {
  i1 <- sprintf('%02d', i)
  url <- paste0("http://example.com/orResultsED.cfm?MODE=exED&ED=", i1, "&EventId=31")
  file <- paste0("Data/myExcel_", i, ".xlsx")
  if (!file.exists(file)) download.file(url, file)
}
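Since sprintf is vectorized, the padded numbers, URLs and file names can also be built in one step without a loop. The following is only a minimal sketch; the names ids, urls and files are chosen here for illustration, and the file names use the padded number, which differs slightly from the loop above.

```r
# Zero-padded identifiers "01", "02", ..., "87"
ids <- sprintf("%02d", 1:87)

# One URL per identifier (paste0 recycles the constant parts)
urls <- paste0("http://example.com/orResultsED.cfm?MODE=exED&ED=", ids, "&EventId=31")

# Matching local file names, e.g. "Data/myExcel_01.xlsx"
files <- file.path("Data", paste0("myExcel_", ids, ".xlsx"))

head(urls, 2)
```

The two vectors line up element by element, so urls[i] and files[i] can be passed to download.file together.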
Assuming that the files are downloaded in the working directory (for the loop above, which writes to Data/, use list.files("Data", full.names = TRUE) instead):
files <- list.files(full.names = TRUE)
library(openxlsx)
library(purrr)
library(dplyr)
map(files, ~read.xlsx(.x, sheet = 1, startRow = 3)) %>%
bind_rows
Or, as @hrbrmstr mentioned in the comments, map_df can be used, which returns a single dataset:
map_df(files, ~read.xlsx(.x, sheet = 1, startRow = 3))
Based on the comments from the OP, the column classes seem to differ between some of the datasets. In that case, bind_rows gives an error. One option is to use rbindlist from data.table:
map(files, ~read.xlsx(.x, sheet = 1, startRow = 3)) %>%
data.table::rbindlist(fill = TRUE)
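If the bind_rows error comes from mismatched column types, another possible workaround is to convert every column to character before binding. The sketch below uses two small made-up data frames (a and b) in place of the real sheets, and to_character is a hypothetical helper name, not part of any package.

```r
library(dplyr)

# Hypothetical stand-ins for two sheets: X1 is numeric in one and character
# in the other, and the second sheet has an extra column X3
a <- data.frame(X1 = c(1, 2), X2 = c("x", "y"), stringsAsFactors = FALSE)
b <- data.frame(X1 = c("n/a", "3"), X2 = c("z", "w"),
                X3 = c("extra", "col"), stringsAsFactors = FALSE)

# Convert every column to character so the types always agree
to_character <- function(df) {
  df[] <- lapply(df, as.character)
  df
}

# bind_rows matches columns by name and fills missing ones with NA
combined <- bind_rows(lapply(list(a, b), to_character))
str(combined)
```

With the real files, the same helper could be applied to each result of read.xlsx before binding; columns can be converted back to numeric afterwards where appropriate.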
Downloading and reading in one loop. Hopefully the columns are aligned; if not, use something like plyr::rbind.fill instead of do.call(rbind, list).
do.call(rbind, lapply(1:87, function(n) {
  url <- paste0("http://example.com/orResultsED.cfm?MODE=exED&ED=",
                sprintf("%02d", n), "&EventId=31")
  file <- paste0("Data/myExcel_", n, ".xlsx")
  if (!file.exists(file)) download.file(url, file)
  dat <- readxl::read_excel(file, skip = 2)
  Sys.sleep(5)  # pause between requests
  dat           # return the data; otherwise lapply() collects the NULL from Sys.sleep()
}))
You can also use regmatches:
num <- sprintf("%02.0f", 1:87)
urls <- rep("http://example.com/orResultsED.cfm?MODE=exED&ED=01&EventId=31", 87)
regmatches(urls, regexpr("\\d+", urls)) <- num
urls[87]
#[1] "http://example.com/orResultsED.cfm?MODE=exED&ED=87&EventId=31"
To build all the file names:
files <- paste0("Data/myExcel_", num, ".xlsx")
To download the files:
mapply(function(x, y) if (!file.exists(x)) download.file(y, x), files, urls)