I wish to download an online folder using Windows 10 on my Dell laptop. In this example the folder I wish to download is named Targetfolder. I am trying to use the Command Window but am also wondering whether there is a simple solution in R. I have included an image at the bottom of this post showing the target folder. I should add that Targetfolder includes a file and multiple subfolders containing files. Not all files have the same extension. Also, please note this is a hypothetical site. I did not want to include the real site for privacy reasons.
EDIT
Here is a real site that can serve as a functional, reproducible example. The folder rel2020 can take the place of the hypothetical Targetfolder:
https://www2.census.gov/geo/docs/maps-data/data/rel2020/
None of the answers here seem to work with Targetfolder:
Below are my attempts based on answers posted at the link above and the results I obtained:
Attempt One
lftp -c 'mirror --parallel=300 https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/ ;exit'
Returned:
lftp is not recognized as an internal or external command, operable program or batch file.
Attempt Two
wget -r -np -nH --cut-dirs=3 -R index.html https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/
Returned:
wget is not recognized as an internal or external command, operable program or batch file.
Attempt Three
https://sourceforge.net/projects/visualwget/files/latest/download
VisualWget returned Unsupported scheme next to the URL.
Here is a way with packages httr and rvest.
First get the folders where the files are from the link.
Then loop through the folders with Map, getting the filenames and downloading them in a lapply loop.
If errors such as time-out conditions occur, they will be trapped by tryCatch. The last code lines will tell if and where there were errors.
Note: I only downloaded from folders[1:2]; in the Map call below, change this to folders to download from every folder.
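suppressPackageStartupMessages({
  library(httr)
  library(rvest)
  library(dplyr)
})

link <- "https://www2.census.gov/geo/docs/maps-data/data/rel2020/"
page <- read_html(link)

# Collect the subfolder URLs; positions 8:14 are the subfolder links
# in the directory listing as it was when this was written
folders <- page %>%
  html_elements("a") %>%
  html_attr("href") %>%
  .[8:14] %>%
  paste0(link, .)

# For each folder, list its .txt files and download them one by one.
# The destination directory ~/Temp is assumed to exist already.
files_txt <- Map(\(x) {
  x %>%
    read_html() %>%
    html_elements("a") %>%
    html_attr("href") %>%
    grep("\\.txt$", ., value = TRUE) %>%
    paste0(x, .) %>%
    lapply(\(y) {
      # Return the error object instead of stopping the whole loop
      tryCatch(
        download.file(y, destfile = file.path("~/Temp", basename(y))),
        error = function(e) e
      )
    })
}, folders[1:2])

# Flag which downloads returned an error, then show those errors
err <- sapply(unlist(files_txt, recursive = FALSE), inherits, "error")
lapply(unlist(files_txt, recursive = FALSE)[err], simpleError)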
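The positional subset .[8:14] in the code above depends on the order of the links in the directory listing, so it may pick the wrong entries if the page changes. Below is a minimal sketch of a pattern-based alternative. It assumes that subfolders appear in the listing as relative hrefs ending in "/", which is an assumption about the page format and not something the site guarantees. The helper name subfolder_links and the example hrefs are hypothetical.

```r
# Hypothetical helper: keep only hrefs that look like relative subfolder links.
# Assumption: subfolders are listed as relative hrefs ending in "/".
subfolder_links <- function(hrefs, base) {
  hrefs <- hrefs[!is.na(hrefs)]
  # A subfolder link ends with a slash
  is_dir <- grepl("/$", hrefs)
  # Skip parent/absolute paths, sort-order query links, and full URLs
  is_relative <- !grepl("^(/|\\?|\\.\\.|[a-zA-Z][a-zA-Z0-9+.-]*:)", hrefs)
  kept <- hrefs[is_dir & is_relative]
  # paste0() would recycle an empty vector to "", so return early
  if (length(kept) == 0) return(character(0))
  paste0(base, kept)
}

# Offline example with made-up hrefs resembling a directory index page
example_hrefs <- c("?C=N;O=D", "/geo/docs/maps-data/data/", "zcta520/", "readme.txt", NA)
subfolder_links(example_hrefs, "https://example.org/data/")
```

If the assumption holds for the real listing, the result of html_attr("href") could be passed to this helper in place of the .[8:14] and paste0(link, .) steps.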