Download an online folder with Windows 10
I wish to download an online folder using Windows 10 on my Dell laptop. In this example the folder I wish to download is named Targetfolder. I am trying to use the Command Window but am also wondering whether there is a simple solution in R. I have included an image at the bottom of this post showing the target folder. I should add that Targetfolder includes a file and multiple subfolders containing files. Not all files have the same extension. Also, please note this is a hypothetical site; I did not want to include the real site for privacy reasons.
EDIT
Here is a real site that can serve as a functional, reproducible example. The folder rel2020 can take the place of the hypothetical Targetfolder:

https://www2.census.gov/geo/docs/maps-data/data/rel2020/
None of the answers here seem to work with Targetfolder:

How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?
Below are my attempts, based on the answers posted at the link above, and the results I obtained:
Attempt One
lftp -c 'mirror --parallel=300 https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/ ;exit'
Returned:
lftp is not recognized as an internal or external command, operable program or batch file.
Attempt Two
wget -r -np -nH --cut-dirs=3 -R index.html https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/
Returned:
wget is not recognized as an internal or external command, operable program or batch file.
Attempt Three
https://sourceforge.net/projects/visualwget/files/latest/download
VisualWget returned Unsupported scheme next to the URL.
Here is a way with the packages httr and rvest.
First, get the folders where the files are from the link.
Then loop through the folders with Map, getting the filenames and downloading them in a lapply loop.
If errors such as time-out conditions occur, they will be trapped by tryCatch. The last code lines will tell whether and where there were errors.
Note: I only downloaded from folders[1:2]; in the Map below, change this to folders.
suppressPackageStartupMessages({
  library(httr)
  library(rvest)
  library(dplyr)
})

# Landing page that lists the sub-folders of rel2020
link <- "https://www2.census.gov/geo/docs/maps-data/data/rel2020/"
page <- read_html(link)

# Extract the sub-folder links and turn them into full URLs
folders <- page %>%
  html_elements("a") %>%
  html_attr("href") %>%
  .[8:14] %>%
  paste0(link, .)

# For each folder, list its *.txt files and download them,
# trapping any download errors with tryCatch
files_txt <- Map(\(x) {
  x %>%
    read_html() %>%
    html_elements("a") %>%
    html_attr("href") %>%
    grep("\\.txt$", ., value = TRUE) %>%
    paste0(x, .) %>%
    lapply(\(y) {
      tryCatch(
        download.file(y, destfile = file.path("~/Temp", basename(y))),
        error = function(e) e
      )
    })
}, folders[1:2])

# Check which downloads, if any, failed
err <- sapply(unlist(files_txt, recursive = FALSE), inherits, "error")
lapply(unlist(files_txt, recursive = FALSE)[err], simpleError)
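As a hedged follow-up sketch (it reuses the ~/Temp destination and the files_txt object from the code above; dest_dir and ok_per_folder are names introduced here only for illustration), it can help to make sure the destination directory exists before running the loop and then to count how many files were fetched per folder:

# Sketch only: create the destination directory used above if it is missing
dest_dir <- path.expand("~/Temp")
if (!dir.exists(dest_dir)) dir.create(dest_dir, recursive = TRUE)

# After the Map() loop has run, count successful downloads per folder;
# download.file() returns 0 on success, errors were stored as condition objects
ok_per_folder <- sapply(files_txt, \(res) sum(!vapply(res, inherits, logical(1), "error")))
ok_per_folder

On Windows, ~ normally expands to the user's Documents folder in R, so ~/Temp may not be where you expect; adjust destfile in the main code if you want the files somewhere else.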