
Download an online folder with Windows 10

I wish to download an online folder using Windows 10 on my Dell laptop. In this example the folder I wish to download is named Targetfolder. I am trying to use the Command Window, but I am also wondering whether there is a simple solution in R. I have included an image at the bottom of this post showing the target folder. I should add that Targetfolder includes a file and multiple subfolders containing files. Not all files have the same extension. Also, please note this is a hypothetical site; I did not want to include the real site for privacy reasons.

EDIT

Here is a real site that can serve as a functional, reproducible example. The folder rel2020 can take the place of the hypothetical Targetfolder:

https://www2.census.gov/geo/docs/maps-data/data/rel2020/

None of the answers here seem to work with Targetfolder:

How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?

Below are my attempts based on answers posted at the link above and the results I obtained:

Attempt One

lftp -c 'mirror --parallel=300 https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/ ;exit'

Returned:

lftp is not recognized as an internal or external command, operable program or batch file.

Attempt Two

wget -r -np -nH --cut-dirs=3 -R index.html https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/

Returned:

wget is not recognized as an internal or external command, operable program or batch file.

Attempt Three

https://sourceforge.net/projects/visualwget/files/latest/download

VisualWget returned Unsupported scheme next to the URL.

[Image showing the Targetfolder directory listing]

Here is a way with packages httr and rvest.
First get the folders where the files are from the link.
Then loop through the folders with Map, getting the filenames and downloading them in a lapply loop.
If errors such as time-out conditions occur, they will be trapped by tryCatch. The last code lines will tell if and where there were errors.
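The error-trapping step can be seen in isolation with a small sketch that needs no network access (the "downloads" here are simulated and the names are made up for illustration): tryCatch returns the condition object instead of aborting, so one failed file does not stop the rest of the loop.

```r
# Minimal sketch of the trapping idea: a failing "download" is converted into
# an error object that the loop carries along instead of stopping on.
res <- lapply(c("fileA", "fileB"), function(x) {
  tryCatch(
    if (x == "fileB") stop("simulated download failure") else x,
    error = function(e) e
  )
})
sapply(res, inherits, "error")  # FALSE TRUE: only the second "download" failed
```

The same inherits() test is what the last lines of the full script use to report which downloads failed.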

Note: I only downloaded from folders[1:2]; in the Map call below, change this to folders to download everything.

suppressPackageStartupMessages({
  library(httr)
  library(rvest)
  library(dplyr)
})

link <- "https://www2.census.gov/geo/docs/maps-data/data/rel2020/"

page <- read_html(link)

# Collect the sub-folder links from the directory listing; elements 8:14 skip
# the sort and parent-directory links that the index page puts first.
folders <- page %>%
  html_elements("a") %>%
  html_attr("href") %>%
  .[8:14] %>%
  paste0(link, .)

# For each folder, list its .txt files and download them one by one,
# trapping any error (e.g. a time-out) instead of stopping the loop.
files_txt <- Map(\(x) {
  x %>%
    read_html() %>%
    html_elements("a") %>%
    html_attr("href") %>%
    grep("\\.txt$", ., value = TRUE) %>%
    paste0(x, .) %>%
    lapply(\(y) {
      tryCatch(
        download.file(y, destfile = file.path("~/Temp", basename(y))),
        error = function(e) e
      )
    })
}, folders[1:2])

# Which downloads failed, and with what error?
err <- sapply(unlist(files_txt, recursive = FALSE), inherits, "error")
unlist(files_txt, recursive = FALSE)[err]
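Since the question notes that not all files share the same extension, the grep("\\.txt$", ...) filter above can be widened. A hedged sketch (the helper name keep_files is my own and the href samples are illustrative): keep every link that does not end in a slash (i.e. is a file, not a sub-folder) and is not one of the sort links (?C=...) that the directory index page generates.

```r
# Sketch: keep any file link regardless of extension, dropping sub-folder
# links (trailing "/") and the index page's sort/query links (leading "?").
keep_files <- function(hrefs) {
  hrefs[!grepl("/$", hrefs) & !grepl("^\\?", hrefs)]
}
keep_files(c("a.txt", "b.xlsx", "subdir/", "?C=M;O=A"))  # "a.txt" "b.xlsx"
```

Swapping this filter in for the grep() step would download every file in each folder while still skipping sub-folder and sorting links.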
