
Download an online folder with Windows 10

I wish to download an online folder using Windows 10 on my Dell laptop. In this example the folder I wish to download is named Targetfolder. I am trying to use the Command Window, but I am also wondering whether there is a simple solution in R. I have included an image at the bottom of this post showing the target folder. I should add that Targetfolder includes a file and multiple subfolders containing files, and not all files have the same extension. Also, please note this is a hypothetical site; I did not want to include the real site for privacy reasons.

EDIT

Here is a real site that can serve as a functional, reproducible example. The folder rel2020 can take the place of the hypothetical Targetfolder:

https://www2.census.gov/geo/docs/maps-data/data/rel2020/

None of the answers here seem to work with Targetfolder:

How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?

Below are my attempts based on answers posted at the link above and the result I obtained:

Attempt One

lftp -c 'mirror --parallel=300 https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/ ;exit'

Returned:

'lftp' is not recognized as an internal or external command, operable program or batch file.

Attempt Two

wget -r -np -nH --cut-dirs=3 -R index.html https://www.examplengo.org/datadisk/examplefolder/userdirs/user3/Targetfolder/

Returned:

'wget' is not recognized as an internal or external command, operable program or batch file.

Attempt Three

https://sourceforge.net/projects/visualwget/files/latest/download

VisualWget returned "Unsupported scheme" next to the URL.

[screenshot of the target folder]

Here is a way with the packages httr and rvest.
First get the subfolder URLs from the link.
Then loop through the folders with Map, getting the filenames and downloading them in an lapply loop.
If errors such as timeout conditions occur, they are trapped by tryCatch; the last two code lines report whether and where errors occurred.

Note: I only downloaded from folders[1:2]; in the Map call below, change this to folders to download from every subfolder.

suppressPackageStartupMessages({
  library(httr)
  library(rvest)
  library(dplyr)
})

link <- "https://www2.census.gov/geo/docs/maps-data/data/rel2020/"

page <- read_html(link)

folders <- page %>%
  html_elements("a") %>%
  html_attr("href") %>%
  .[8:14] %>%        # keep only the subfolder links, skipping the sort and parent-directory links
  paste0(link, .)

files_txt <- Map(\(x) {                      # \(x) lambda syntax requires R >= 4.1
  x %>%
    read_html() %>%
    html_elements("a") %>%
    html_attr("href") %>%
    grep("\\.txt$", ., value = TRUE) %>%     # keep only the .txt files
    paste0(x, .) %>%
    lapply(\(y) {
      tryCatch(
        # the destination directory "~/Temp" must already exist
        download.file(y, destfile = file.path("~/Temp", basename(y))),
        error = function(e) e                # trap errors (e.g. timeouts) instead of stopping
      )
    })
}, folders[1:2])                             # change folders[1:2] to folders for all subfolders

# flag which results are trapped error conditions and print them
err <- sapply(unlist(files_txt, recursive = FALSE), inherits, "error")
unlist(files_txt, recursive = FALSE)[err]
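The script above keeps only .txt files, but the question notes that not all files share one extension. A minimal sketch of an alternative filter (keep_files is a hypothetical helper, not part of the original answer) that keeps every link which looks like a file rather than a sort link, a parent path, or a subdirectory, assuming an Apache-style index page like the Census listing:

```r
# keep_files(): given the href attributes scraped from a directory index,
# drop column-sort links ("?C=N;O=D"), absolute parent paths ("/geo/..."),
# and subdirectories ("sub1/"), then return full URLs for the remaining
# files, whatever their extension.
keep_files <- function(hrefs, base_url) {
  is_file <- !grepl("/$", hrefs) &     # not a subdirectory
             !grepl("^\\?", hrefs) &   # not a column-sort link
             !grepl("^/", hrefs)       # not the parent-directory link
  paste0(base_url, hrefs[is_file])
}

# hypothetical hrefs for illustration
keep_files(c("?C=N;O=D", "/geo/docs/", "sub1/", "a.txt", "b.xlsx"),
           "https://www2.census.gov/geo/docs/maps-data/data/rel2020/sub1/")
```

Swapping this in for the grep("\\.txt$", ...) step would download every file type in each subfolder, at the cost of relying on the index page's link layout rather than on a known extension.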
