
How to bulk download files from the Internet Archive

I checked the Internet Archive's own site, which describes a couple of steps to follow, including using the wget utility through Cygwin on Windows. I followed those steps: I ran an advanced search, exported the CSV file, converted it to .txt, and then tried to run the following command:

wget -r -H -nc -np -nH --cut-dirs=1 -A .pdf,.epub -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/

The terminal emulator then hangs, and no log or error message appears to indicate any progress. I want to know what I have done wrong so far.

After some time I figured out how to resolve this. The commands posted on the Internet Archive help blog are general examples meant to show how to use the wget utility; the only options we need here are the following:

--cut-dirs=1
-A .pdf,.epub
-e robots=off
-i ./itemlist.txt

and, of course, the base URL:

-B 'http://archive.org/download/'
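Before running wget, it may be worth checking what each line of the list turns into once the base URL is applied. A list converted from CSV on Windows can carry carriage returns or stray quotes, and those would likely produce URLs that do not exist. The following is only a rough sketch: the helper name `item_urls` is made up for illustration, and it assumes itemlist.txt holds one item identifier per line.

```shell
# Hypothetical helper (not part of wget): print the URL that each
# identifier in the list would map to when joined with the base URL.
#   $1: path to the item list, one identifier per line (assumed format)
#   $2: optional base URL, defaults to the one passed to -B above
item_urls() {
  # Strip carriage returns (Windows line endings) and leftover CSV quotes.
  tr -d '\r"' < "$1" | while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue   # skip blank lines
    printf '%s%s\n' "${2:-http://archive.org/download/}" "$id"
  done
}
```

Running `item_urls ./itemlist.txt` and opening one or two of the printed URLs in a browser is a quick way to confirm the list is clean before starting a long download.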

The ia command-line tool is the official way to do this. If you can craft a search term that captures all your items, you can have ia download everything that matches.

For example:

ia download --search 'creator:Hamilton Commodore User Group'

will download all of the items attributed to this (now defunct) computer user group. This is a live, working query that downloads roughly 8.6 MB of data for 40 Commodore 64 disk images.

It will also download from an itemlist, as above.
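As a minimal sketch of the itemlist route, assuming the same itemlist.txt with one identifier per line: the wrapper name `ia_from_itemlist` and the `IA_BIN` variable are made up for illustration, while `--itemlist` and `--dry-run` are options of `ia download`.

```shell
# Hypothetical wrapper around `ia download --itemlist`.
#   $1: path to the item list (one identifier per line, assumed format)
#   remaining arguments are passed through to `ia download` unchanged
# IA_BIN is an illustrative override; set IA_BIN=echo to only print the command.
ia_from_itemlist() {
  list="$1"
  shift
  "${IA_BIN:-ia}" download --itemlist "$list" "$@"
}

# Preview the file URLs without downloading anything:
#   ia_from_itemlist ./itemlist.txt --dry-run
# Then download for real:
#   ia_from_itemlist ./itemlist.txt
```

Doing a dry run first keeps a typo in the list from turning into a long, partly failed download.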
