
Download specific file in url using PHP/Python

I previously used wget -r in the Linux terminal to download files with certain extensions:

wget -r -A Ext URL

But now my lecturer has assigned me to do the same thing using PHP or Python. Who can help?

I guess urllib will work pretty well for you:

import urllib.request  # Python 3; in Python 2 this was just `import urllib`
urllib.request.urlretrieve(URL, file)  # URL: remote address, file: local path to save to
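
Note that urlretrieve only fetches one URL you already know. To mimic wget -r -A Ext (grab every file with a given extension linked from a page) you also have to collect the links yourself. Below is a minimal single-level sketch using only the standard library; the start page and the .pdf extension are assumptions for illustration, and unlike wget -r it does not recurse into subpages:

from urllib.request import urlopen, urlretrieve
from urllib.parse import urljoin
from html.parser import HTMLParser

PAGE = 'http://www.example.com/'  # hypothetical start page
EXT = '.pdf'                      # hypothetical extension, the wget -A filter

class LinkCollector(HTMLParser):
    # collect the href attribute of every <a> tag on the page
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

collector = LinkCollector()
collector.feed(urlopen(PAGE).read().decode('utf-8', errors='replace'))

for href in collector.links:
    if href.lower().endswith(EXT):
        full = urljoin(PAGE, href)                   # resolve relative links
        urlretrieve(full, full.rsplit('/', 1)[-1])   # save under the file's own name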

You can use the PHP function file_get_contents() to retrieve the contents of a document. Its first argument is a filename, which can be either a local path or a URL; to actually save a downloaded document to disk, pass the result on to file_put_contents().
See the example from the PHP docs:

<?php
    $homepage = file_get_contents('http://www.example.com/');
    echo $homepage;
?>

Alternatively, you can use Requests: Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

Example (from the doc):

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
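
The snippet above hits a JSON API; for the original use case of downloading a file, you would write the response bytes to disk instead. A minimal sketch, assuming a hypothetical file URL and name:

import requests

url = 'http://www.example.com/files/report.pdf'  # hypothetical file URL
r = requests.get(url, stream=True)  # stream=True avoids loading a large file fully into memory
r.raise_for_status()                # fail loudly on 4xx/5xx responses

with open('report.pdf', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)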

For Python, use a web-crawler library such as scrapy.

It provides spider classes that do all the crawling work when passed arguments similar to those you would put on the wget command line.

You can use scrapy pipelines to filter out unwanted downloads and to post-process the files you keep, for example by generating thumbnails.
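
As a rough sketch of how that looks (the start URL, domain, and .pdf extension are assumptions, not part of the original answer), a spider can follow links recursively and hand matching files to scrapy's built-in FilesPipeline:

import scrapy

class ExtSpider(scrapy.Spider):
    # crawl a site and download every file with a given extension,
    # roughly like `wget -r -A pdf URL`
    name = 'extspider'
    start_urls = ['http://www.example.com/']   # hypothetical start page
    allowed_domains = ['www.example.com']      # stay on-site, as wget -r does by default
    custom_settings = {
        # FilesPipeline downloads anything listed under 'file_urls'
        'ITEM_PIPELINES': {'scrapy.pipelines.files.FilesPipeline': 1},
        'FILES_STORE': 'downloads',            # directory to save files into
    }

    def parse(self, response):
        for href in response.css('a::attr(href)').getall():
            url = response.urljoin(href)       # resolve relative links
            if url.lower().endswith('.pdf'):   # the -A filter
                yield {'file_urls': [url]}     # handed to FilesPipeline
            else:
                # recurse into other pages, like wget -r
                yield response.follow(url, callback=self.parse)

Run it with scrapy runspider spider.py; downloaded files land under the FILES_STORE directory.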
