Download specific files from a URL using PHP/Python
I previously used wget -r
on the Linux terminal for downloading files with certain extensions:
wget -r -A Ext URL
But now I was assigned by my lecturer to do the same thing using PHP or Python. Who can help?
I guess urllib will work pretty well for you:
import urllib.request  # Python 3; in Python 2 this function lived at urllib.urlretrieve

urllib.request.urlretrieve(URL, file)  # fetch URL and save it to the path given in file
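urlretrieve only fetches a single file, though, so to approximate wget -r -A you first need to scan a page for matching links. Here is a minimal standard-library sketch of that idea; PAGE_URL and EXT are hypothetical placeholders, and the regex-based link extraction is deliberately naive.

import os
import re
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

PAGE_URL = "http://www.example.com/"  # page to scan for links (placeholder)
EXT = ".pdf"                          # extension to keep, like wget -A (placeholder)

html = urlopen(PAGE_URL).read().decode("utf-8", errors="replace")
# Naive href extraction; a real crawler should use an HTML parser.
for link in re.findall(r'href=[\'"]?([^\'" >]+)', html):
    url = urljoin(PAGE_URL, link)  # resolve relative links against the page URL
    if url.endswith(EXT):
        urlretrieve(url, os.path.basename(url))  # save next to the script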
You can use the PHP function file_get_contents()
to retrieve the contents of a document. The first argument of the function is filename, which can be either a local path to a file or a URL. To write what you fetched to disk, pair it with file_put_contents().
See the example from the PHP docs:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
Alternatively, you can use Requests: "Requests is the only Non-GMO HTTP library for Python, safe for human consumption."
Example (from the docs):
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
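That snippet demonstrates the API on a JSON endpoint; to actually save a file to disk with Requests, stream the body into a file. A minimal sketch, assuming a hypothetical FILE_URL (Requests itself is installed with pip install requests):

import requests

FILE_URL = "http://www.example.com/report.pdf"  # placeholder

r = requests.get(FILE_URL, stream=True)  # stream to avoid loading it all into memory
r.raise_for_status()                     # bail out on HTTP errors
with open("report.pdf", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)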
For Python, use a web-crawler library such as Scrapy.
It has classes that do all the work when passed arguments similar to those you put on the wget
command line.
You can use Scrapy pipelines to filter out unwanted downloads and add value to the ones you keep, such as generating thumbnails; see the spider sketch below.
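A minimal Scrapy sketch of that idea, assuming Scrapy is installed; the spider name, PAGE_URL, and EXT are illustrative placeholders, not part of the original answer:

import scrapy

PAGE_URL = "http://www.example.com/"  # start page (placeholder)
EXT = ".pdf"                          # extension to keep, like wget -A (placeholder)

class ExtSpider(scrapy.Spider):
    name = "ext_spider"
    start_urls = [PAGE_URL]
    custom_settings = {
        # FilesPipeline downloads every URL listed in an item's file_urls field
        "ITEM_PIPELINES": {"scrapy.pipelines.files.FilesPipeline": 1},
        "FILES_STORE": "downloads",  # directory where downloaded files land
    }

    def parse(self, response):
        for href in response.css("a::attr(href)").getall():
            url = response.urljoin(href)
            if url.endswith(EXT):
                yield {"file_urls": [url]}              # handed to FilesPipeline
            else:
                yield response.follow(url, self.parse)  # recurse, like wget -r

Save it as spider.py and run it with scrapy runspider spider.py.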