
Python download all files from internet address?

I want to download all the files from a web page, actually all the image files. I found the 'urllib' module to be what I need. There seems to be a method to download a file if you know the filename, but I don't:

urllib.urlretrieve('http://www.example.com/page', 'myfile.jpg')

Is there a method to download all the files from the page, and maybe return a list?

Here's a little example to get you started with using BeautifulSoup for this kind of exercise: you give this script a URL, and it will print out the URLs of the images referenced from that page, i.e. the src attributes of img tags that end in jpg or png. (The example uses Python 2 and the BeautifulSoup 3 API.)

import sys, urllib, re, urlparse
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

# Expect exactly one command-line argument: the page URL.
if not len(sys.argv) == 2:
    print >> sys.stderr, "Usage: %s <URL>" % (sys.argv[0],)
    sys.exit(1)

url = sys.argv[1]

# Fetch the page and parse it.
f = urllib.urlopen(url)
soup = BeautifulSoup(f)

# Find every img tag whose src ends in "jpg" or "png" (case-insensitive),
# and resolve each src against the page URL so relative links work too.
for i in soup.findAll('img', attrs={'src': re.compile('(?i)(jpg|png)$')}):
    full_url = urlparse.urljoin(url, i['src'])
    print "image URL: ", full_url

Then you can use urllib.urlretrieve to download each of the images pointed to by full_url, but at that stage you have to decide how to name them and what to do with the downloaded files, which isn't specified in your question.
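For instance, a minimal sketch of that download step, still using Python 2's urllib (the images/ destination directory and the filename fallback are my own assumptions, not something from your question):

import os, posixpath, urllib, urlparse

def download_image(full_url, dest_dir='images'):
    # Name the file after the last segment of the URL path;
    # fall back to a fixed name if the path has no basename (an assumption).
    filename = posixpath.basename(urlparse.urlsplit(full_url).path) or 'image'
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest = os.path.join(dest_dir, filename)
    urllib.urlretrieve(full_url, dest)
    return dest

In practice you would also want to handle duplicate filenames, but how depends on what you plan to do with the images.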
