Downloading Files with Python Urllib, Urllib2

I am trying to download files from a website using urllib as described in this thread: link text

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

I am able to download the files (mostly PDF), but all I get are corrupted files that cannot be opened. I suspect this is because the website requires a login.

How can the above function be modified to handle cookies? I already know the names of the form fields that carry the username and password information. When I print the return values of urlretrieve, I get messages like:

a, b = urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
print a, b

>> **cache-control:** no-cache, no-store, must-revalidate, s-maxage=300, proxy-revalidate

>> **connection:** close

I am able to download the files manually if I enter their URLs in the browser. Thanks

First, urllib2 actually supports cookies, and cookie handling should be easy. Second, you can check what kind of file you have actually downloaded; e.g. AFAIK all MP3s start with the bytes "ID3". A sketch combining both ideas follows the snippet below.

import cookielib, urllib2

# The CookieJar collects cookies from responses; the HTTPCookieProcessor
# sends them back automatically on later requests made through this opener.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
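A minimal sketch of the whole flow, under assumptions: the login URL and the form field names ("username" / "password") below are hypothetical placeholders; substitute the real ones you already know from the form. Log in once through the cookie-aware opener, reuse the same opener for the download so the session cookie travels with the request, then peek at the first bytes to confirm you got a real file rather than an HTML login page.

import cookielib, urllib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Hypothetical login endpoint and field names -- replace with the real ones.
form = urllib.urlencode({"username": "me", "password": "secret"})
opener.open("http://www.example.com/login", form)  # POST; the jar keeps the session cookie

# Same opener, so the cookie is sent automatically with this request.
data = opener.open("http://www.example.com/songs/mp3.mp3").read()

# Quick sanity check on the magic bytes: "%PDF" for PDFs, "ID3" for
# MP3s with an ID3v2 tag; "<htm" usually means you got an error page.
print repr(data[:4])

with open("mp3.mp3", "wb") as f:
    f.write(data)

If the first bytes come back as HTML, the login POST did not succeed; double-check the form's action URL and field names in the page source.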

It might be possible that the server you are requesting from is looking for certain request headers, such as User-Agent. You may try mimicking browser behavior by sending additional headers, for example:
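A small sketch of that idea with urllib2; the User-Agent string below is just an arbitrary browser-like example.

import urllib2

# Some servers reject the default "Python-urllib/2.x" agent, so present
# a browser-like User-Agent instead (the exact string is arbitrary).
req = urllib2.Request(
    "http://www.example.com/songs/mp3.mp3",
    headers={"User-Agent": "Mozilla/5.0"},
)
data = urllib2.urlopen(req).read()

The same header can also be installed on the cookie-aware opener from the previous answer via opener.addheaders = [("User-Agent", "Mozilla/5.0")], which combines both fixes.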
