Downloading Files with Python Urllib, Urllib2

I am trying to download files from a website using urllib as described in this thread: link text

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

I am able to download the files (mostly PDF), but all I get are corrupted files that cannot be opened. I suspect this is because the website requires a login.

How can the above function be modified to handle cookies? I already know the names of the form fields that carry the username and password information. When I print the return values of urlretrieve, I get messages like:

a, b = urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
print a, b

>> **cache-control:** no-cache, no-store, must-revalidate, s-maxage=300, proxy-revalidate

>> **connection:** close

I am able to download the files manually if I enter their URLs in the browser. Thanks

First, urllib2 actually supports cookies, and cookie handling should be easy. Second, you can check what kind of file you have actually downloaded; e.g. AFAIK all MP3s start with the bytes "ID3". A sketch combining both ideas follows the snippet below.

import cookielib, urllib2

# The CookieJar collects cookies from responses; the HTTPCookieProcessor
# sends them back automatically on later requests made through this opener.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
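A minimal sketch of the whole flow, under assumptions: the login URL and the form field names ("username" / "password") below are hypothetical placeholders; substitute the real ones you already know from the form. Log in once through the cookie-aware opener, reuse the same opener for the download so the session cookie travels with the request, then peek at the first bytes to confirm you got a real file rather than an HTML login page.

import cookielib, urllib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Hypothetical login endpoint and field names -- replace with the real ones.
form = urllib.urlencode({"username": "me", "password": "secret"})
opener.open("http://www.example.com/login", form)  # POST; the jar keeps the session cookie

# Same opener, so the cookie is sent automatically with this request.
data = opener.open("http://www.example.com/songs/mp3.mp3").read()

# Quick sanity check on the magic bytes: "%PDF" for PDFs, "ID3" for
# MP3s with an ID3v2 tag; "<htm" usually means you got an error page.
print repr(data[:4])

with open("mp3.mp3", "wb") as f:
    f.write(data)

If the first bytes come back as HTML, the login POST did not succeed; double-check the form's action URL and field names in the page source.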

It might be possible that the server you are requesting from is looking for certain request headers, such as User-Agent. You may try mimicking browser behavior by sending additional headers, for example:
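A small sketch of that idea with urllib2; the User-Agent string below is just an arbitrary browser-like example.

import urllib2

# Some servers reject the default "Python-urllib/2.x" agent, so present
# a browser-like User-Agent instead (the exact string is arbitrary).
req = urllib2.Request(
    "http://www.example.com/songs/mp3.mp3",
    headers={"User-Agent": "Mozilla/5.0"},
)
data = urllib2.urlopen(req).read()

The same header can also be installed on the cookie-aware opener from the previous answer via opener.addheaders = [("User-Agent", "Mozilla/5.0")], which combines both fixes.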
