
Downloading links with Python urllib2

I want to download an mp3 from a page, but what I'm getting is just the HTML, not the mp3 itself. The code I'm using is from this answer: https://stackoverflow.com/a/16518224/2137668

Why am I not able to get the mp3? Here's an example for testing that shows it gets downloaded as html: http://www5.zippyshare.com/d/77609120/61098/Cleavage%20-%20Prove%20%28Original%20Mix%29%20%5bquality-dance-music.com%5d.mp3

When I try to open that URL in a web browser, or with wget, I get a 302 redirect to http://www5.zippyshare.com/v/77609120/file.html, which is of course an HTML page.
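You can confirm the same thing from Python by checking where the request ended up and what content type came back. The following is a minimal sketch, not the code from the linked answer: it uses urllib.request, the Python 3 successor to urllib2 (in Python 2 the same urlopen call lives in urllib2), and the helper names is_html and probe are made up for illustration.

```python
import urllib.request


def is_html(content_type):
    """Return True if a Content-Type header value denotes an HTML page."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def probe(url):
    """Fetch url (redirects are followed automatically) and report the result."""
    with urllib.request.urlopen(url) as resp:
        final_url = resp.geturl()  # differs from url if a redirect happened
        content_type = resp.headers.get("Content-Type")
    return final_url, content_type, is_html(content_type)
```

If probe returns a final URL that differs from the one you asked for, and the last element is True, you were redirected to a container page rather than served the file.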

Many websites redirect you to such "container pages" (or just return them directly) when you browse to things like images, songs, and videos. This may be to improve your user experience, to make it harder for other sites to "deep-link" their content, or to make it harder for you to "steal" their content.

If it's one of the first two, the fix is often trivial: add a Referer header that points to the download page you got the link from (or sometimes to anything on the same site, even the same URL you're downloading).
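As a rough sketch of the Referer approach, assuming Python 3's urllib.request (urllib2.Request and urllib2.urlopen take the same arguments in Python 2): the function names build_request and download are hypothetical, and whether a given site accepts the header is something you have to test.

```python
import urllib.request


def build_request(url, referer):
    """Build a GET request that carries a Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


def download(url, referer, dest_path, chunk_size=64 * 1024):
    """Stream the response body to dest_path in fixed-size chunks."""
    req = build_request(url, referer)
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means end of body
                break
            out.write(chunk)
```

Here referer would be the URL of the page the link came from. If the saved file still turns out to be HTML, the site is checking more than the Referer.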

If it's the third, they will usually add a lot more protection than that. For example, they may require a cookie that you only get by sitting on the download page and waiting out a 30-second timer, and that is only valid for 30 minutes.
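For reference, this is roughly how cookies are carried from one request to the next in Python. It is only a sketch under assumptions: it uses Python 3's http.cookiejar and urllib.request (cookielib and urllib2 in Python 2), make_session is a made-up name, and it does nothing about timers or JavaScript checks; it just sends back whatever cookies the server set.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

You would call opener.open() on the download page first, then on the file URL, so that any cookie set by the first response is sent with the second request.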

If you understand HTTP and JavaScript well enough, and don't care about violating their terms of service, you can usually reverse-engineer each of their protections and write yourself a download script. It will only work until they change things up next month, though, so that's usually not worth doing.

Anyway, given that this site is named zippyshare, I'm guessing it's the last of these. These kinds of sites make their money by showing you ads every time you download a file, and by prompting you to pay a monthly fee to get direct/accelerated/whatever downloads, and so on, so they will put all kinds of hurdles in the way of you downloading files directly without seeing those ads or paying that fee.


 