Why can't I download a midi file with python requests?

Question

I'm trying to download a series of classical music MIDI files with Python and the requests library. Unfortunately, I can't seem to download the MIDI files themselves; every file I save turns out to be HTML. I have searched SO and tried other solutions, such as this post and this post, but neither worked for me.

Here is the code I've written:

from bs4 import BeautifulSoup
import requests
import re

url = 'http://www.midiworld.com/classic.htm'
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "html.parser")

links = []
for link in soup.find_all("a", href=re.compile("mid$")):
    links.append(link['href'])


def get_filename(links):
    filenames = []
    """
    Will return a list of filenames for the files to be downloaded
    """
    for link in links:
        url = link
        if url.find('/'):
            f_name = url.rsplit('/', 1)[1]
            print(url.rsplit('/', 1)[1])
            filenames.append(f_name)
    return filenames


def download_files(links, filenames):
    for link, filename in zip(links, filenames):
        r = requests.get(url, allow_redirects=True)
        with open(filename, 'wb') as saveMidi:
            saveMidi.write(r.content)

filenames = get_filename(links)
download_files(links, filenames)

I can't figure out why I'm getting HTML files back. Any ideas on how to download the MIDI files properly?

Answer 1

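I solved the issue, but I had to make some major changes to your code. The root problem is in download_files: it calls requests.get(url, ...), and url is still the index page 'http://www.midiworld.com/classic.htm', so every saved file is a copy of that HTML page. The request has to go to each file's own link instead. Revised code: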

import re

import requests
from bs4 import BeautifulSoup

# Fetch the index page and collect every <a> whose href ends in "mid".
main_page = requests.get('http://www.midiworld.com/classic.htm')
parsed_page = BeautifulSoup(main_page.content, 'html.parser')
links = parsed_page.find_all('a', href=re.compile('mid$'))


def getFileName(link):
    # The filename is whatever follows the last '/' in the href.
    return link['href'].rsplit('/', 1)[-1]


def downloadFile(link, filename):
    # Request the file's own URL (link['href']), not the index page.
    mid_file = requests.get(link['href'])
    mid_file.raise_for_status()  # fail on HTTP errors instead of saving an error page
    with open(filename, 'wb') as saveMidFile:
        saveMidFile.write(mid_file.content)
    print('Downloaded {} successfully.'.format(filename))


for link in links:
    filename = getFileName(link)
    downloadFile(link, filename)

This downloaded the files quickly and easily. None of them are corrupted and I can play them just fine. Thanks for cluttering my home folder with classical music, though.
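For comparison, a smaller change to the question's download_files may be enough: request link rather than url inside the loop. The sketch below does that and also checks that each response starts with the "MThd" header of a Standard MIDI File, so an HTML page is skipped instead of being saved under a .mid name. The is_midi helper and the dest_dir and get parameters are hypothetical additions for illustration, not part of requests or of the original code, and a MIDI file wrapped in another container would be skipped by this check.

```python
import os


def is_midi(body):
    # A Standard MIDI File begins with the 4-byte header chunk ID "MThd".
    return body[:4] == b'MThd'


def download_files(links, filenames, dest_dir='.', get=None):
    """Download each link to its filename; return the paths actually saved.

    `dest_dir` and `get` are hypothetical parameters added for this sketch;
    when `get` is not supplied, requests.get is used.
    """
    if get is None:
        import requests  # imported lazily so a stub can be passed in instead
        get = requests.get

    saved = []
    for link, filename in zip(links, filenames):
        # Request the per-file link, not the index page url.
        r = get(link, allow_redirects=True)
        r.raise_for_status()
        if not is_midi(r.content):
            print('Skipping {}: response is not a MIDI file'.format(link))
            continue
        path = os.path.join(dest_dir, filename)
        with open(path, 'wb') as save_midi:
            save_midi.write(r.content)
        saved.append(path)
    return saved
```

With the question's variables this would be called as download_files(links, filenames), where links holds the href strings collected from the index page.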

Answer 2

I don't know why, but this worked for me (links and filenames are the lists built in the question's code):

from urllib.request import urlopen
x = urlopen(links[0]).read()
with open(filenames[0], "wb") as f:
    f.write(x)
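A likely reason the snippet above works is simply that it requests links[0], the file's own URL, whereas the question's loop requests url, the index page. A minimal sketch that extends the same urlopen approach to every link might look like this; download_all and dest_dir are hypothetical names, and the filename is assumed to be the last path segment of each URL.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def download_all(links, dest_dir='.'):
    """Fetch every URL in links with urlopen and save it under dest_dir.

    Returns the list of paths written. download_all and dest_dir are
    illustrative names, not part of urllib.
    """
    saved = []
    for link in links:
        # Use the last path segment of the URL as the local filename.
        filename = os.path.basename(urlparse(link).path)
        if not filename:
            continue  # URL ends in '/', so there is no filename to use
        with urlopen(link) as response:
            data = response.read()
        path = os.path.join(dest_dir, filename)
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
    return saved
```

Deriving the name with urlparse and os.path.basename also sidesteps the question's url.find('/') test, which is truthy even when no slash is found because str.find returns -1 in that case.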

Answer 3

If you would rather do it from the shell:

wget https://www.midiworld.com/mozart.htm
grep -oE 'https[^"]*\.mid"' mozart.htm | tr -d '"' | xargs wget -c -t1
rm mozart.htm
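For readers staying in Python, the extraction step of the pipeline above (pull every https URL ending in .mid out of the page source) can be approximated with a regular expression. This is only a sketch: extract_mid_links and MID_URL are hypothetical names, the pattern assumes the URLs sit inside double-quoted attributes as the grep pattern does, and the sample URL in the usage note is made up.

```python
import re

# Match "https" up to ".mid", stopping at the closing double quote.
# [^"]* keeps the match inside a single quoted attribute value.
MID_URL = re.compile(r'https[^"]*\.mid(?=")')


def extract_mid_links(html):
    """Return every double-quoted https URL ending in .mid found in html."""
    return MID_URL.findall(html)
```

For example, extract_mid_links('<a href="https://example.com/midis/piece.mid">piece</a>') returns a one-element list holding that URL, which can then be passed to whichever download function you use.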

3 answers

solution1: 3, ACCEPTED, 2018-09-01 07:06:04
solution2: 1, 2018-09-01 04:17:09
solution3: 0, 2021-03-11 19:39:45
