
Why can't I play the MIDI files I have downloaded programmatically, but I can play them when I download them manually?

I want to download the MIDI files from this website for a project. I have written the following code to download the files:

from bs4 import BeautifulSoup
import requests
import re, os
import urllib.request
import string

base_url = "http://www.midiworld.com/files/"

base_path = 'path/where/I/will/save/the/downloaded/MIDI/files'
os.chdir(base_path + '/MIDI Files')

for i in range(1,2386):
    page = requests.get(base_url + str(i))
    soup = BeautifulSoup(page.text, "html.parser")

    li_box = soup.select("div ul li a")
    urllib.request.urlretrieve(base_url+str(i), str(i)+'.mid')

This is downloading the files, but when I click on them to play, they don't play; I get this error:

[image: enter image description here]

But if I download the files manually (I checked a couple of them), I can play them. In case it's relevant, the manually downloaded files have descriptive names rather than the numbers I am saving them under. Could that be the cause? The files are not empty either, as can be seen from the screenshot below:

[image: enter image description here]

EDIT: When I tried to load a programmatically downloaded MIDI file on this website to compare it with its manually downloaded counterpart, I got this error:

Failed to load data=error

But no such error when loading the manually downloaded one.

EDIT 2: These are the first 50 bytes of the hex dump:

For the programmatically downloaded file:

file name: 1.mid
mime type: 

0000-0010:  3c 21 44 4f-43 54 59 50-45 20 68 74-6d 6c 20 50  <!DOCTYP E.html.P
0000-0020:  55 42 4c 49-43 20 22 2d-2f 2f 57 33-43 2f 2f 44  UBLIC."- //W3C//D
0000-0030:  54 44 20 58-48 54 4d 4c-20 31 2e 30-20 53 74 72  TD.XHTML .1.0.Str
0000-0032:  69 63

For the corresponding manually downloaded file:

file name: Adson_John_-_Courtly_Masquing_Ayres.mid
mime type: 

0000-0010:  4d 54 68 64-00 00 00 06-00 01 00 0b-00 f0 4d 54  MThd.... ......MT
0000-0020:  72 6b 00 00-00 7b 00 ff-58 04 04 02-18 08 00 ff  rk...{.. X.......
0000-0030:  59 02 00 00-00 ff 51 03-07 a1 20 f0-40 ff 51 03  Y.....Q. ....@.Q.
0000-0032:  09 27

Your code works fine; just change base_url to

base_url = "http://www.midiworld.com/download/"

Right now, "1.mid", for example, contains the HTML of the page http://www.midiworld.com/files/1 rather than MIDI data. (You can open it with a text editor to check.)

The MIDI files can be downloaded from the URL http://www.midiworld.com/download/{insert number}
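The hex dumps in the question already show the difference: the working file starts with the bytes 4d 54 68 64 ("MThd"), while the broken one starts with "<!DOCTYPE". As a minimal sketch, a downloaded file could be checked for that signature before trying to play it. The helper names looks_like_midi and is_midi_file are made up for this illustration and are not part of any library.

```python
from pathlib import Path

# Chunk ID that opens the working file in the question's hex dump (4d 54 68 64).
MIDI_MAGIC = b"MThd"


def looks_like_midi(data: bytes) -> bool:
    """Return True if the bytes start with the MIDI header chunk ID."""
    return data[:4] == MIDI_MAGIC


def is_midi_file(path: Path) -> bool:
    """Read only the first four bytes of a file and check the signature."""
    with open(path, "rb") as f:
        return looks_like_midi(f.read(4))


# First bytes of the two files, copied from the hex dumps in the question.
html_start = bytes.fromhex("3c21444f4354595045")    # "<!DOCTYPE"
midi_start = bytes.fromhex("4d54686400000006")      # "MThd" + header length

if __name__ == "__main__":
    print(looks_like_midi(html_start))  # the programmatic download
    print(looks_like_midi(midi_start))  # the manual download
```

A file that fails this check is almost certainly not MIDI data, whatever its extension says.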

I downloaded the first 100, but there currently seem to be 4992 downloadable MIDI files, so if you want more, just change the loop to

for i in range(1, 4993):  # range() excludes its upper bound, so this covers 1..4992

As a side note, if the requested .mid doesn't exist, the site serves a file named "_-_.mid" that is 0 bytes long. So if you plan to repeat the download and want every file the site has, consider setting the range to, for example, 100000 and breaking out of the loop when the downloaded file size is 0 bytes.

for i in range(1, 100000):
    # A missing ID is served as an empty (0-byte) "_-_.mid", so stop there.
    if urllib.request.urlopen(base_url + str(i)).length == 0:
        break
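The loop above only shows the stop condition. A fuller sketch might fetch each ID once, stop at the first empty body, and write everything else to disk. It assumes, as described above, that a missing ID comes back as a 0-byte response. The names fetch_bytes and download_until_empty and the injectable fetch parameter are made up for this illustration, not part of urllib.

```python
import urllib.request
from pathlib import Path
from typing import Callable

BASE_URL = "http://www.midiworld.com/download/"  # corrected URL from the answer


def fetch_bytes(url: str) -> bytes:
    """Download and return the full response body for a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_until_empty(
    out_dir: Path,
    max_id: int = 100000,
    fetch: Callable[[str], bytes] = fetch_bytes,  # replaceable, e.g. in tests
) -> int:
    """Save BASE_URL + <id> as <id>.mid until an empty body is returned.

    Returns the number of files written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for i in range(1, max_id + 1):
        data = fetch(BASE_URL + str(i))
        if len(data) == 0:
            # Assumption: an empty body means the ID does not exist.
            break
        (out_dir / f"{i}.mid").write_bytes(data)
        saved += 1
    return saved


if __name__ == "__main__":
    count = download_until_empty(Path("MIDI Files"), max_id=100)
    print(f"saved {count} files")
```

Reading the body once and checking its length avoids requesting every file twice. It also does not depend on the server sending a Content-Length header.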

