
Python: downloading xml files in batch returns a damaged zip file

Drawing inspiration from this post, I am trying to download a bunch of xml files in batch from a website:

import urllib2

url='http://ratings.food.gov.uk/open-data/'

f = urllib2.urlopen(url)
data = f.read()
with open("C:\Users\MyName\Desktop\data.zip", "wb") as code:
    code.write(data)

The zip file is created within seconds, but when I try to open it, an error window appears:

Windows cannot open the folder.
The Compressed (zipped) Folder "C:\Users\MyName\Desktop\data.zip" is invalid.

What am I doing wrong here?

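You are not doing anything to encode the data as a zip file; the code only writes the downloaded bytes to a file whose name ends in .zip. If you instead open that file in a plain-text editor such as Notepad, it should show you the raw xml.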
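To illustrate the point, here is a minimal sketch showing that plain bytes do not become a zip archive by being saved under a .zip name, while bytes written through `zipfile.ZipFile` do. It is written for Python 3, and the sample payload and the helper names `looks_like_zip` and `wrap_in_zip` are assumptions made for the example, not part of the original post.

```python
import io
import zipfile

# Hypothetical stand-in for the bytes returned by urlopen(url).read().
raw = b"<?xml version='1.0'?><root><item>1</item></root>"


def looks_like_zip(payload):
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def wrap_in_zip(name, payload):
    """Return the bytes of a zip archive holding one member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


print(looks_like_zip(raw))
print(looks_like_zip(wrap_in_zip("sample.xml", raw)))
```

Running the same `zipfile.is_zipfile` check on the file saved by the question's code is a quick way to confirm whether the download was an archive at all.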

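You are not opening a file handle inside a zip file, so nothing is ever written in zip format. Collect the links to the individual files first, then write each download into a zipfile.ZipFile: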

import urllib2
from bs4 import BeautifulSoup
import zipfile

url = 'http://ratings.food.gov.uk/open-data/'

fileurls = []

# Fetch the index page that lists the data files
f = urllib2.urlopen(url)
mainpage = f.read()

soup = BeautifulSoup(mainpage, 'html.parser')

# The download links sit in tables inside the element with id 'openDataStatic'
tablewrapper = soup.find(id='openDataStatic')

for table in tablewrapper.find_all('table'):
    for link in table.find_all('a'):
        fileurls.append(link['href'])

# Write each downloaded file into the archive as its own member
with zipfile.ZipFile("data.zip", "w") as archive:
    for fileurl in fileurls:
        print('Downloading: %s' % fileurl)
        data = urllib2.urlopen(fileurl).read()
        # Name the member after the last path segment of the URL
        xmlfilename = fileurl.rsplit('/', 1)[-1]
        archive.writestr(xmlfilename, data)
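The code above targets Python 2 (`urllib2`). As a rough sketch of how the archive-writing loop might look on Python 3, the block below uses `urllib.request` and `urllib.parse`. The helper names `fetch_bytes`, `member_name` and `zip_links` are hypothetical, and resolving each link with `urljoin` assumes the page may contain relative hrefs, which the original answer does not state. The list of hrefs is expected to come from the same BeautifulSoup step as above.

```python
import io
import zipfile
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen

BASE_URL = "http://ratings.food.gov.uk/open-data/"


def fetch_bytes(url):
    """Download one URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def member_name(url):
    """Use the last path segment of the URL as the name inside the archive."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def zip_links(hrefs, target, base_url=BASE_URL, fetch=fetch_bytes):
    """Fetch every href and store it as a member of the zip at `target`.

    `target` may be a file path or a writable binary file object.
    """
    names = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for href in hrefs:
            url = urljoin(base_url, href)  # assumption: hrefs may be relative
            name = member_name(url)
            archive.writestr(name, fetch(url))
            names.append(name)
    return names
```

A call such as `zip_links(fileurls, "data.zip")` would then produce the archive. The `fetch` parameter exists only so the download step can be swapped out, for example with a stub when trying the function without network access.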

