
Python: downloading xml files in batch returns a damaged zip file

Taking inspiration from this post, I am trying to batch-download a bunch of xml files from a website:

import urllib2

url='http://ratings.food.gov.uk/open-data/'

f = urllib2.urlopen(url)
data = f.read()
with open("C:\Users\MyName\Desktop\data.zip", "wb") as code:
    code.write(data)

The zip file is created within a few seconds, but when I try to open it, an error window appears:

Windows cannot open the folder.
The Compressed (zipped) Folder "C:\Users\MyName\Desktop\data.zip" is invalid.

What am I doing wrong here?

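You haven't done anything to make it a zip file. The script simply saves whatever the server returns for that URL under a .zip name. If you open the file in a plain-text editor such as Notepad, you will see the raw markup of the page rather than a zip archive.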
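A quick way to confirm this diagnosis is to check whether the downloaded bytes form a zip archive at all. The following is a minimal Python 3 sketch (the question itself uses Python 2's urllib2); looks_like_zip is a hypothetical helper name and the sample HTML bytes are made up for illustration. It relies on zipfile.is_zipfile, which accepts a file-like object and looks for the archive's end-of-central-directory record.

```python
import io
import zipfile


def looks_like_zip(data):
    """Return True if the given bytes can be read as a zip archive.

    zipfile.is_zipfile searches for the end-of-central-directory record,
    so page markup saved under a .zip name fails this check.
    """
    return zipfile.is_zipfile(io.BytesIO(data))


# Made-up stand-in for what the question's script saved: page markup.
html_bytes = b"<!DOCTYPE html><html><body>open data index</body></html>"

# A genuine archive built in memory for comparison.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.xml", "<root/>")
zip_bytes = buf.getvalue()

print(looks_like_zip(html_bytes))  # False
print(looks_like_zip(zip_bytes))   # True
print(zip_bytes[:4])               # b'PK\x03\x04' (local file header signature)
```

Running the same check on the data.zip from the question should return False, which is why Windows reports the compressed folder as invalid.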

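You never open a zip archive to write the files into. The URL is an index page, so you first need to collect the links to the individual XML files, then download each one and add it to a ZipFile: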

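import urllib2
import zipfile
from urlparse import urljoin

from bs4 import BeautifulSoup

url = 'http://ratings.food.gov.uk/open-data/'

fileurls = []

# The URL above is an HTML index page, not a zip archive:
# parse it and collect the links to the individual XML files.
f = urllib2.urlopen(url)
mainpage = f.read()

soup = BeautifulSoup(mainpage, 'html.parser')

# The file links live in the tables inside the element with id="openDataStatic".
tablewrapper = soup.find(id='openDataStatic')

for table in tablewrapper.find_all('table'):
    # href=True skips <a> tags without an href instead of raising KeyError.
    for link in table.find_all('a', href=True):
        # Resolve relative links against the index page (no-op for absolute ones).
        fileurls.append(urljoin(url, link['href']))

# Open a real zip archive and write each downloaded XML file into it.
with zipfile.ZipFile("data.zip", "w", zipfile.ZIP_DEFLATED) as code:
    for fileurl in fileurls:
        print('Downloading: %s' % fileurl)
        data = urllib2.urlopen(fileurl).read()
        # Use the last path segment of the URL as the member name in the archive.
        xmlfilename = fileurl.rsplit('/', 1)[-1]
        code.writestr(xmlfilename, data)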
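For readers on Python 3, where urllib2 no longer exists, the same approach can be sketched with only the standard library. This is an adaptation, not the answer's exact code: html.parser stands in for BeautifulSoup, and links are selected by an assumed ".xml" suffix filter rather than by the openDataStatic tables. XmlLinkCollector, fetch_bytes and build_zip are hypothetical names chosen for illustration.

```python
import io
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class XmlLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href="..."> links ending in .xml.

    The .xml suffix filter is an assumption of this sketch; the answer
    above selects links from the tables under id="openDataStatic" instead.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            # Resolve relative links against the index page URL.
            self.links.append(urljoin(self.base_url, href))


def fetch_bytes(url):
    """Download a URL and return the raw response body as bytes."""
    with urlopen(url) as resp:
        return resp.read()


def build_zip(index_url, fetch=fetch_bytes):
    """Download every XML file linked from index_url into one zip archive.

    The archive is built in memory and returned as bytes together with
    the member names. fetch is injectable so the logic can be exercised
    without network access.
    """
    collector = XmlLinkCollector(index_url)
    collector.feed(fetch(index_url).decode("utf-8", errors="replace"))
    collector.close()

    buf = io.BytesIO()
    names = []
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for link in collector.links:
            # Name each member after the last path segment of its URL.
            name = link.rsplit("/", 1)[-1]
            zf.writestr(name, fetch(link))
            names.append(name)
    return buf.getvalue(), names
```

Against the real site this would be called as build_zip("http://ratings.food.gov.uk/open-data/"), with the returned bytes written to data.zip in binary ("wb") mode, assuming the page still links its XML files this way. Building the archive in memory keeps the function easy to test but holds every file in RAM; for very large downloads, passing a file path to zipfile.ZipFile directly, as the answer does, is the better choice.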


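Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM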