How to load a CSV file from a zipped folder at a URL into a Pandas DataFrame

Question

I want to load a CSV file from a zipped folder at a URL into a Pandas DataFrame. I referred to the solution mentioned here and used the same approach, as shown below:

from urllib import request
import zipfile

import pandas as pd

# link to the zip file
link = 'https://cricsheet.org/downloads/'
# the zip file is named as ipl_csv2.zip
request.urlretrieve(link, 'ipl_csv2.zip')
compressed_file = zipfile.ZipFile('ipl_csv2.zip')

# I need the csv file named all_matches.csv from ipl_csv2.zip
csv_file = compressed_file.open('all_matches.csv')
data = pd.read_csv(csv_file)
data.head()

But after running the code, I get an error:

BadZipFile                                Traceback (most recent call last)
<ipython-input-3-7b7a01259813> in <module>
      1 link = 'https://cricsheet.org/downloads/'
      2 request.urlretrieve(link, 'ipl_csv2.zip')
----> 3 compressed_file = zipfile.ZipFile('ipl_csv2.zip')
      4 csv_file = compressed_file.open('all_matches.csv')
      5 data = pd.read_csv(csv_file)

~\Anaconda3\lib\zipfile.py in __init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
   1267         try:
   1268             if mode == 'r':
-> 1269                 self._RealGetContents()
   1270             elif mode in ('w', 'x'):
   1271                 # set the modified flag so central directory gets written

~\Anaconda3\lib\zipfile.py in _RealGetContents(self)
   1334             raise BadZipFile("File is not a zip file")
   1335         if not endrec:
-> 1336             raise BadZipFile("File is not a zip file")
   1337         if self.debug > 1:
   1338             print(endrec)

BadZipFile: File is not a zip file

I am not very used to handling zip files in Python, so please help me here: what corrections do I need to make in my code?

If I open the URL https://cricsheet.org/downloads/ipl_csv2.zip in a web browser, the zip file is automatically downloaded to my system. As data is added to this zip file daily, I want to access the URL and read the CSV file directly from Python to save storage.

Edit 1: If you have any other code solutions, please share them.

Answer 1

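Try this. The link in your code points to the downloads page rather than to the archive itself, so the file saved as ipl_csv2.zip is most likely that page's HTML, which is why zipfile rejects it. Point the link at the zip file directly:

link = "https://cricsheet.org/downloads/ipl_csv2.zip"

Don't worry if opening the link starts a download; cancel it if you don't want the file. You will always get the updated data from the link.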
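To see why the bare downloads URL leads to BadZipFile, here is a minimal sketch that checks whether a downloaded payload is a zip archive at all before trying to open it. The helper name looks_like_zip and both sample payloads are made up for illustration; the HTML bytes only stand in for whatever the downloads page actually returns.

```python
import io
import zipfile


def looks_like_zip(payload):
    """Return True if the bytes can be recognised as a zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


# Build a tiny archive in memory to stand in for a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("all_matches.csv", "match_id,runs\n1,4\n")

# Hypothetical stand-in for the HTML served at the bare downloads URL.
html_page = b"<html><body>Downloads</body></html>"

print(looks_like_zip(buffer.getvalue()))  # True
print(looks_like_zip(html_page))          # False
```

Running a check like this on the saved file (zipfile.is_zipfile also accepts a path) would show quickly whether the URL returned an archive or something else.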

Answer 2

Here is what I did after discussing with @nobleknight below:

# importing libraries
import os
import shutil
import zipfile
from urllib.request import urlopen

import pandas as pd

url = 'https://cricsheet.org/downloads/ipl_csv2.zip'
file_name = 'ipl_csv2.zip'

# downloading the zipfile from the URL
with urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

# extracting the required file only after the download has been closed,
# so that every byte is flushed to disk before zipfile reads it
with zipfile.ZipFile(file_name) as zf:
    zf.extract('all_matches.csv')

# deleting the zipfile from the directory
os.remove(file_name)

# loading data from the file
data = pd.read_csv('all_matches.csv')

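This solution avoids the ContentTooShortError and HTTPForbiddenError errors that every other solution I found online ran into. Thanks to @nobleknight for providing part of the solution by pointing me to this reference.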
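Since the goal in the question was to save storage, a variation that never writes the zip or the CSV to disk might look like the sketch below. It is only a sketch under assumptions: the function names are made up for illustration, and the User-Agent header is a guess at what helps with the forbidden error on some servers, not something confirmed for cricsheet.org.

```python
import io
import zipfile
from urllib.request import Request, urlopen


def fetch_bytes(url, user_agent="Mozilla/5.0"):
    """Download the response body into memory (the User-Agent value is an assumption)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as response:
        return response.read()


def extract_member(payload, member):
    """Return the raw bytes of one file stored in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        return zf.read(member)


def load_csv_from_zip_url(url, member):
    """Download a zip archive and parse one CSV member into a DataFrame."""
    import pandas as pd  # imported here so the helpers above work without pandas

    return pd.read_csv(io.BytesIO(extract_member(fetch_bytes(url), member)))


# Usage (needs network access and pandas):
# data = load_csv_from_zip_url(
#     "https://cricsheet.org/downloads/ipl_csv2.zip", "all_matches.csv"
# )
# data.head()
```

The trade-off is memory: the whole archive and the extracted CSV are both held in RAM, so for very large files the on-disk version above may be the safer choice.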

Any other ideas are welcome.


Answer 1
1 2021-04-27 14:22:12

Answer 2
1 accepted 2021-04-28 10:21:38
