
Error Unzipping a File - Jupyter Notebook - Python 2.x - 3.x - AI Notebook - Google Cloud Platform

I am attempting to download a .zip file from https://www.fec.gov/data/browse-data/?tab=bulk-data, specifically https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip . Compressed, the file is 2.7 GB, yet the download starts and completes within 10 seconds. When I then try to unzip the file, I receive the error messages below. When I download it on my local machine, the link downloads as a .zip file and opens to the requested data.

!python --version

Python 3.7.8

!curl -O https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    690      0 --:--:-- --:--:-- --:--:--   690

!unzip -a indiv20.zip

Archive:  indiv20.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of indiv20.zip or
        indiv20.zip.zip, and cannot find indiv20.zip.ZIP, period.

import zipfile
with zipfile.ZipFile("indiv20.zip", 'r') as zip_ref:
    zip_ref.extractall()


It looks like the HTTP server is returning a redirect, and curl is storing the "302 Found" message in the indiv20.zip file instead of the actual ZIP data. The curl output in the question is consistent with this: only 138 bytes were received, not 2.7 GB.

You can solve this by adding the -L (or --location) option to the curl command so that it follows redirects:

$ curl -LO https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip
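If you would rather stay in Python inside the notebook, the same idea can be sketched with the standard library. This is a minimal sketch, not a tested recipe for the FEC site: `download_zip` is a hypothetical helper name, and it assumes the server accepts the default request headers that `urllib` sends. `urllib.request.urlopen` follows HTTP redirects by default, much like `curl -L`, and `zipfile.is_zipfile` is used to fail early if the saved file is not an archive.

```python
import shutil
import urllib.request
import zipfile


def download_zip(url, dest):
    """Download url to dest, following redirects, and return the final URL.

    Hypothetical helper for illustration; raises ValueError if the saved
    file is not a ZIP archive.
    """
    # urlopen follows 3xx redirects by default, much like `curl -L`.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so a multi-gigabyte archive never sits in memory.
        shutil.copyfileobj(response, out)
        final_url = response.geturl()
    # Fail early if what was saved is not a ZIP (e.g. an HTML error page).
    if not zipfile.is_zipfile(dest):
        raise ValueError(f"{dest} is not a valid ZIP archive")
    return final_url
```

Calling `download_zip("https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip", "indiv20.zip")` would then either leave a valid archive on disk or raise before you reach `extractall()`.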

Check the contents of the file. It is probably an error message in HTML. You can view it with cat indiv20.zip.
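The same check can be done from a notebook cell in Python. The sketch below uses a hypothetical helper, `describe_download`, that reports the file size, whether the file starts with the `PK` bytes that ZIP archives normally begin with, and a short text preview. The `PK` test is only a quick heuristic (a self-extracting archive, for example, can have other data in front), so treat it as a hint rather than proof.

```python
import os

# ZIP archives normally begin with the two bytes "PK".
ZIP_MAGIC = b"PK"


def describe_download(path, preview_bytes=200):
    """Return (size_in_bytes, looks_like_zip, text_preview) for a file.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(preview_bytes)
    looks_like_zip = head.startswith(ZIP_MAGIC)
    # Decode leniently so binary data does not raise; it just looks garbled.
    preview = head.decode("utf-8", errors="replace")
    return size, looks_like_zip, preview
```

For the file in the question, a size of 138 bytes and a preview containing HTML would confirm that a redirect page was saved instead of the archive.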
