Error unzipping a file - Jupyter Notebook - Python 2.x/3.x - AI Notebook - Google Cloud Platform
I am attempting to download a .zip file from https://www.fec.gov/data/browse-data/?tab=bulk-data, specifically https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip. Compressed, the file is 2.7 GB. The download starts and completes within 10 seconds. When I then try to unzip the file, I receive the error messages below. When I download the same link to my local machine, it arrives as a .zip file and opens to the requested data.
!python --version
Python 3.7.8
!curl -O https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    690      0 --:--:-- --:--:-- --:--:--   690
!unzip -a indiv20.zip
Archive:  indiv20.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of indiv20.zip or
        indiv20.zip.zip, and cannot find indiv20.zip.ZIP, period.
import zipfile
with zipfile.ZipFile("indiv20.zip", 'r') as zip_ref:
    zip_ref.extractall()
It looks like the HTTP server is returning a redirect, and curl is storing the "302 Found" message in indiv20.zip instead of the actual ZIP data. The curl output above is consistent with this: it reports only 138 bytes transferred for a file that should be 2.7 GB.
You can solve this by adding the -L (or --location) parameter to the curl command so that it follows redirects:
$ curl -LO https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip
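If you would rather stay in Python inside the notebook, the same idea can be sketched with the standard library. This is a minimal sketch, not a tested recipe for this server: urllib.request.urlopen follows HTTP redirects by default, and the helper names download and download_and_extract are made up for illustration. It also assumes the server treats a Python client the same way it treats curl.

```python
import shutil
import urllib.request
import zipfile


def download(url, dest):
    """Stream url to dest; urlopen follows HTTP redirects by default."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so a multi-gigabyte file is never held in memory.
        shutil.copyfileobj(response, out)
    return dest


def download_and_extract(url, dest, extract_to="."):
    """Download, verify the result is a ZIP archive, then extract it."""
    download(url, dest)
    # Fail early with a clear message if we saved an HTML page or a
    # redirect notice instead of the archive.
    if not zipfile.is_zipfile(dest):
        raise ValueError(f"{dest} is not a ZIP archive; inspect its contents")
    with zipfile.ZipFile(dest) as zip_ref:
        zip_ref.extractall(extract_to)


# Example call (commented out because the archive is about 2.7 GB):
# download_and_extract(
#     "https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip",
#     "indiv20.zip",
# )
```

Checking zipfile.is_zipfile before extracting turns the confusing "End-of-central-directory signature not found" failure into an explicit error about the download itself.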
Check the content of the file (cat indiv20.zip). It is probably an error message in HTML.
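The same check can be done from a notebook cell without shelling out. The sketch below reports the file size, whether Python's zipfile module recognises the file as an archive, and a short text preview of its first bytes. The function name describe_download and the 200-byte preview length are arbitrary choices for illustration.

```python
import os
import zipfile


def describe_download(path, preview_bytes=200):
    """Return (size_in_bytes, is_zip, text_preview) for a downloaded file."""
    size = os.path.getsize(path)
    is_zip = zipfile.is_zipfile(path)
    with open(path, "rb") as f:
        head = f.read(preview_bytes)
    # Decode leniently: a real archive is binary, an error page is text.
    preview = head.decode("utf-8", errors="replace")
    return size, is_zip, preview


# Only inspect the file if it is present in the working directory.
if os.path.exists("indiv20.zip"):
    size, is_zip, preview = describe_download("indiv20.zip")
    print(f"size: {size} bytes, looks like a ZIP: {is_zip}")
    print(preview)
```

A size of a few hundred bytes together with readable HTML in the preview would point to a saved redirect or error page rather than the archive.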