
Error unzipping a file - Jupyter Notebook - Python 2.x-3.x - AI Notebook - Google Cloud Platform

I am trying to download a .zip file from https://www.fec.gov/data/browse-data/?tab=bulk-data, specifically https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip. Compressed, the file is 2.7 GB. The download starts and completes within 10 seconds. When I then try to unzip the file, I receive the error messages below. When I download the same link on my local machine, it arrives as a .zip file and opens to the requested data.

!python --version

Python 3.7.8

!curl -O https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    690      0 --:--:-- --:--:-- --:--:--   690

!unzip -a indiv20.zip

Archive:  indiv20.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of indiv20.zip or
        indiv20.zip.zip, and cannot find indiv20.zip.ZIP, period.

import zipfile
with zipfile.ZipFile("indiv20.zip", 'r') as zip_ref:
    zip_ref.extractall()


It looks like the HTTP server is returning a redirect, and curl is storing the "302 Found" message in indiv20.zip instead of the actual ZIP data. The curl transfer summary reports only 138 bytes received, which is consistent with a short redirect message rather than a 2.7 GB archive.

You can solve this by adding the -L (or --location) option to the curl command so that it follows redirects:

$ curl -LO https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip
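If you would rather stay in Python inside the notebook, the same idea can be sketched with the standard library. This is only a minimal sketch: `download_and_extract` is a hypothetical helper name, not something from the original posts. It relies on `urllib.request.urlopen` following HTTP redirects by default, and it checks the result with `zipfile.is_zipfile` before extracting, so a saved redirect or error page is reported clearly instead of failing inside `extractall`.

```python
import shutil
import urllib.request
import zipfile


def download_and_extract(url, dest, extract_to="."):
    """Download url to dest, then extract it only if it is a real ZIP archive.

    Hypothetical helper for illustration; returns the archive's member names.
    """
    # urlopen follows HTTP redirects by default, similar to curl -L.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Stream in chunks so a multi-gigabyte file is not held in memory.
        shutil.copyfileobj(response, out)

    # Fail early with a clear message if the server sent something else.
    if not zipfile.is_zipfile(dest):
        raise ValueError(f"{dest} is not a ZIP archive; inspect its contents")

    with zipfile.ZipFile(dest) as archive:
        archive.extractall(extract_to)
        return archive.namelist()
```

In the notebook this would be called as `download_and_extract("https://www.fec.gov/files/bulk-downloads/2020/indiv20.zip", "indiv20.zip")`. Whether the server accepts Python's default request headers is an assumption that has not been verified here.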

Check the content of the file (cat indiv20.zip). It is probably an error message in HTML.
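The same check can be done from a notebook cell in Python. The sketch below uses a hypothetical helper, `inspect_download`, that reports the file size, whether the file begins with the `PK` bytes that ZIP archives normally start with, what `zipfile.is_zipfile` says, and a short text preview. For the download in the question, a size of about 138 bytes and an HTML-looking preview would point to a saved server message rather than the archive.

```python
import os
import zipfile


def inspect_download(path, preview_bytes=200):
    """Summarise a downloaded file: size, ZIP checks, and a text preview.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        head = f.read(preview_bytes)
    return {
        "size_bytes": os.path.getsize(path),
        # ZIP archives normally begin with the two bytes "PK".
        "starts_with_zip_magic": head.startswith(b"PK"),
        "is_zipfile": zipfile.is_zipfile(path),
        # Undecodable bytes are replaced so binary data still previews safely.
        "preview": head.decode("utf-8", errors="replace"),
    }
```

Calling `inspect_download("indiv20.zip")` returns a dictionary that can be printed directly in the notebook.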

