Reading a .csv file inside a .zip file from a URL with Pandas

Question

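There is a .csv file contained within a .zip file at a URL that I am trying to read into a Pandas DataFrame. I don't want to download the .zip file to disk; I want to read the data directly from the URL. I realize that pandas.read_csv() can only do this when the .csv file is the only file contained in the .zip. However, when I run this: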

import pandas as pd

# specify zipped comma-separated values url
zip_csv_url = 'http://www12.statcan.gc.ca/census-recensement/2016/geo/ref/gaf/files-fichiers/2016_92-151_XBB_csv.zip'
df1 = pd.read_csv(zip_csv_url)

I get:

ValueError: Multiple files found in compressed zip file ['2016_92-151_XBB.csv', '92-151-g2016001-eng.pdf', '92-151-g2016001-fra.pdf']

The contents of the .zip appear to be arranged as a list. I'm wondering how I can load the only available .csv file in the .zip into the new DataFrame (df1), since the .zip file from the URL I will be using will only ever have one .csv file in it. Thanks!

Note

The corresponding .zip file from a separate URL, which contains shapefiles, is read by geopandas.read_file() without any problem when I run this code:

import geopandas as gpd

# specify zipped shapefile url
zip_shp_url = 'http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/ldb_000b16a_e.zip'
gdf1 = gpd.read_file(zip_shp_url)

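even though the .zip also contains .pdf files.

It looks like geopandas.read_file() reads only the shapefiles needed to create the GeoDataFrame and ignores the other files. Since it is based on Pandas, shouldn't Pandas also be able to read only the .csv in a .zip that contains multiple file types? Any ideas?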

Answer 1

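import zipfile
from io import BytesIO
from urllib.request import urlopen

import pandas as pd

# Placeholders: replace these three with your own values
zip_url = YOUR_ZIP_LINK
directory_to_extract_to = YOUR_DESTINATION_FOLDER
csv_name = YOUR_csv_FILE_NAME

# Download the archive into memory (the .zip itself is not written to disk)
with urlopen(zip_url) as resp:
    zip_bytes = BytesIO(resp.read())

with zipfile.ZipFile(zip_bytes) as zip_ref:
    # print(zip_ref.namelist())  # uncomment to list the archive members
    # Extract only the .csv member; extract() returns the path it wrote
    csv_path = zip_ref.extract(csv_name, directory_to_extract_to)

df = pd.read_csv(csv_path)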
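
The answer above still writes the extracted .csv to a folder. Since the question asks to avoid the disk entirely, here is a minimal sketch that keeps everything in memory and picks the single .csv member by its extension. The function names read_single_csv_from_zip and read_csv_from_zip_url are made up for this example and are not part of pandas; the sketch assumes the archive contains exactly one .csv member and raises otherwise.

```python
import io
import zipfile
from urllib.request import urlopen

import pandas as pd


def read_single_csv_from_zip(zip_bytes, **read_csv_kwargs):
    """Load the only .csv member of an in-memory zip archive into a DataFrame.

    Hypothetical helper for illustration; extra keyword arguments are
    passed straight through to pandas.read_csv().
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Keep only the members whose name ends in .csv (case-insensitive)
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"Expected exactly one .csv member, found {csv_names}")
        # ZipFile.open() returns a binary file-like object that read_csv accepts
        with archive.open(csv_names[0]) as member:
            return pd.read_csv(member, **read_csv_kwargs)


def read_csv_from_zip_url(url, **read_csv_kwargs):
    """Download a zip archive into memory and read its single .csv member."""
    with urlopen(url) as resp:
        return read_single_csv_from_zip(resp.read(), **read_csv_kwargs)
```

With the variable from the question this would be called as df1 = read_csv_from_zip_url(zip_csv_url). If the file is not UTF-8 encoded, an encoding can be passed through, for example read_csv_from_zip_url(zip_csv_url, encoding='latin-1'); whether that is needed for this particular file is not something the question states.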
