Import csv files from folders inside zip folder
I have a zip archive, zip_file.zip, that contains thousands of folders. Those folders in turn contain thousands of .csv files, and I want to import all of the csv files and concatenate them. I tried a solution that I found on Stack Overflow, but it doesn't work. Could you please help?
import zipfile
import pandas as pd
import glob

path = zipfile.ZipFile('/zip_file.zip')
# This is the line that fails: glob only searches the filesystem, and
# `path` is a ZipFile object, so `path + "/*.csv"` raises a TypeError.
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
One option is to use dask, which will use fsspec under the hood for complex read situations:
from dask.dataframe import read_csv
# this line will create a pandas dataframe
df = read_csv('zip://*.csv::zip_file.zip').compute()
Note that the .compute call assumes that the data fits into memory. If this is not the case, you will need to think further about how you want the data to be processed.
Also, the above assumes that you have dask installed. If not, install it from the terminal/shell via pip (or conda):
pip install dask
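If installing dask is not an option, the same result can probably be reached with only the standard-library zipfile module and pandas. The following is a minimal sketch rather than a tested drop-in solution: the helper names iter_csv_frames and read_zipped_csvs are made up for illustration, and it assumes every .csv file in the archive has a header row and compatible columns. It walks the archive's member list, so csv files nested at any folder depth are picked up.

```python
import zipfile

import pandas as pd


def iter_csv_frames(zip_path):
    """Yield one DataFrame per .csv member of the archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything that is not a CSV file.
            if name.endswith("/") or not name.lower().endswith(".csv"):
                continue
            # Read the member straight from the archive, without extracting it to disk.
            with archive.open(name) as handle:
                yield pd.read_csv(handle, index_col=None, header=0)


def read_zipped_csvs(zip_path):
    """Concatenate every CSV in the archive into one DataFrame (hypothetical helper)."""
    frames = list(iter_csv_frames(zip_path))
    return pd.concat(frames, axis=0, ignore_index=True)
```

Calling read_zipped_csvs('zip_file.zip') would return the combined frame. Like .compute, this holds everything in memory at once; if that is a problem, iter_csv_frames can be looped over directly so that each file is processed and discarded in turn.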