
Import CSV files from folders inside a zip file

I have a zip file named zip_file.zip that contains thousands of folders. Those folders in turn contain thousands of .csv files, and I want to import all of the CSV files and concatenate them. I tried a solution that I found on Stack Overflow, but it doesn't work. Could you please help?

import zipfile
import pandas as pd
import glob

path = zipfile.ZipFile('/zip_file.zip')
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

One option is to use dask, which relies on fsspec under the hood to handle more complex read situations such as files stored inside an archive:

from dask.dataframe import read_csv

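# read_csv builds a lazy dask dataframe; .compute() turns it into a pandas dataframe
# note: '*.csv' likely matches only files at the top level of the archive;
# for CSVs nested in folders, a pattern such as '**/*.csv' may be needed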
df = read_csv('zip://*.csv::zip_file.zip').compute()

Note that the .compute() call assumes that the data fits into memory. If it does not, you will need to decide how the data should be processed instead, for example by aggregating or filtering it before calling .compute().
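If the combined data is too large for memory, one possible approach is to walk the archive one file at a time and keep only a running result. The sketch below uses only the standard zipfile module and pandas rather than dask; the helper names iter_csv_frames and count_rows are hypothetical, and counting rows is just a stand-in for whatever per-file processing you actually need.

```python
import zipfile

import pandas as pd


def iter_csv_frames(zip_path):
    """Yield (member_name, DataFrame) for each .csv in the archive, one at a time."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # namelist() returns full member paths, so nested folders are included
            if name.lower().endswith(".csv"):
                with zf.open(name) as fh:
                    yield name, pd.read_csv(fh)


def count_rows(zip_path):
    """Example reduction: total number of rows across all CSV files."""
    # Only one DataFrame is alive at a time, so peak memory stays near one file's size
    return sum(len(df) for _, df in iter_csv_frames(zip_path))
```

Because the generator hands back one dataframe at a time, the same loop could write each piece to a database or a Parquet file instead of counting rows.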

The above also assumes that you have dask installed. If not, install it from the terminal/shell via pip (or conda):

pip install dask
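If you would rather not add a dependency, a similar result can probably be had with the standard zipfile module and pandas alone. The code in the question fails because glob.glob searches the filesystem and cannot look inside an archive, and because a ZipFile object cannot be concatenated with a string. The following is a minimal sketch, not the only way to do it: read_zipped_csvs is a hypothetical helper name, and it assumes every CSV has a header row and compatible columns.

```python
import zipfile

import pandas as pd


def read_zipped_csvs(zip_path):
    """Read every .csv in the archive (at any folder depth) into one DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a CSV file
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # zf.open returns a file-like object that pandas can read directly
            with zf.open(info) as fh:
                frames.append(pd.read_csv(fh, index_col=None, header=0))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame
        return pd.DataFrame()
    return pd.concat(frames, axis=0, ignore_index=True)
```

Calling read_zipped_csvs('zip_file.zip') would then return the concatenated frame. Like .compute(), this holds all of the data in memory at once.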
