I have a zip file named zip_file.zip that contains thousands of folders. These folders in turn contain thousands of .csv files, and I want to import all of the CSV files and concatenate them. I tried a solution that I found on Stack Overflow, but it doesn't work. Could you please help?
import zipfile
import pandas as pd
import glob
path = zipfile.ZipFile('/zip_file.zip')
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
One option is to use dask, which uses fsspec under the hood for complex read situations:
from dask.dataframe import read_csv
# this line will create a pandas dataframe
df = read_csv('zip://*.csv::zip_file.zip').compute()
Note that the .compute() call assumes that the data fits into memory. If this is not the case, you will need to think further about how you want the data to be processed.
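If the combined data is too large for memory, one possible approach is to process a single CSV member at a time and keep only a reduced result. The following is a minimal sketch using the standard-library zipfile module and pandas rather than dask. The function names iter_csv_frames and total_rows are hypothetical, and counting rows is only a stand-in for whatever per-file processing you actually need.

```python
import zipfile

import pandas as pd


def iter_csv_frames(zip_path):
    """Yield (member_name, DataFrame) for one .csv member at a time."""
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() returns full member paths, including nested folders.
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                with archive.open(name) as handle:
                    yield name, pd.read_csv(handle)


def total_rows(zip_path):
    """Example reduction: count rows while holding only one file in memory."""
    return sum(len(frame) for _, frame in iter_csv_frames(zip_path))
```

Because iter_csv_frames is a generator, each DataFrame can be garbage-collected before the next file is read, so peak memory is roughly the size of the largest single file.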
Also, the above assumes that you have dask installed. If not, install it in the terminal/shell via pip (or conda):
pip install dask
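If installing dask is not an option, a similar result can be sketched with only the standard-library zipfile module and pandas. This is not the approach from the answer above, just an alternative under the assumption that all CSV files share the same columns and fit into memory together. The function name read_csvs_from_zip is hypothetical.

```python
import zipfile

import pandas as pd


def read_csvs_from_zip(zip_path):
    """Concatenate every .csv member of a zip archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() returns full member paths, so files inside nested
        # folders are included; directory entries end with "/" and are skipped.
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                with archive.open(name) as handle:
                    frames.append(pd.read_csv(handle))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Usage (path taken from the question; adjust to where the archive lives):
# frame = read_csvs_from_zip("zip_file.zip")
```

Reading through archive.open avoids extracting the archive to disk, and it does not depend on a glob pattern matching the folder depth.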