
How to read all csv files in multiple zip files?

I have a folder containing many zip files, and each zip file contains multiple CSV files. Is there a way to read all of the .csv files into one DataFrame in Python, or a way to pass a list of zip files?

The code I am currently trying is:

import glob
import zipfile
import pandas as pd

for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\data_00-01.zip"):
    # This is just one file. There are multiple zip files in the folder
    zf = zipfile.ZipFile(zip_file)
    dfs = [pd.read_csv(zf.open(f), header=None, sep=";", encoding='latin1') for f in zf.namelist()]
    df = pd.concat(dfs,ignore_index=True)
    print(df)

This code works for one zip file, but I have about 50 zip files in the folder, and I would like to read and concatenate all the CSV files in those zip files into one DataFrame.

Thanks

The following code should satisfy your requirements (set dir_name to the folder that contains your zip files):

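import os
import zipfile
import pandas as pd

# Folder that contains the zip files; edit this to match your setup
dir_name = r"C:\Users\harsh\Desktop\Temp"

dfs = []
for filename in os.listdir(dir_name):
    if filename.endswith('.zip'):
        zip_file = os.path.join(dir_name, filename)
        # The context manager closes each archive once its members are read
        with zipfile.ZipFile(zip_file) as zf:
            dfs += [pd.read_csv(zf.open(f), header=None, sep=";", encoding='latin1') for f in zf.namelist()]
# Concatenate once, after all archives have been read
df = pd.concat(dfs, ignore_index=True)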
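If some archives also contain non-CSV members (a readme, a subfolder entry), pd.read_csv may fail or produce junk rows for them. Below is a minimal sketch of the same idea wrapped in a function that skips anything not ending in .csv. The function name read_csvs_from_zips and its folder parameter are hypothetical, introduced only for illustration, and the sketch assumes every CSV shares the same layout so the frames can be concatenated directly.

```python
import glob
import os
import zipfile

import pandas as pd


def read_csvs_from_zips(folder, **read_csv_kwargs):
    """Read every .csv member of every .zip file in `folder` into one DataFrame.

    Hypothetical helper: extra keyword arguments are passed to pd.read_csv.
    """
    frames = []
    # sorted() keeps the row order reproducible across runs
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as zf:
            for member in zf.namelist():
                # Skip directory entries and non-CSV members
                if not member.lower().endswith(".csv"):
                    continue
                with zf.open(member) as fh:
                    frames.append(pd.read_csv(fh, **read_csv_kwargs))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the settings from the question, the call would look like df = read_csvs_from_zips(dir_name, header=None, sep=";", encoding="latin1"). Collecting the frames in a list and concatenating once at the end avoids repeatedly copying a growing DataFrame inside the loop.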

