
Concatenate multiple CSV files from different folders into one CSV file in Python

I am trying to concatenate about 30 CSV files into one file. The CSV files are located in different folders.

However, I get an error while appending the files together: OSError: Initializing from file failed

Here is my code:

import pandas
import glob
 
path = 'xxx'
target_folders=['Apples', 'Oranges', 'Bananas','Raspberry','Strawberry', 'Blackberry','Gooseberry','Liche']
output ='yyy'
path_list = []
for idx in target_folders:
    lst_of_files = glob.glob(path + idx +'\\*.csv')
    latest_files = max(lst_of_files, key=os.path.getmtime)
    path_list.append(latest_files)
    df_list = [] 
    for file in path_list: 
        df = pd.read_csv(file) 
        df_list.append(df) 
    final_df = df.append(df for df in df_list) 
    combined_csv = pd.concat([pd.read_csv(f) for f in latest_files])

    combined_csv.to_csv(output + "combined_csv.csv", index=False)

    OSError                                   Traceback (most recent call last)
    <ipython-input-126-677d09511b64> in <module>
  1 df_list = []
  2 for file in latest_files:
  ----> 3     df = pd.read_csv(file)
  4     df_list.append(df)
  5 final_df = df.append(df for df in df_list)

    OSError: Initializing from file failed


    

Without seeing your CSV files it's hard to be sure, but I've come across this problem before with unusually formatted CSVs. The CSV parser may be having difficulty determining the structure of the files, such as the separators.

Try df = pd.read_csv(file, engine = 'python')

From the docs: "The C engine is faster while the python engine is currently more feature-complete."

Pass engine='python' when reading a single CSV file first and see whether the read succeeds. That way you can narrow the problem down to either reading the files or traversing them.
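To make that narrowing-down step concrete, here is a minimal sketch. The helper name read_csv_with_fallback and its return shape are made up for illustration and are not part of pandas. It tries the default C engine first, prints which file failed, and retries with engine='python'.

```python
import pandas as pd


def read_csv_with_fallback(path):
    """Read one CSV with the default C engine, retrying with the Python engine.

    Returns a (DataFrame, engine_name) tuple. The helper and its return
    shape are illustrative only; they are not a pandas API.
    """
    try:
        return pd.read_csv(path), "c"
    except (OSError, pd.errors.ParserError) as exc:
        # Print the offending path so the failing file can be identified.
        print(f"C engine failed on {path!r}: {exc}")
        # If this second read also fails, the exception propagates, which
        # suggests the path itself (not the parser) is the problem.
        return pd.read_csv(path, engine="python"), "python"
```

Calling it on a single known-good path first, and then on each entry of path_list, should show whether one particular file or the traversal logic is at fault.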

Try to simplify your code:

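import pandas as pd
import pathlib

data_dir = 'xxx'  # root folder that contains the CSV subfolders
out_dir = 'yyy'   # folder where the combined file is written

# Read every CSV found under data_dir, including subfolders.
data = []
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    df = pd.read_csv(filename)
    data.append(df)

# Concatenate the list of DataFrames (not just the last df) and save it.
df = pd.concat(data, ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)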

This solution should work like a charm for you:

import pandas as pd
import pathlib

data_dir = '/Users/thomasbryan/projetos/blocklist/files/'
out_dir = '.'

list_files = []
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    list_files.append(filename)

df = pd.concat(map(pd.read_csv, list_files), ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)
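Note that the recursive glob reads every CSV under data_dir, whereas the code in the question picks only the most recently modified file in each target folder. In that code, max() returns a single path string, so "for f in latest_files" loops over the characters of that string rather than over files, which may be related to the error. The sketch below keeps the "latest file per folder" behaviour; the function names latest_csv_per_folder and combine_csvs are hypothetical, and skipping folders that contain no CSV is an assumption about the desired behaviour.

```python
import os
import pathlib

import pandas as pd


def latest_csv_per_folder(base_dir, folder_names):
    """Return the most recently modified CSV in each named subfolder."""
    latest = []
    for name in folder_names:
        candidates = list((pathlib.Path(base_dir) / name).glob("*.csv"))
        if not candidates:
            # Assumption: skip folders without CSVs instead of failing in max().
            continue
        latest.append(max(candidates, key=os.path.getmtime))
    return latest


def combine_csvs(paths, out_file):
    """Concatenate the given CSV files row-wise and write one CSV file."""
    frames = [pd.read_csv(p) for p in paths]
    # pd.concat raises ValueError if frames is empty.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_file, index=False)
    return combined
```

With the question's variables this would be called roughly as combine_csvs(latest_csv_per_folder(path, target_folders), pathlib.Path(output) / 'combined_csv.csv').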
