
Reading multiple CSV files from a list of file names and concatenating them into a single DataFrame

I have multiple CSV files in a directory. I would like to loop through them, find the files whose names contain a given string, read each one in, and concatenate them into a single DataFrame. If only a single file matches, just read that dataset in.

Here is an example of the CSV files I have in my directory:

  • 2013_nba.csv
  • 2014_nba.csv
  • 2015_nba.csv
  • 2013_basketball.csv
  • 2014_basketball.csv
  • 2015_soccer.csv

This is what I have so far, but it reads all CSV files and concatenates them into a single DataFrame. I need help with how to loop through the files and pick out only those whose names contain a given string.

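import glob
import os
import pandas as pd

# Escape every backslash and let os.path.join add the separator
path = 'C:\\Users\\csvfiles'
csvFiles = glob.glob(os.path.join(path, '*.csv'))

list_ = []

# Read every CSV file in the directory, with no filtering by name
for files in csvFiles:
    df = pd.read_csv(files, index_col=None, header=0)
    list_.append(df)

frame = pd.concat(list_, ignore_index=True)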

I am new to Python. I tried writing for "nba" in files to pull all CSV files that have "nba" in their names and then build one DataFrame from them, but it wasn't successful.

UPDATE:

A slightly improved version of the get_merged_csv() function, which passes extra keyword arguments through to pd.read_csv():

import os
import glob
import pandas as pd

def get_merged_csv(flist, **kwargs):
    return pd.concat([pd.read_csv(f, **kwargs) for f in flist], ignore_index=True)

path = 'C:/Users/csvfiles'
fmask = os.path.join(path, '*nba*.csv')

df = get_merged_csv(glob.glob(fmask), index_col=None, usecols=['rank', 'name'])

print(df.head())
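
One caveat with the version above: pd.concat() raises a ValueError when it is given an empty list, which happens whenever the glob mask matches no files. The following is a minimal sketch of a guard against that, not part of the original answer. The name get_merged_csv_safe is hypothetical, and returning an empty DataFrame is just one possible choice. A single matching file needs no special handling, because concatenating a one-element list simply returns that file's data.

```python
import glob
import os
import pandas as pd

def get_merged_csv_safe(flist, **kwargs):
    """Hypothetical variant of get_merged_csv() that tolerates an empty file list."""
    frames = [pd.read_csv(f, **kwargs) for f in flist]
    if not frames:
        # pd.concat() raises ValueError on an empty list,
        # so return an empty DataFrame instead (an assumption, not the only option)
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

It would be called the same way as before, for example get_merged_csv_safe(glob.glob(fmask), index_col=None).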

OLD version:

import os
import glob
import pandas as pd

path = 'C:/Users/csvfiles'
#fmask = '*.csv'

def get_merged_csv(path, fmask):
    # Read every file matching fmask under path and stack the results
    return pd.concat([pd.read_csv(f, index_col=None, header=0)
                      for f in glob.glob(os.path.join(path, fmask))])

df_list = [get_merged_csv(path, fmask)
           for fmask in ['*nba.csv', '*basketball.csv', '*soccer.csv']]

df_list will contain three DataFrames: df_list[0] for NBA, df_list[1] for basketball, and df_list[2] for soccer.

Alternatively, you can put them into a dictionary:

df_dict = {}
df_dict['nba'] = get_merged_csv(path, '*nba.csv')
df_dict['basketball'] = get_merged_csv(path, '*basketball.csv')
df_dict['soccer'] = get_merged_csv(path, '*soccer.csv')
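
The three assignments above could also be written as a loop over keywords. This is only a sketch under some assumptions: the helper name merge_by_keyword is hypothetical, the mask '*keyword*.csv' is one possible matching rule, and keywords with no matching files are skipped so that pd.concat() never receives an empty list.

```python
import glob
import os
import pandas as pd

def merge_by_keyword(path, keywords, **kwargs):
    """Hypothetical helper: map each keyword to one merged DataFrame."""
    result = {}
    for key in keywords:
        # Sort so the concatenation order is predictable (glob order is not guaranteed)
        files = sorted(glob.glob(os.path.join(path, f'*{key}*.csv')))
        if files:  # skip keywords that match no files
            result[key] = pd.concat(
                [pd.read_csv(f, **kwargs) for f in files], ignore_index=True
            )
    return result
```

For the directory in the question, merge_by_keyword(path, ['nba', 'basketball', 'soccer']) would give a dictionary with up to three entries, one per keyword that matched at least one file.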

Some explanations:

The get_merged_csv(path, fmask) function reads each matching CSV file inside a list comprehension. The resulting list of DataFrames is passed to pd.concat(), which returns a single concatenated DataFrame.
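
To make those two steps more concrete, here is a sketch that unrolls the list comprehension into an explicit loop and filters file names with a plain substring test, which is close to what the question attempted with "nba" in files. Both function names are hypothetical, and concat_csv_files assumes it receives at least one path.

```python
import os
import pandas as pd

def filter_by_substring(file_paths, keyword):
    """Hypothetical helper: keep paths whose file name (not directory) contains keyword."""
    return [f for f in file_paths if keyword in os.path.basename(f)]

def concat_csv_files(file_paths, **kwargs):
    """Read each CSV into a DataFrame, then concatenate them all at once."""
    frames = []
    for f in file_paths:
        frames.append(pd.read_csv(f, **kwargs))
    # Assumes file_paths is non-empty; pd.concat() raises ValueError otherwise
    return pd.concat(frames, ignore_index=True)
```

Collecting the DataFrames in a list and calling pd.concat() once avoids copying the accumulated data on every iteration, which is what happens when frames are concatenated inside the loop.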


 