How to read and manipulate multiple CSV files using pandas and a for loop?

I want to read a list of CSV files, for example exon_kipan.00001.csv, exon_kipan.00002.csv, exon_kipan.00003.csv, and exon_kipan.00004.csv (24 files in total), and then perform a series of operations using pandas before concatenating the dataframes.

For a single file, I would do:

df = pd.read_csv("exon_kipan.csv", sep="\t", index_col=0, low_memory=False)
df = df[df.columns[::3]]
df = df.T
del df[df.columns[0]]
df.index = df.index.str.upper()
df = df.sort_index()
df.index = ['-'.join(s.split('-')[:4]) for s in df.index.tolist()]
df.rename_axis(None, axis=1, inplace=True)

However, now I want to read, manipulate, and concatenate multiple files.

filename = '/work/exon_kipan.{}.csv'
df_dict = {}
exon_clin_list = []
for i in range(1, 25):
    df_dict[i] = pd.read_csv(filename, sep="\t", index_col=0, low_memory=False)
    df_dict[i] = df_dict[i][df_dict[i].columns[::3]]
    df_dict[i] = df_dict[i].T
    del df_dict[i][df_dict[i].columns[0]]
    df_dict[i].index = df_dict[i].index.str.upper()
    df_dict[i] = df_dict[i].sort_index()
    df_dict[i].index = ['-'.join( s.split('-')[:4]) for s in df_dict[i].index.tolist() ]
    df_dict[i].rename_axis(None, axis=1, inplace=True)

    exon_clin_list.append(df_dict[i])

exon_clin = pd.concat(exon_clin_list)

My code raised:

FileNotFoundError: [Errno 2] No such file or directory: '/work/exon_kipan.{}.csv'

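You have to use the format method of str to fill in the placeholder. The {:05} specifier zero-pads the file number to five digits, which matches the file names on disk: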

filename = '/work/exon_kipan.{:05}.csv'  # <- don't forget to modify here
...
for i in range(1, 25):
    df_dict[i] = pd.read_csv(filename.format(i), ...)
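
As a small illustration of why the template needs `{:05}` rather than a bare `{}`: the file numbers are zero-padded to five digits, and the format spec reproduces that padding. The sketch below builds one file name in three equivalent ways; the variable names are only for illustration.

```python
template = "/work/exon_kipan.{:05}.csv"
i = 7

# A bare "{}" does not pad, so the name would not match the files on disk.
unpadded = "/work/exon_kipan.{}.csv".format(i)

# Three equivalent ways to zero-pad the number to five digits.
via_format = template.format(i)
via_fstring = f"/work/exon_kipan.{i:05d}.csv"
via_zfill = "/work/exon_kipan." + str(i).zfill(5) + ".csv"

print(unpadded)
print(via_format)
```

Keeping the template in one variable and calling `format` inside the loop, as above, is probably the smallest change to the original code.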

Test:

filename = '/work/exon_kipan.{:05}.csv'
for i in range(1, 25):
    print(filename.format(i))

# Output
/work/exon_kipan.00001.csv
/work/exon_kipan.00002.csv
/work/exon_kipan.00003.csv
/work/exon_kipan.00004.csv
/work/exon_kipan.00005.csv
/work/exon_kipan.00006.csv
/work/exon_kipan.00007.csv
/work/exon_kipan.00008.csv
/work/exon_kipan.00009.csv
/work/exon_kipan.00010.csv
/work/exon_kipan.00011.csv
/work/exon_kipan.00012.csv
/work/exon_kipan.00013.csv
/work/exon_kipan.00014.csv
/work/exon_kipan.00015.csv
/work/exon_kipan.00016.csv
/work/exon_kipan.00017.csv
/work/exon_kipan.00018.csv
/work/exon_kipan.00019.csv
/work/exon_kipan.00020.csv
/work/exon_kipan.00021.csv
/work/exon_kipan.00022.csv
/work/exon_kipan.00023.csv
/work/exon_kipan.00024.csv
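
With the file names resolved, the whole pipeline could be sketched as below. This is one possible arrangement, not the only one: the helper names `load_exon_file` and `load_all` are made up for illustration, and the per-file steps are copied from the question on the assumption that every file has the same layout as the single-file example.

```python
import pandas as pd


def load_exon_file(path):
    """Read one tab-separated exon file and apply the steps from the question."""
    df = pd.read_csv(path, sep="\t", index_col=0, low_memory=False)
    df = df[df.columns[::3]]              # keep every third column
    df = df.T                             # rows become the former columns
    df = df.drop(columns=df.columns[0])   # drop the first column after transposing
    df.index = df.index.str.upper()
    df = df.sort_index()
    # Keep only the first four dash-separated fields of each row label.
    df.index = ["-".join(s.split("-")[:4]) for s in df.index]
    return df.rename_axis(None, axis=1)


def load_all(template, first=1, last=24):
    """Format the template for each file number, load each file, and concatenate."""
    frames = [load_exon_file(template.format(i)) for i in range(first, last + 1)]
    return pd.concat(frames)


# Hypothetical usage with the path from the question:
# exon_clin = load_all("/work/exon_kipan.{:05}.csv")
```

Moving the per-file steps into a function avoids repeating `df_dict[i]` on every line, and the dictionary is no longer needed because only the list of frames is passed to `pd.concat`.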

Maybe something like this will work:

import glob
import os

import pandas as pd


# Read one file, do some processing, and return a dataframe
def read_file_and_do_some_actions(filename):
    df = pd.read_csv(filename, index_col=None, header=0)
    #############################
    # do some processing
    #############################
    return df


# Directory that holds the CSV files
path = r'/work'
# Keep the pattern relative: os.path.join discards `path` when the next component starts with "/"
all_files = sorted(glob.glob(os.path.join(path, "exon_kipan.*.csv")))


# Call read_file_and_do_some_actions on each file in all_files, then concatenate the results into one dataframe
df = pd.concat((read_file_and_do_some_actions(f) for f in all_files), ignore_index=True)
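
One detail worth checking before adopting `ignore_index=True`: it replaces the row labels with 0, 1, 2, and so on. If the processing step builds a meaningful index, as the question does with the sample identifiers, those labels are lost. A minimal sketch, using two made-up frames that stand in for processed files:

```python
import pandas as pd

# Hypothetical stand-ins for two processed files, indexed by sample label.
a = pd.DataFrame({"exon_1": [5]}, index=["TCGA-AA-0001-01A"])
b = pd.DataFrame({"exon_1": [7]}, index=["TCGA-BB-0002-01A"])

kept = pd.concat([a, b])                           # row labels preserved
renumbered = pd.concat([a, b], ignore_index=True)  # row labels replaced by 0, 1

print(kept.index.tolist())
print(renumbered.index.tolist())
```

For the data in the question, leaving out `ignore_index=True` is likely the safer choice.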
