General Script to Concatenate Series of DataFrames

I have a dictionary of over 1,000 DataFrames (df_dict), some of which are split into a series of numbered parts. The keys look like this:

Schedule_RCL_03312007

Schedule_RCL_03312008

Schedule_RCL_03312009

Schedule_RCL_03312010(1_of_2)

Schedule_RCL_03312010(2_of_2)

Schedule_RCL_03312011(1_of_2)

Schedule_RCL_03312011(2_of_2)

Schedule_RCL_06302011(1_of_2)

Schedule_RCL_06302011(2_of_2)

How can I horizontally concatenate only the serial DataFrames (e.g. (1_of_2) and (2_of_2)) in a general way? I know how to do this for a specific case, but I need something general because the data may change over time.

You could use itertools.groupby:

from itertools import groupby
from operator import itemgetter

import pandas as pd

# df_dict is the dictionary of DataFrames from the question.
df_names = ['Schedule_RCL_03312007', 'Schedule_RCL_03312008', 'Schedule_RCL_03312009', 'Schedule_RCL_03312010(1_of_2)', 'Schedule_RCL_03312010(2_of_2)', 'Schedule_RCL_03312011(1_of_2)', 'Schedule_RCL_03312011(2_of_2)', 'Schedule_RCL_06302011(1_of_2)', 'Schedule_RCL_06302011(2_of_2)']

# Split each name at '(' so all parts of one schedule share the same first element,
# then sort and group on that element. Use df_dict.keys() instead of df_names in practice.
split_names = sorted(name.split('(') for name in df_names)
for key, grp in groupby(split_names, key=itemgetter(0)):
    frames = ['('.join(parts) for parts in grp]  # rebuild the original names
    if len(frames) > 1:  # only multi-part schedules need merging
        print(key, frames)
        df_dict[key] = pd.concat([df_dict[f] for f in frames], axis=1)
        for f in frames:  # optional: remove the serial frames
            df_dict.pop(f, None)

which prints:

Schedule_RCL_03312010 ['Schedule_RCL_03312010(1_of_2)', 'Schedule_RCL_03312010(2_of_2)']
Schedule_RCL_03312011 ['Schedule_RCL_03312011(1_of_2)', 'Schedule_RCL_03312011(2_of_2)']
Schedule_RCL_06302011 ['Schedule_RCL_06302011(1_of_2)', 'Schedule_RCL_06302011(2_of_2)']

Since you are using a dict, you could iterate over df_dict.keys() instead of the hard-coded df_names list.
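One caveat with sorting the names as strings: a part such as (10_of_12) sorts before (2_of_12). If a schedule can have ten or more parts, a variation that parses the part number and sorts on it numerically may be safer. The following is only a sketch of that idea, not part of the approach above. It swaps the split on '(' for a regular expression, and it assumes every serial name ends in "(i_of_n)". The names PART_RE and group_serial_names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "<base>(<i>_of_<n>)". Names without it are ignored.
PART_RE = re.compile(r"^(?P<base>.+)\((?P<idx>\d+)_of_(?P<total>\d+)\)$")


def group_serial_names(names):
    """Map each base name to its part names, ordered by numeric part index."""
    parts = defaultdict(list)
    for name in names:
        match = PART_RE.match(name)
        if match:
            parts[match.group("base")].append((int(match.group("idx")), name))
    # Sorting the (index, name) tuples orders the parts by their integer index.
    return {base: [name for _, name in sorted(items)] for base, items in parts.items()}


names = [
    "Schedule_RCL_03312009",
    "Schedule_RCL_03312010(2_of_2)",
    "Schedule_RCL_03312010(1_of_2)",
]
for base, ordered in group_serial_names(names).items():
    print(base, ordered)
```

Each group can then be merged the same way as in the loop above, with pd.concat([df_dict[n] for n in ordered], axis=1) stored under the base name.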
