How to create and merge dataframes on the fly?

I have 10 CSV files, each of which is very large. I would like to:

1) read those files

2) create dataframes (with the filename as the dataframe name)

3) left outer join all of them on the given joining keys. POIU and BVCX have only one common column to merge on, which is A. Please note that file 'ABCDE' is the base df; all other dataframes should be left outer joined with this 'ABCDE' df. It is possible that there are other common keys as well, but I would like to join on keys A and B, whichever of these two exist.


I was able to do the first two steps as shown below:

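import glob
import pandas as pd1

filenames = sorted(glob.glob('*.csv'))
df_list = []
for f in filenames:
    print(f)
    # store each dataframe under a name built from the filename, e.g. 'dfABCDE.csv'
    t = vars()['df' + f] = pd1.read_csv(f, low_memory=False)
    df_list.append(t)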

But I am stuck on how to left outer join all of these on the fly and create one final dataframe named df_final.

If the joining keys are all the shared keys that exist in the files, you don't have to do anything special to change from one join key to two. You can merge them in the loop with:

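for f in filenames:

    # YOUR CODE WITH WHATEVER YOU DO WITH IT
    print(f)
    t = vars()['df' + f] = pd1.read_csv(f, low_memory=False)
    df_list.append(t)

    # THE CODE FOR MERGING THE DFS
    cur_df = pd1.read_csv(f, low_memory=False)
    try:
        # join on A and B when the current file has B, otherwise on A only
        JKeys = ['A', 'B'] if 'B' in cur_df.columns else ['A']
        df_final = df_final.merge(cur_df, on=JKeys, how='left')
    except NameError:
        # first file: df_final does not exist yet, so this file becomes the base df
        # (this relies on the base file sorting first in filenames)
        df_final = cur_df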
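
For anyone who wants to try the idea without the CSV files, here is a minimal sketch of the same left-join chain on small in-memory frames. It is only one possible way to write it, not the code from the answer above: the helper `left_join_all`, the extra frame named `OTHER`, and all the sample values are made up for illustration, and it assumes the key columns are unique in each frame being joined. It keeps the frames in a dict keyed by name and picks, for each join, whichever of A and B exist on both sides.

```python
from functools import reduce

import pandas as pd


def left_join_all(base, others, candidate_keys=("A", "B")):
    """Left-join every frame in `others` onto `base`, one after another.

    For each join, the keys are the candidate keys that exist in both the
    running result and the frame being joined.
    (Hypothetical helper, not part of pandas.)
    """
    def join_one(left, right):
        keys = [k for k in candidate_keys
                if k in left.columns and k in right.columns]
        if not keys:
            raise ValueError("no common join key among the candidates")
        return left.merge(right, on=keys, how="left")

    return reduce(join_one, others, base)


# Made-up stand-ins for the CSV files; 'OTHER' is an invented extra file
# that shares both A and B with the base frame.
frames = {
    "ABCDE": pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "v": [10, 20, 30]}),
    "POIU": pd.DataFrame({"A": [1, 2], "p": [0.1, 0.2]}),
    "BVCX": pd.DataFrame({"A": [1, 3], "q": ["one", "three"]}),
    "OTHER": pd.DataFrame({"A": [1, 2], "B": ["x", "w"], "r": [100, 200]}),
}

base = frames["ABCDE"]
others = [df for name, df in frames.items() if name != "ABCDE"]
df_final = left_join_all(base, others)
print(df_final)
```

Because every join is a left join onto the running result, all rows of the base frame are kept and unmatched rows get NaN in the joined columns. If two frames share a non-key column name, pandas adds `_x`/`_y` suffixes, and duplicate key values in a joined frame will multiply rows, so both are worth checking on the real files.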
