
Merge a specific column from multiple dataframes with different lengths

df1

    Color   date
0   A       2011
1   B       201411
2   C       20151231
3   A       2019

df2

    Color   date
0   A       2013
1   B       20151111
2   C       201101

df3

    Color   date
0   A       2011
1   B       201411
2   C       20151231
3   A       2019
4   Y       20070212

Assume there are three dataframes. I want to create a new dataframe by extracting only the 'date' column from each of them.

The output I want:

New df

    df1-date  df2-date  df3-date     
0   2011      2013      2011
1   201411    20151111  201411
2   20151231  201101    20151231
3   2019      NaN       2019
4   NaN       NaN       20070212

I want the empty cells to be NaN, because the dataframes have different lengths.

I tried merge and concat, but I get an error.

Thank you for reading.

This involves two problems: (1) merging multiple dataframes, and (2) merging on duplicated keys.

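import pandas as pd
from functools import reduce

# Use groupby and cumcount to create an additional key that numbers
# repeated 'Color' values (0, 1, ...), so duplicated keys pair up one-to-one.
def multikey(x):
    return x.assign(key=x.groupby('Color').cumcount())

# Then use reduce to outer-merge all the frames on ['Color', 'key'].
df = reduce(lambda left, right:
            pd.merge(left, right, on=['Color', 'key'], how='outer'),
            map(multikey, [df1, df2, df3]))
df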
  Color      date_x  key      date_y      date
0     A      2011.0    0      2013.0      2011
1     B    201411.0    0  20151111.0    201411
2     C  20151231.0    0    201101.0  20151231
3     A      2019.0    1         NaN      2019
4     Y         NaN    0         NaN  20070212

Note that the column names here can always be changed with rename.
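As a minimal sketch of that renaming step, one option is to rename each frame's date column before merging, so the merge never has to invent the _x/_y suffixes. The helper name with_key, the frames dict, and the df1-date style labels are assumptions for illustration (the labels are taken from the desired output in the question); the sample frames are rebuilt from the tables above.

```python
from functools import reduce

import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": ["A", "B", "C"],
                    "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

frames = {"df1": df1, "df2": df2, "df3": df3}


def with_key(frame, name):
    # Hypothetical helper: number repeated Color values (0, 1, ...) and give
    # the date column a frame-specific name before merging.
    return (frame.assign(key=frame.groupby("Color").cumcount())
                 .rename(columns={"date": f"{name}-date"}))


prepared = [with_key(frame, name) for name, frame in frames.items()]
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["Color", "key"], how="outer"),
    prepared,
)

# Keep only the date columns, as requested in the question.
dates_only = merged.drop(columns=["Color", "key"])
print(dates_only)
```

The row order of an outer merge can differ between pandas versions, so it is safer to look rows up by Color and key than by position.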

Method 2: use concat. This does not consider the key; it simply aligns the frames on their index.

s=pd.concat([df1,df2,df3],keys=['df1','df2','df3'], axis=1)
s.columns=s.columns.map('_'.join)
s=s.filter(like='_date')
s
     df1_date    df2_date  df3_date
0      2011.0      2013.0      2011
1    201411.0  20151111.0    201411
2  20151231.0    201101.0  20151231
3      2019.0         NaN      2019
4         NaN         NaN  20070212
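A slightly shorter variant of the same idea, offered as a sketch rather than as the answer's own code: concatenate only the date Series, so no filtering is needed afterwards. The df1-date style column labels are an assumption taken from the desired output in the question, and the final cast to the nullable Int64 dtype is optional; it only keeps the values displayed as integers instead of floats.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": ["A", "B", "C"],
                    "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

# Concatenating Series along axis=1 aligns them on the index; the keys
# become the column names, and missing positions are filled with NaN.
out = pd.concat(
    [df1["date"], df2["date"], df3["date"]],
    axis=1,
    keys=["df1-date", "df2-date", "df3-date"],
)

# Optional: nullable integers, so missing cells show as <NA> and the
# remaining values are not converted to floats.
out_int = out.astype("Int64")
print(out_int)
```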

One more approach:

df1.join(df2['date'],rsuffix='df2',how='outer').join(df3['date'],rsuffix='df3',how='outer')

Output:

  Color     date        datedf2     datedf3
0   A       2011.0      2013.0      2011
1   B       201411.0    20151111.0  201411
2   C       20151231.0  201101.0    20151231
3   A       2019.0      NaN         2019
4   NaN     NaN         NaN         20070212
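The join output above still carries the Color column from df1 and the suffix-based names. A possible way to get closer to the requested layout, sketched here under the assumption that the df1-date style labels from the question are wanted, is to rename each date Series first and join those:

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": ["A", "B", "C"],
                    "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

# Start from df1's date column only, then outer-join the other two
# (renamed) date Series on the index. A Series must have a name to be joined.
joined = (
    df1["date"].rename("df1-date").to_frame()
    .join(df2["date"].rename("df2-date"), how="outer")
    .join(df3["date"].rename("df3-date"), how="outer")
)
print(joined)
```

Because every column already has a unique name, no rsuffix is needed here.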
