Merge a specific column from multiple DataFrames of different lengths
df1
Color date
0 A 2011
1 B 201411
2 C 20151231
3 A 2019
df2
Color date
0 A 2013
1 B 20151111
2 C 201101
df3
Color date
0 A 2011
1 B 201411
2 C 20151231
3 A 2019
4 Y 20070212
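Suppose I have the three DataFrames above. I want to create a new DataFrame by extracting only the "date" column from each of them.
Desired output:
new df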
df1-date df2-date df3-date
0 2011 2013 2011
1 201411 20151111 201411
2 20151231 201101 20151231
3 2019 NaN 2019
4 NaN NaN 20070212
Because the lengths differ, I want the empty cells to be filled with NaN.
I tried merge and concat, but both gave me errors.
Thanks for reading.
This involves two problems: (1) merging multiple DataFrames, and (2) merging on duplicate keys.
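from functools import reduce
import pandas as pd

def multikey(x):
    # groupby + cumcount numbers each repeated 'Color' value (0, 1, ...),
    # which creates the additional key so duplicates pair up in order
    return x.assign(key=x.groupby('Color').cumcount())

# then use reduce to outer-merge the frames pairwise on both keys
df = reduce(lambda left, right:
            pd.merge(left, right, on=['Color', 'key'], how='outer'),
            list(map(multikey, [df1, df2, df3])))
df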
Color date_x key date_y date
0 A 2011.0 0 2013.0 2011
1 B 201411.0 0 20151111.0 201411
2 C 20151231.0 0 201101.0 20151231
3 A 2019.0 1 NaN 2019
4 Y NaN 0 NaN 20070212
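To see why the extra key matters, here is a minimal sketch. It rebuilds df1 and df3 by hand from the tables in the question (that construction is an assumption), prints the counter that cumcount produces, and compares the row counts of an outer merge with and without the key.

```python
import pandas as pd

# df1 and df3 retyped from the question; the constructors are an assumption
df1 = pd.DataFrame({'Color': ['A', 'B', 'C', 'A'],
                    'date': [2011, 201411, 20151231, 2019]})
df3 = pd.DataFrame({'Color': ['A', 'B', 'C', 'A', 'Y'],
                    'date': [2011, 201411, 20151231, 2019, 20070212]})

# cumcount numbers the rows inside each Color group, starting at 0
key = df1.groupby('Color').cumcount()
print(key.tolist())

# Without the key, every 'A' in df1 pairs with every 'A' in df3
without_key = pd.merge(df1, df3, on='Color', how='outer')

# With the key, the first 'A' pairs with the first 'A', and so on
with_key = pd.merge(df1.assign(key=key),
                    df3.assign(key=df3.groupby('Color').cumcount()),
                    on=['Color', 'key'], how='outer')
print(len(without_key), len(with_key))
```

The merge on Color alone returns more rows than either input because the two A rows on each side are cross-joined; adding the counter keeps the result at one row per original row.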
Note that the column names here can be changed at any time with rename.
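As a rough illustration of that renaming step, the sketch below uses a small stand-in for the merged frame (only two rows, typed in by hand, so the data is an assumption) and maps the suffixed columns to the names asked for in the question. The names `merged` and `result` are hypothetical.

```python
import pandas as pd

# Stand-in for the merged frame, with the same column names as the output above
merged = pd.DataFrame({'Color': ['A', 'B'],
                       'date_x': [2011.0, 201411.0],
                       'key': [0, 0],
                       'date_y': [2013.0, 20151111.0],
                       'date': [2011, 201411]})

# Rename the date columns, then drop the helper columns used for merging
result = (merged
          .rename(columns={'date_x': 'df1-date',
                           'date_y': 'df2-date',
                           'date': 'df3-date'})
          .drop(columns=['Color', 'key']))
print(result.columns.tolist())
```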
concat
Method 2: ignore the key and align on the index instead.
s=pd.concat([df1,df2,df3],keys=['df1','df2','df3'], axis=1)
s.columns=s.columns.map('_'.join)
s=s.filter(like='_date')
s
df1_date df2_date df3_date
0 2011.0 2013.0 2011
1 201411.0 20151111.0 201411
2 20151231.0 201101.0 20151231
3 2019.0 NaN 2019
4 NaN NaN 20070212
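If only the date columns are needed, one possible shortcut is to pass the three Series to concat directly, keyed by the column names requested in the question. This is a sketch, not the answer's own code: the DataFrames are retyped from the question, and the optional cast to the nullable Int64 dtype (to avoid the float display caused by NaN) is an addition that assumes a reasonably recent pandas.

```python
import pandas as pd

# Frames retyped from the question; the constructors are an assumption
df1 = pd.DataFrame({'Color': ['A', 'B', 'C', 'A'],
                    'date': [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({'Color': ['A', 'B', 'C'],
                    'date': [2013, 20151111, 201101]})
df3 = pd.DataFrame({'Color': ['A', 'B', 'C', 'A', 'Y'],
                    'date': [2011, 201411, 20151231, 2019, 20070212]})

# The dict keys become the column names; rows are aligned on the index
out = pd.concat({'df1-date': df1['date'],
                 'df2-date': df2['date'],
                 'df3-date': df3['date']}, axis=1)

# Optional: nullable integers keep the values as integers despite missing cells
out = out.astype('Int64')
print(out)
```

Shorter frames are padded with missing values at the bottom, which matches the layout requested in the question.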
Another approach
df1.join(df2['date'],rsuffix='df2',how='outer').join(df3['date'],rsuffix='df3',how='outer')
Output
Color date datedf2 datedf3
0 A 2011.0 2013.0 2011
1 B 201411.0 20151111.0 201411
2 C 20151231.0 201101.0 20151231
3 A 2019.0 NaN 2019
4 NaN NaN NaN 20070212