Merge a specific column from multiple DataFrames of different lengths
df1
Color date
0 A 2011
1 B 201411
2 C 20151231
3 A 2019
df2
Color date
0 A 2013
1 B 20151111
2 C 201101
df3
Color date
0 A 2011
1 B 201411
2 C 20151231
3 A 2019
4 Y 20070212
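Suppose I have the three DataFrames above. I want to create a new DataFrame by extracting only the "date" column from each of them.
The output I want:
new df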
df1-date df2-date df3-date
0 2011 2013 2011
1 201411 20151111 201411
2 20151231 201101 20151231
3 2019 NaN 2019
4 NaN NaN 20070212
Because the lengths differ, I want the empty cells to be filled with NaN.
I tried merge and concat, but I got errors.
Thanks for reading.
This involves two problems: (1) merging multiple DataFrames, and (2) merging on duplicate keys.
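A minimal sketch of why the duplicate keys matter, assuming df1 and df3 are built as plain pandas DataFrames matching the tables in the question (the constructor calls are my reconstruction, not part of the original post). Merging on Color alone pairs every A row with every other A row, while cumcount numbers the repeats so they can be matched one to one:

```python
import pandas as pd

# Reconstructed from the tables in the question (assumed construction).
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

# Merging on Color alone: the two A rows in df1 each match the two A rows
# in df3, so A contributes 2 x 2 = 4 rows instead of 2.
naive = pd.merge(df1, df3, on="Color", how="outer")

# cumcount numbers each occurrence of a Color value within its group,
# which gives a second key that tells the repeated rows apart.
keys = df1.groupby("Color").cumcount().tolist()
```

Here naive has more rows than either input, which is the symptom the extra key is meant to avoid.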
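import pandas as pd
from functools import reduce

# Use groupby and cumcount to create an additional key that numbers
# repeated Color values (the first A gets 0, the second A gets 1).
def multikey(x):
    return x.assign(key=x.groupby('Color').cumcount())

# Then use reduce to outer-merge the frames pairwise on both keys.
df = reduce(lambda left, right:
            pd.merge(left, right, on=['Color', 'key'], how='outer'),
            list(map(multikey, [df1, df2, df3])))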
df
Color date_x key date_y date
0 A 2011.0 0 2013.0 2011
1 B 201411.0 0 20151111.0 201411
2 C 20151231.0 0 201101.0 20151231
3 A 2019.0 1 NaN 2019
4 Y NaN 0 NaN 20070212
Note that the column names here can be changed at any time with rename.
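As a rough sketch of that rename step, assuming the three frames are rebuilt from the tables in the question and that the target names df1-date, df2-date and df3-date are the ones the asker wants (the variable names merged and result are mine). The row order of an outer merge can differ between pandas versions, so the sketch does not rely on it:

```python
from functools import reduce

import pandas as pd

# Reconstructed from the tables in the question (assumed construction).
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": ["A", "B", "C"],
                    "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

def multikey(x):
    # Number repeated Color values so duplicates can be matched one to one.
    return x.assign(key=x.groupby("Color").cumcount())

merged = reduce(
    lambda left, right: pd.merge(left, right, on=["Color", "key"], how="outer"),
    [multikey(frame) for frame in (df1, df2, df3)],
)

# The first merge suffixes the clashing columns as date_x / date_y; the
# second merge has no clash, so the third column keeps the name "date".
result = merged.rename(
    columns={"date_x": "df1-date", "date_y": "df2-date", "date": "df3-date"}
)[["df1-date", "df2-date", "df3-date"]]
```

Selecting the three renamed columns at the end drops Color and key, leaving only the date columns the question asks for.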
Method 2: concat, which ignores the key and aligns on the index
s=pd.concat([df1,df2,df3],keys=['df1','df2','df3'], axis=1)
s.columns=s.columns.map('_'.join)
s=s.filter(like='_date')
s
df1_date df2_date df3_date
0 2011.0 2013.0 2011
1 201411.0 20151111.0 201411
2 20151231.0 201101.0 20151231
3 2019.0 NaN 2019
4 NaN NaN 20070212
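A slightly shorter variant of the same idea, offered as a sketch rather than as the answerer's method: concatenate only the date Series and let keys supply the column names directly. The frames are rebuilt from the tables in the question, and the name dates is mine:

```python
import pandas as pd

# Reconstructed from the tables in the question (assumed construction).
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": ["A", "B", "C"],
                    "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

# Concatenating Series along axis=1 aligns them on the index; positions
# missing from a shorter Series become NaN. The keys become column names.
dates = pd.concat(
    [df1["date"], df2["date"], df3["date"]],
    axis=1,
    keys=["df1-date", "df2-date", "df3-date"],
)
```

Like the concat above, this matches rows purely by index position, so it only gives the desired result when row i of each frame is meant to line up with row i of the others.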
Another approach:
df1.join(df2['date'],rsuffix='df2',how='outer').join(df3['date'],rsuffix='df3',how='outer')
Output:
Color date datedf2 datedf3
0 A 2011.0 2013.0 2011
1 B 201411.0 20151111.0 201411
2 C 20151231.0 201101.0 20151231
3 A 2019.0 NaN 2019
4 NaN NaN NaN 20070212
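In the join output, Color is NaN in row 4 because Color comes only from df1, which has no fifth row. If only the date columns are wanted, one possible variation is to rename each date Series before joining, so that no suffixes are needed. This is a sketch with frames rebuilt from the question's tables; the name result is mine:

```python
import pandas as pd

# Reconstructed from the tables in the question (assumed construction).
df1 = pd.DataFrame({"Color": ["A", "B", "C", "A"],
                    "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": ["A", "B", "C"],
                    "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": ["A", "B", "C", "A", "Y"],
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

# Start from a one-column frame, then outer-join the other two date
# Series on the index. Each Series is renamed first to avoid name clashes.
result = (
    df1[["date"]]
    .rename(columns={"date": "df1-date"})
    .join(df2["date"].rename("df2-date"), how="outer")
    .join(df3["date"].rename("df3-date"), how="outer")
)
```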