[英]Pandas left outer join multiple dataframes on multiple columns
我是使用DataFrame的新手,我想知道如何在一系列表的多個列上執行等效於左外部聯接的SQL
例:
df1:
Year Week Colour Val1
2014 A Red 50
2014 B Red 60
2014 B Black 70
2014 C Red 10
2014 D Green 20
df2:
Year Week Colour Val2
2014 A Black 30
2014 B Black 100
2014 C Green 50
2014 C Red 20
2014 D Red 40
df3:
Year Week Colour Val3
2013 B Red 60
2013 C Black 80
2013 B Black 10
2013 D Green 20
2013 D Red 50
本質上,我想執行類似以下SQL代碼的操作(注意df3在Year上未加入):
SELECT df1.*, df2.Val2, df3.Val3
FROM df1
LEFT OUTER JOIN df2
ON df1.Year = df2.Year
AND df1.Week = df2.Week
AND df1.Colour = df2.Colour
LEFT OUTER JOIN df3
ON df1.Week = df3.Week
AND df1.Colour = df3.Colour
結果應如下所示:
Year Week Colour Val1 Val2 Val3
2014 A Red 50 Null Null
2014 B Red 60 Null 60
2014 B Black 70 100 Null
2014 C Red 10 20 Null
2014 D Green 20 Null Null
我曾嘗試使用合並和聯接,但無法弄清楚如何在多個表上以及涉及多個聯接時執行此操作。 有人可以幫我嗎?
謝謝
將它們合並為兩個步驟,首先將df1
和df2
合並,然后將其結果合並到df3
。
In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour'])
我從df3刪除了year,因為您上次加入不需要它。
In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']],
how='left', on=['Week', 'Colour'])
In [40]: df
Out[40]:
Year Week Colour Val1 Val2 Val3
0 2014 A Red 50 NaN NaN
1 2014 B Red 60 NaN 60
2 2014 B Black 70 100 10
3 2014 C Red 10 20 NaN
4 2014 D Green 20 NaN 20
[5 rows x 6 columns]
也可以使用@TomAugspurger的答案的緊湊版本來做到這一點,如下所示:
df = df1.merge(df2, how='left', on=['Year', 'Week', 'Colour']).merge(df3[['Week', 'Colour', 'Val3']], how='left', on=['Week', 'Colour'])
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.