DataFrame Left Merge with Matching Keys in Different Columns
I have to merge two DataFrames with a left join, as illustrated by the sample data below.
The problem is that the matching key is spread across three columns. To complicate things further, some rows (#4) contain the same matching key twice! I was advised to use melt, but it only works for a right join.
What is the best approach?
import pandas as pd

data1 = {'key1': ['abc', 'aa', 'aa', 'sdf'],
         'key2': ['aa', 'efg', 'aa', 'sdf'],
         'key3': ['aa', 'aa', 'xyz', 'aa']}
data2 = {'key': ['abc', 'efg', 'xyz', 'sdf'],
         'msg': ['happy', 'mad', 'smile', 'great']}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
Let's try stack to reshape df1, then map the keys to the corresponding msg from df2, and finally groupby on level=0 and aggregate using first:
df1['msg'] = df1.stack().map(df2.set_index('key')['msg']).groupby(level=0).first()
key1 key2 key3 msg
0 abc aa aa happy
1 aa efg aa mad
2 aa aa xyz smile
3 sdf sdf aa great
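To make the one-liner easier to follow, here is a minimal sketch that splits it into its four steps. It rebuilds the sample frames from the question so it runs on its own; the intermediate names (lookup, stacked, mapped, msg, result) are introduced here for illustration only, and the sketch assumes the values in df2['key'] are unique.

```python
import pandas as pd

df1 = pd.DataFrame({
    "key1": ["abc", "aa", "aa", "sdf"],
    "key2": ["aa", "efg", "aa", "sdf"],
    "key3": ["aa", "aa", "xyz", "aa"],
})
df2 = pd.DataFrame({
    "key": ["abc", "efg", "xyz", "sdf"],
    "msg": ["happy", "mad", "smile", "great"],
})

# 1. Lookup table key -> msg (assumes df2['key'] has no duplicates).
lookup = df2.set_index("key")["msg"]

# 2. Long Series indexed by (row label, column name) holding every key value.
stacked = df1[["key1", "key2", "key3"]].stack()

# 3. Translate each key; keys absent from df2 (here 'aa') become NaN.
mapped = stacked.map(lookup)

# 4. For each original row, take the first non-null message.
msg = mapped.groupby(level=0).first()

# Attach the result without modifying df1 in place.
result = df1.assign(msg=msg)
print(result)
```

Because first skips null values, a row whose key appears twice (row 3, 'sdf') still yields a single message, and a row with no match at all would end up with a missing msg, which is the left-join behaviour asked for.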
How about this? You can create a temporary DataFrame in which all keys are in the same column, do the join, then drop the unmatched and duplicate rows and merge the result back into your first DataFrame:
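# Keep the row position as an explicit "index" column (drop=True would discard it).
df1 = df1.reset_index()
# Put all three key columns into one long (index, key) table.
df3 = pd.DataFrame(
    df1[["index", "key1"]].values.tolist()
    + df1[["index", "key2"]].values.tolist()
    + df1[["index", "key3"]].values.tolist(),
    columns=['index', 'key'])
df4 = df3.merge(df2, on="key", how="left")
# Discard keys with no match in df2, otherwise a NaN row could be the one kept below.
df4 = df4.dropna(subset=['msg'])
# A row can match more than once (row 3 has 'sdf' twice); keep one match per row.
df4 = df4.sort_values('index').drop_duplicates('index', keep='first')
df = df1.merge(df4[['index', 'msg']], on="index", how='left')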
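Since the question mentions melt: the temporary long frame in this answer is essentially what melt produces, so the same idea can be sketched with it while still ending in a left join. This is only a sketch under the question's sample data; the names long and result are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "key1": ["abc", "aa", "aa", "sdf"],
    "key2": ["aa", "efg", "aa", "sdf"],
    "key3": ["aa", "aa", "xyz", "aa"],
})
df2 = pd.DataFrame({
    "key": ["abc", "efg", "xyz", "sdf"],
    "msg": ["happy", "mad", "smile", "great"],
})

long = (
    df1.reset_index()                            # row label becomes an "index" column
       .melt(id_vars="index", value_name="key")  # columns: index, variable, key
       .merge(df2, on="key", how="inner")        # keep only keys present in df2
       .drop_duplicates("index")                 # one match per original row
)

# DataFrame.join is a left join on the row index by default, so rows of df1
# without any matching key would be kept with a missing msg.
result = df1.join(long.set_index("index")["msg"])
print(result)
```

The inner merge is only applied to the melted helper table; the final join back onto df1 is what preserves every original row.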