Merging Pandas DataFrames with keys in different columns
I'm trying to merge two Pandas DataFrames, which are as follows:
import pandas as pd
df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
'SCORE': [99, 70, 14, 84, 50]})
print(df1)
PAIR SCORE
0 140-120 99
1 200-280 70
2 350-310 14
3 410-480 84
4 500-570 50
df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
'BRAND' : ['A', 'V', 'P', 'V', 'P']})
print(df2)
PAIR1 PAIR2 BRAND
0 140-120 120-140 A
1 280-200 200-280 V
2 350-310 310-350 P
3 480-410 410-480 V
4 500-570 570-500 P
If you take a closer look, you will notice that each value in the PAIR column of df1 matches the value in either PAIR1 or PAIR2 of df2. In df2, each key is present in both orders (e.g. 140-120 and 120-140).
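The observation that each key appears in both orders can be checked directly. A minimal sketch, assuming every pair is exactly two numbers joined by a single '-'; the variable names reversed_pair1 and is_mirror are only for illustration:

```python
import pandas as pd

df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND': ['A', 'V', 'P', 'V', 'P']})

# Swap the two halves of each PAIR1 value, e.g. '140-120' -> '120-140'
reversed_pair1 = df2['PAIR1'].map(lambda p: '-'.join(reversed(p.split('-'))))

# True when every PAIR2 is exactly its row's PAIR1 written the other way round
is_mirror = bool((reversed_pair1 == df2['PAIR2']).all())
print(is_mirror)
```

If this holds, PAIR2 carries no extra information, which is why a single normalized key column is enough to do the merge.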
My goal is to merge the two DataFrames to obtain the following result:
PAIR SCORE BRAND
0 140-120 99 A
1 200-280 70 V
2 350-310 14 P
3 410-480 84 V
4 500-570 50 P
I first tried to merge df1 with df2 the following way:
df3 = pd.merge(left = df1, right = df2, how = 'left', left_on = 'PAIR', right_on = 'PAIR1')
Then I took the resulting DataFrame df3 and merged it back with df2:
df4 = pd.merge(left = df3, right = df2, how = 'left', left_on = 'PAIR', right_on = 'PAIR2')
print(df4)
PAIR SCORE PAIR1_x PAIR2_x BRAND_x PAIR1_y PAIR2_y BRAND_y
0 140-120 99 140-120 120-140 A NaN NaN NaN
1 200-280 70 NaN NaN NaN 280-200 200-280 V
2 350-310 14 350-310 310-350 P NaN NaN NaN
3 410-480 84 NaN NaN NaN 480-410 410-480 V
4 500-570 50 500-570 570-500 P NaN NaN NaN
This is not my desired result. I don't know how else I can account for the fact that the correct key might be in either PAIR1 or PAIR2. Any help would be appreciated.
Somewhat clumsy solution: build a Series that maps each pair in df2 to its corresponding brand, then pass this mapping to df1['PAIR'].map().
# Build a series whose index maps pairs to values
mapper = df2.melt(id_vars='BRAND').set_index('value')['BRAND']
mapper
value
140-120 A
280-200 V
350-310 P
480-410 V
500-570 P
120-140 A
200-280 V
310-350 P
410-480 V
570-500 P
Name: BRAND, dtype: object
# Use the mapper on df1['PAIR']
df1['BRAND'] = df1['PAIR'].map(mapper)
df1
PAIR SCORE BRAND
0 140-120 99 A
1 200-280 70 V
2 350-310 14 P
3 410-480 84 V
4 500-570 50 P
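One caveat with this approach: when the mapper passed to Series.map is itself a Series, values are looked up through the mapper's index, and pandas raises an error if that index contains duplicate labels. That could happen here if, for example, a pair with identical halves appeared in both PAIR1 and PAIR2. A minimal sketch that guards against this by dropping repeated keys before mapping, assuming any repeated key always carries the same brand; it also passes value_vars explicitly so that any extra columns added to df2 later are not melted into the lookup:

```python
import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND': ['A', 'V', 'P', 'V', 'P']})

# Long format: one row per (BRAND, key), taking keys from both PAIR1 and PAIR2 only
long_keys = df2.melt(id_vars='BRAND', value_vars=['PAIR1', 'PAIR2'])

# Keep the first occurrence of each key so the mapper's index is unique
mapper = long_keys.drop_duplicates(subset='value').set_index('value')['BRAND']

df1['BRAND'] = df1['PAIR'].map(mapper)
print(df1)
```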
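# Stack PAIR1 and PAIR2 into a single key column so one merge can match either
temp_df1 = df2[['PAIR1', 'BRAND']]
temp_df2 = df2[['PAIR2', 'BRAND']].rename(columns={'PAIR2': 'PAIR1'})  # rename(inplace=True) on a slice can trigger SettingWithCopyWarning
big_df = pd.concat([temp_df1, temp_df2], ignore_index=True)
# Left-merge against the stacked lookup, then drop the now-redundant key column
pd.merge(df1[['PAIR', 'SCORE']], big_df, how='left', left_on='PAIR', right_on='PAIR1').drop(columns='PAIR1')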
You are merging successively on the column pairs PAIR/PAIR1 and PAIR/PAIR2, both times with how='left'. A left merge keeps every row of the left DataFrame even when it has no match, which is what creates all the NaN values.
Take a look at Pandas Merging 101.
For your current implementation, you would need to take a subset of the current result and remove the NaNs.
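For the two-merge approach in the question, that clean-up step could look like the sketch below. Each row found its brand in exactly one of BRAND_x or BRAND_y, so filling the gaps of one from the other recovers a single BRAND column. This assumes df4 is built exactly as in the question, so the overlapping columns get pandas' default _x and _y suffixes:

```python
import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND': ['A', 'V', 'P', 'V', 'P']})

# The two left merges from the question
df3 = pd.merge(left=df1, right=df2, how='left', left_on='PAIR', right_on='PAIR1')
df4 = pd.merge(left=df3, right=df2, how='left', left_on='PAIR', right_on='PAIR2')

# Take BRAND_x where the first merge matched, otherwise fall back to BRAND_y
df4['BRAND'] = df4['BRAND_x'].fillna(df4['BRAND_y'])
result = df4[['PAIR', 'SCORE', 'BRAND']]
print(result)
```

This works, but it still performs two merges and carries the duplicated columns along, so the single-key approaches are usually tidier.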
A much simpler solution would be to normalize the PAIR values in df1 and the keys in df2 to the same canonical form (the two halves in sorted order), so that a single merge matches a pair regardless of which way round it is written in df2.
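# Canonical key: sort the two halves so '140-120' and '120-140' give the same string
def canonical(pair):
    return '-'.join(sorted(pair.split('-')))
df1['FOR_MERGE'] = df1['PAIR'].map(canonical)
df2['FOR_MERGE'] = df2['PAIR1'].map(canonical)  # PAIR2 is PAIR1 reversed, so one column is enough
# Keep PAIR in the result and drop the helper key afterwards
pd.merge(df1[['PAIR', 'SCORE', 'FOR_MERGE']], df2[['FOR_MERGE', 'BRAND']],
         how='left', on='FOR_MERGE').drop(columns='FOR_MERGE')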
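Whichever answer is used, the indicator and validate parameters of pd.merge can confirm that the join behaved as intended: indicator=True adds a _merge column saying whether each row found a match ('both') or not ('left_only'), and validate='many_to_one' raises a MergeError if a key is duplicated on the right-hand side. A sketch applied to the sorted-halves key, assuming the key is built as in that answer; the names canonical, left, right and checked are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND': ['A', 'V', 'P', 'V', 'P']})

def canonical(pair):
    # Sort the two halves so '140-120' and '120-140' give the same key
    return '-'.join(sorted(pair.split('-')))

left = df1.assign(FOR_MERGE=df1['PAIR'].map(canonical))
right = df2.assign(FOR_MERGE=df2['PAIR1'].map(canonical))[['FOR_MERGE', 'BRAND']]

# validate raises MergeError if FOR_MERGE is not unique in `right`;
# indicator adds a '_merge' column ('both' or 'left_only') for each row
checked = pd.merge(left, right, how='left', on='FOR_MERGE',
                   validate='many_to_one', indicator=True)

all_matched = bool((checked['_merge'] == 'both').all())
print(all_matched)
print(checked[['PAIR', 'SCORE', 'BRAND']])
```

If any PAIR in df1 had no counterpart in df2, all_matched would be False and the offending rows could be inspected with checked[checked['_merge'] == 'left_only'].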