Merging Pandas DataFrames with keys in different columns

I'm trying to merge two Pandas DataFrames, which are as follows:

import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
print(df1)

      PAIR  SCORE
0  140-120     99
1  200-280     70
2  350-310     14
3  410-480     84
4  500-570     50

df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND' : ['A', 'V', 'P', 'V', 'P']})
print(df2)

     PAIR1    PAIR2 BRAND
0  140-120  120-140     A
1  280-200  200-280     V
2  350-310  310-350     P
3  480-410  410-480     V
4  500-570  570-500     P

If you take a closer look, you will notice that each value in the PAIR column of df1 matches the value in either PAIR1 or PAIR2 of df2. In df2, each key is present in both orders (e.g. 140-120 and 120-140).

My goal is to merge the two DataFrames to obtain the following result:

      PAIR  SCORE BRAND
0  140-120     99     A
1  200-280     70     V
2  350-310     14     P
3  410-480     84     V
4  500-570     50     P

I first tried to merge df1 with df2 the following way:

df3 = pd.merge(left = df1, right = df2, how = 'left', left_on = 'PAIR', right_on = 'PAIR1')

Then I took the resulting DataFrame df3 and merged it with df2 again:

df4 = pd.merge(left = df3, right = df2, how = 'left', left_on = 'PAIR', right_on = 'PAIR2')

print(df4)

      PAIR  SCORE  PAIR1_x  PAIR2_x BRAND_x  PAIR1_y  PAIR2_y BRAND_y
0  140-120     99  140-120  120-140       A      NaN      NaN     NaN
1  200-280     70      NaN      NaN     NaN  280-200  200-280       V
2  350-310     14  350-310  310-350       P      NaN      NaN     NaN
3  410-480     84      NaN      NaN     NaN  480-410  410-480       V
4  500-570     50  500-570  570-500       P      NaN      NaN     NaN

This is not my desired result. I don't know how else I can account for the fact that the correct key might be in either PAIR1 or PAIR2. Any help would be appreciated.

Somewhat clumsy solution: build a Series that maps each pair in df2 to its corresponding brand, then pass this mapping to df1['PAIR'].map().

# Build a series whose index maps pairs to values
mapper = df2.melt(id_vars='BRAND').set_index('value')['BRAND']
mapper
value
140-120    A
280-200    V
350-310    P
480-410    V
500-570    P
120-140    A
200-280    V
310-350    P
410-480    V
570-500    P
Name: BRAND, dtype: object

# Use the mapper on df1['PAIR']
df1['BRAND'] = df1['PAIR'].map(mapper)
df1
      PAIR  SCORE BRAND
0  140-120     99     A
1  200-280     70     V
2  350-310     14     P
3  410-480     84     V
4  500-570     50     P
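
One caveat with the mapper approach, as far as I can tell: Series.map needs the mapper's index to be unique, so a pair that shows up in both PAIR1 and PAIR2 (or twice in df2) would make the lookup fail. A minimal sketch of guarding against that, using small hypothetical frames (the '300-300' row and brand 'Z' are invented for illustration and are not in the question's data):

```python
import pandas as pd

# Hypothetical data: '300-300' reads the same in both orders,
# so it appears twice once PAIR1 and PAIR2 are stacked by melt().
df1 = pd.DataFrame({'PAIR': ['140-120', '300-300'], 'SCORE': [99, 10]})
df2 = pd.DataFrame({'PAIR1': ['140-120', '300-300'],
                    'PAIR2': ['120-140', '300-300'],
                    'BRAND': ['A', 'Z']})

mapper = df2.melt(id_vars='BRAND').set_index('value')['BRAND']

# Keep only the first occurrence of each pair so the index is unique
mapper = mapper[~mapper.index.duplicated(keep='first')]

brands = df1['PAIR'].map(mapper)
print(brands)
```

Keeping the first occurrence is an arbitrary choice here; if the same pair could carry different brands, that conflict would need to be resolved explicitly instead.
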
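
Another option: stack PAIR1 and PAIR2 into a single key column, then merge once.

# Pair each key column with BRAND; rename PAIR2 so both frames share the column name PAIR1
# (rename returns a new frame, which avoids modifying a slice of df2 in place)
temp_df1 = df2[['PAIR1', 'BRAND']]
temp_df2 = df2[['PAIR2', 'BRAND']].rename(columns={'PAIR2': 'PAIR1'})

# Stack the two lookups so every ordering of a pair appears under PAIR1
big_df = pd.concat([temp_df1, temp_df2], ignore_index=True)

# Merge once on the combined key, then drop the duplicate key column
pd.merge(df1, big_df, how='left', left_on='PAIR', right_on='PAIR1').drop(columns='PAIR1')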

You are merging successively on the column pairs PAIR/PAIR1 and then PAIR/PAIR2, both times with how='left', which is what creates all the NaN values.

Take a look at Pandas Merging 101.

For your current implementation, you need to take a subset of the current result and remove the NaNs.

A much simpler solution is to normalize the key on both sides so that it follows a single pattern, either (large-small) or (small-large), and then merge on the normalized key:
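
One way to do that cleanup, sketched under the assumption that each row of df1 matches through either PAIR1 or PAIR2: coalesce BRAND_x and BRAND_y into a single column and keep only the columns you need. The names result and BRAND (as the new column) are my own choices.

```python
import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND': ['A', 'V', 'P', 'V', 'P']})

# The two left merges from the question
df3 = pd.merge(df1, df2, how='left', left_on='PAIR', right_on='PAIR1')
df4 = pd.merge(df3, df2, how='left', left_on='PAIR', right_on='PAIR2')

# Take BRAND_x where it is present, otherwise fall back to BRAND_y
df4['BRAND'] = df4['BRAND_x'].combine_first(df4['BRAND_y'])

# Keep only the columns of the desired result
result = df4[['PAIR', 'SCORE', 'BRAND']]
print(result)
```
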

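# Normalize a key so both orderings of a pair give the same string.
# The parts are compared as strings, which is fine as long as both sides use the same rule.
def normalize(pair):
    return '-'.join(sorted(pair.split('-')))

df1['FOR_MERGE'] = df1['PAIR'].map(normalize)
df2['FOR_MERGE'] = df2['PAIR1'].map(normalize)

# Merge on the normalized key; PAIR is kept so the result matches the desired output
pd.merge(df1[['PAIR', 'SCORE', 'FOR_MERGE']], df2[['FOR_MERGE', 'BRAND']], how='left', on='FOR_MERGE')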
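
If you want to confirm that every row of df1 actually found a partner, merge's indicator=True option can help. The following is only a sketch of the normalized-key idea; the helper name normalize and the variables left, right, merged and unmatched are assumptions of mine, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND': ['A', 'V', 'P', 'V', 'P']})

def normalize(pair):
    """Order the two parts of 'a-b' consistently (string comparison)."""
    return '-'.join(sorted(pair.split('-')))

# assign() returns copies, so df1 and df2 themselves are left unchanged
left = df1.assign(FOR_MERGE=df1['PAIR'].map(normalize))
right = df2.assign(FOR_MERGE=df2['PAIR1'].map(normalize))[['FOR_MERGE', 'BRAND']]

# indicator=True adds a '_merge' column telling where each row came from
merged = pd.merge(left, right, how='left', on='FOR_MERGE', indicator=True)

# Rows of df1 with no matching key in df2 (empty for this data)
unmatched = merged[merged['_merge'] == 'left_only']

result = merged.drop(columns=['FOR_MERGE', '_merge'])
print(result)
```
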
