Merge/join pandas DataFrames with conditions

Question

I have two pandas DataFrames, df1 and df2. The relationship between them is one-to-many, and in some instances it can be one-to-one. When the relationship is one-to-many, I'd like to join columns with certain conditions. I'll illustrate with some data.

import pandas as pd

df1 = pd.DataFrame({
                    'vid': [1, 2, 3, 4, 5],
                    'lid': [6, 7, 8, 9, 10],
                    'v': [3, 5, 6, 1, 9]
                  })

df2 = pd.DataFrame({
                    'lid': [6, 6, 8, 8, 10],
                    'av': ['$10','$5','$4','$3','$2'],
                    'cr': [0.04, 0.05, 0.03, 0.04, 0.01]
                  })

For rows that have multiple matches in df2, i.e. lid 6 and 8, I'd like to apply some function, say, get the max of av and cr.

Expected output:

vid lid  v  av      cr
1    6   3  $10     0.05
2    7   5  np.nan  np.nan
3    8   6  $4      0.04
4    9   1  np.nan  np.nan
5    10  9  $2      0.01

Answer 1

To match on the max or the min of both columns taken together, create a helper column tmp holding (av, cr) tuples, sort df2 by lid and tmp, drop the duplicates per lid so that only the first row of each lid remains, and left-merge the result onto df1:

df2['tmp'] = list(zip(df2['av'].str.strip('$').astype(int), df2['cr']))

# sort lid ascending and tmp descending so the maximal tuple in tmp comes first per lid
df = (df1.merge(df2.sort_values(['lid','tmp'], ascending=[True, False])
                   .drop_duplicates('lid'), how='left', on='lid')
                   .drop('tmp', axis=1))
print (df)
   vid  lid  v   av    cr
0    1    6  3  $10  0.04
1    2    7  5  NaN   NaN
2    3    8  6   $4  0.03
3    4    9  1  NaN   NaN
4    5   10  9   $2  0.01
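
The tmp column works because Python orders tuples element by element: the first items are compared, and the second items only matter when the first ones tie. A minimal sketch of that ordering, using the two df2 rows for lid 6 (the variable names here are only for illustration):

```python
# Rows of df2 for lid 6, written as (av as int, cr) tuples.
candidates = [(10, 0.04), (5, 0.05)]

# Tuples compare element by element: the first items decide unless they tie.
largest = max(candidates)
smallest = min(candidates)

print(largest)   # (10, 0.04)
print(smallest)  # (5, 0.05)

# A tie on the first item falls through to the second one.
tie_break = max([(10, 0.04), (10, 0.05)])
print(tie_break)  # (10, 0.05)
```

This is why the row picked for lid 6 has cr 0.04 rather than 0.05: the whole row with the largest av wins, not the largest value of each column.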

df2['tmp'] = list(zip(df2['av'].str.strip('$').astype(int), df2['cr']))

# sort both lid and tmp ascending so the minimal tuple in tmp comes first per lid
df = (df1.merge(df2.sort_values(['lid','tmp'])
                   .drop_duplicates('lid'), how='left', on='lid')
                   .drop('tmp', axis=1))
print (df)
   vid  lid  v   av    cr
0    1    6  3   $5  0.05
1    2    7  5  NaN   NaN
2    3    8  6   $3  0.04
3    4    9  1  NaN   NaN
4    5   10  9   $2  0.01
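
If the tuple column feels awkward, the same row selection can probably be done by sorting on several plain columns. The sketch below is an alternative, not part of the original answer; av_num is a hypothetical helper column introduced only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'vid': [1, 2, 3, 4, 5],
                    'lid': [6, 7, 8, 9, 10],
                    'v': [3, 5, 6, 1, 9]})
df2 = pd.DataFrame({'lid': [6, 6, 8, 8, 10],
                    'av': ['$10', '$5', '$4', '$3', '$2'],
                    'cr': [0.04, 0.05, 0.03, 0.04, 0.01]})

# av_num is a hypothetical helper column holding av as an integer.
ranked = df2.assign(av_num=df2['av'].str.strip('$').astype(int))

# Within each lid, put the largest av_num first and break ties by largest cr,
# then keep only that first row per lid.
best = (ranked.sort_values(['lid', 'av_num', 'cr'],
                           ascending=[True, False, False])
              .drop_duplicates('lid')
              .drop(columns='av_num'))

# The left merge keeps every df1 row; lids missing from df2 get NaN.
result = df1.merge(best, how='left', on='lid')
print(result)
```

Using assign leaves df2 itself unchanged, so there is no tmp column to drop after the merge. Flipping the two False flags to True would select the minimal row instead.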

EDIT: If you aggregate with max or mean instead, each column is aggregated separately, so the output differs from the solutions above. Note that av is still a string column here, so its max is lexicographic ('$5' ranks above '$10'); tmp holds the numeric value:

df2['tmp'] = df2['av'].str.strip('$').astype(int)

df = df1.merge(df2.groupby('lid').max(), how='left', on='lid')
print (df)
   vid  lid  v   av    cr   tmp
0    1    6  3   $5  0.05  10.0
1    2    7  5  NaN   NaN   NaN
2    3    8  6   $4  0.04   4.0
3    4    9  1  NaN   NaN   NaN
4    5   10  9   $2  0.01   2.0

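# numeric_only=True skips the string column av; pandas 2.0 and later raise a TypeError without it
df = df1.merge(df2.groupby('lid').mean(numeric_only=True), how='left', on='lid')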
print (df)
   vid  lid  v     cr  tmp
0    1    6  3  0.045  7.5
1    2    7  5    NaN  NaN
2    3    8  6  0.035  3.5
3    4    9  1    NaN  NaN
4    5   10  9  0.010  2.0
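
To get the per-column maximum described in the question with av compared as a number rather than as a string, one option is to aggregate a numeric copy of av and restore the '$' prefix afterwards. This is a sketch under that assumption, not part of the original answer; av_num is a hypothetical helper column.

```python
import pandas as pd

df1 = pd.DataFrame({'vid': [1, 2, 3, 4, 5],
                    'lid': [6, 7, 8, 9, 10],
                    'v': [3, 5, 6, 1, 9]})
df2 = pd.DataFrame({'lid': [6, 6, 8, 8, 10],
                    'av': ['$10', '$5', '$4', '$3', '$2'],
                    'cr': [0.04, 0.05, 0.03, 0.04, 0.01]})

# Strings compare character by character, so '$5' beats '$10'.
print(max(['$10', '$5']))  # $5

# av_num is a hypothetical helper column holding av as an integer.
numeric = df2.assign(av_num=df2['av'].str.strip('$').astype(int))

# Named aggregation: each output column gets its own (column, function) pair.
per_lid = (numeric.groupby('lid')
                  .agg(av_num=('av_num', 'max'), cr=('cr', 'max'))
                  .reset_index())

# Restore the '$' prefix once the numeric maximum has been taken.
per_lid['av'] = '$' + per_lid['av_num'].astype(str)
per_lid = per_lid[['lid', 'av', 'cr']]

# The left merge keeps every df1 row; lids missing from df2 get NaN.
result = df1.merge(per_lid, how='left', on='lid')
print(result)
```

Because each pair names its own function, swapping 'max' for 'min' or 'mean' on either column should be a one-word change.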
