Merge Pandas DataFrame using apply() to only merge on partial match in two columns
I need to merge two pandas DataFrames, not only on exact column values but also on approximate ones.
For example, I have these two DataFrames:
import pandas as pd
d = {'col1': ["a", "b", "c", "d"], 'col2': [3, 4, 66, 120]}
df = pd.DataFrame(data=d)
col1 col2
0 a 3
1 b 4
2 c 66
3 d 120
d2 = {'col1a': ["aa", "bb", "cc", "dd"], 'col2b': [3, 4, 67, 100]}
df2 = pd.DataFrame(data=d2)
col1a col2b
0 aa 3
1 bb 4
2 cc 67
3 dd 100
Now, if I simply join them on the col2 and col2b columns, I only get the two rows where the column values are exactly the same.
pd.merge(df, df2, how='inner', left_on='col2', right_on='col2b')
col1 col2 col1a col2b
0 a 3 aa 3
1 b 4 bb 4
Now, to keep the example simple, say I also want to match rows where the value in the right DataFrame is within +1 or -1 of the integer value in the left DataFrame. In our example, the value 66 in the left DataFrame should be matched to the value 67 in the right DataFrame, in addition to the rows with values 3 and 4:
col1 col2 col1a col2b
0 a 3 aa 3
1 b 4 bb 4
2 c 66 cc 67
I am not sure how to approach this problem. Would I somehow need to merge on the approximate column values using apply()?
Here is one way, using merge_asof:
pd.merge_asof(df, df2, left_on='col2', right_on='col2b', tolerance=1, direction='nearest').dropna()
Out[7]:
col1 col2 col1a col2b
0 a 3 aa 3.0
1 b 4 bb 4.0
2 c 66 cc 67.0
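A self-contained version of the same idea is sketched below, reusing the df and df2 from the question. Two things are added that the one-liner above does not do: the frames are sorted explicitly, because merge_asof requires both inputs to be sorted on their key columns (the sample data happens to be sorted already), and col2b is cast back to an integer dtype, since the NaN in the unmatched row turns it into float before dropna() runs. The names left, right, merged and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c", "d"], "col2": [3, 4, 66, 120]})
df2 = pd.DataFrame({"col1a": ["aa", "bb", "cc", "dd"], "col2b": [3, 4, 67, 100]})

# merge_asof requires both frames to be sorted on their key columns.
left = df.sort_values("col2")
right = df2.sort_values("col2b")

# Pair each left row with the nearest right key, but only if it is within +/-1.
merged = pd.merge_asof(
    left,
    right,
    left_on="col2",
    right_on="col2b",
    tolerance=1,
    direction="nearest",
)

# Left rows without a match carry NaN in the right-hand columns, which makes
# col2b float. Drop those rows and restore the integer dtype.
result = (
    merged.dropna(subset=["col2b"])
    .astype({"col2b": "int64"})
    .reset_index(drop=True)
)
print(result)
```

Note that merge_asof pairs each left row with at most one right row (the nearest one), so if several right-hand values fall within the tolerance, only one of them is kept.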