pandas 非相等加入范围

Question

I need to do 'pandas non equi join', when first table joined with second table in range.当第一个表与范围内的第二个表连接时，我需要执行“pandas non equi join”。

first_table first_table

EMPLOYEE_ID SALARY
100     3000.00
101     17000.00
102     17000.00
103     9000.00
104     6000.00
105     4800.00
106     4800.00
…………..  …………
………………. …………

second_table
grade_id    lowest_sal  highest_sal grade_level
1   0       3500    GRADE-A
2   3501    7000    GRADE-B
3   7001    10000   GRADE-C
4   10000   20000   GRADE-D

Need_table(OUTPUT):
EMPLOYEE_ID SALARY  grade_level
115        3000         GRADE-A
116        17000        GRADE-D
117        17000        GRADE-D
118        9000         GRADE-C
119        6000         GRADE-B
125        4800         GRADE-B
126        4800         GRADE-B

This equivalent SQL query as:这相当于 SQL 查询为：

SELECT   f.EMPLOYEE_ID,
         f.SALARY,
         s.grade_level
FROM first_table f JOIN second_table s
ON f.SALARY BETWEEN s.lowest_sal AND s.highest_sal

Can't to use 'pd.merge' method to join tables because not have any common column.... Please help to find method无法使用“pd.merge”方法连接表，因为没有任何公共列....请帮助查找方法

Thanks谢谢

Answer 1

If df1 is your first table and df2 is your second table, you could do for example this:如果df1是你的第一个表而df2是你的第二个表，你可以这样做：

d = df2.set_index('grade_level').to_dict('split')

df1['GRADE'] = df1['SALARY'].apply(
        lambda x: next((c for i, c in enumerate(d['index']) if d['data'][i][1] <= x <= d['data'][i][2]), np.nan)
    )

print(df1)

Prints:印刷：

   EMPLOYEE_ID   SALARY    GRADE
0          100   3000.0  GRADE-A
1          101  17000.0  GRADE-D
2          102  17000.0  GRADE-D
3          103   9000.0  GRADE-C
4          104   6000.0  GRADE-B
5          105   4800.0  GRADE-B
6          106   4800.0  GRADE-B

Answer 2

One option is with conditional_join from pyjanitor , which avoids a cartesian join (helpful with memory, and performance, depending on the data size):一种选择是使用pyjanitor中的conditional_join ，它避免了笛卡尔连接（有助于 memory 和性能，具体取决于数据大小）：

# pip install pyjanitor
import pandas as pd
import janitor

(first_table
.astype({'SALARY':int})
.conditional_join(
    second_table, 
    ('SALARY', 'lowest_sal', '>='), 
    ('SALARY', 'highest_sal', '<='))
.loc[:, ['EMPLOYEE_ID', 'SALARY', 'grade_level']]
)
   EMPLOYEE_ID  SALARY grade_level
0          100    3000     GRADE-A
1          101   17000     GRADE-D
2          102   17000     GRADE-D
3          103    9000     GRADE-C
4          104    6000     GRADE-B
5          105    4800     GRADE-B
6          106    4800     GRADE-B

pandas 非相等加入范围

问题描述

2 个解决方案

解决方案1
1 已采纳 2020-01-18 00:14:45

解决方案2
0 2022-04-23 10:42:57

pandas 非相等加入范围

问题描述

2 个解决方案

解决方案1 1 已采纳 2020-01-18 00:14:45

解决方案2 0 2022-04-23 10:42:57

解决方案1
1 已采纳 2020-01-18 00:14:45

解决方案2
0 2022-04-23 10:42:57