pandas merge_asof with more than one match

I would like to join the following dataframes with pandas merge_asof:

ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')]], columns = ['date_left'])
rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
                   [pd.to_datetime('2010-01-01'), 6]], columns = ['date_right', 'variable'])

That is, ll:

    date_left
0   2010-01-01
1   2010-02-01

and rr:

    date_right  variable
0   2010-01-01  12
1   2010-01-01  6

The following

pd.merge_asof(ll, rr, left_on = 'date_left', right_on='date_right', direction='backward')

gives me

    date_left   date_right  variable
0   2010-01-01  2010-01-01  6
1   2010-02-01  2010-01-01  6

but I would like (and expected, since it is a left join)

    date_left   date_right  variable
0   2010-01-01  2010-01-01  6
1   2010-01-01  2010-01-01  12
2   2010-02-01  2010-01-01  6
3   2010-02-01  2010-01-01  12

How can I achieve this result?

---- EDIT ----: Sammywemmy gave a solution using pyjanitor's conditional_join. This works for the minimal example I posted above. However, I still want the rest of the merge_asof functionality: each left row should match only the most recent right date at or before it, not every earlier right date. By this I mean the following:

ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')],[pd.to_datetime('2010-03-01')], [pd.to_datetime('2010-04-01')]], columns = ['date_left'])

ll =

    date_left
0   2010-01-01
1   2010-02-01
2   2010-03-01
3   2010-04-01

and

rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
                   [pd.to_datetime('2010-01-01'), 6],
                   [pd.to_datetime('2010-03-01'), 3]], columns = ['date_right', 'variable'])

rr =

    date_right  variable
0   2010-01-01  12
1   2010-01-01  6
2   2010-03-01  3

Then I would like:

    date_left   date_right  variable
0   2010-01-01  2010-01-01  6
1   2010-01-01  2010-01-01  12
2   2010-02-01  2010-01-01  6
3   2010-02-01  2010-01-01  12
4   2010-03-01  2010-03-01  3
5   2010-04-01  2010-03-01  3

Whereas the conditional join would give me:

    date_left   date_right  variable
0   2010-01-01  2010-01-01  12
1   2010-01-01  2010-01-01  6
2   2010-02-01  2010-01-01  12
3   2010-02-01  2010-01-01  6
4   2010-03-01  2010-01-01  12
5   2010-03-01  2010-01-01  6
6   2010-03-01  2010-03-01  3
7   2010-04-01  2010-01-01  12
8   2010-04-01  2010-01-01  6
9   2010-04-01  2010-03-01  3

Thanks

One option is conditional_join from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
ll.conditional_join(
    rr,
    # (column from left, column from right, operator)
    ('date_left', 'date_right', '>='),
    how='left',
)

   date_left date_right  variable
0 2010-01-01 2010-01-01        12
1 2010-01-01 2010-01-01         6
2 2010-02-01 2010-01-01        12
3 2010-02-01 2010-01-01         6
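
For the extended example in the edit, where each left date should match only the most recent right date but keep every right row sharing that date, one possible approach with plain pandas is a two-step merge. This is a minimal sketch rather than a built-in merge_asof option: it first runs merge_asof against the deduplicated right keys, then does an ordinary left merge back onto rr. The names right_keys, matched and result are illustrative only.

import pandas as pd

ll = pd.DataFrame({"date_left": pd.to_datetime(
    ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"])})
rr = pd.DataFrame({
    "date_right": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-03-01"]),
    "variable": [12, 6, 3],
})

# Step 1: asof-match each left date to the latest right date at or before it.
# Deduplicating the right keys first means each left row gets exactly one key.
right_keys = rr[["date_right"]].drop_duplicates().sort_values("date_right")
matched = pd.merge_asof(
    ll.sort_values("date_left"),
    right_keys,
    left_on="date_left",
    right_on="date_right",
    direction="backward",
)

# Step 2: an ordinary left merge on the matched key brings back every
# right row that shares that key, duplicates included.
result = matched.merge(rr, on="date_right", how="left")
print(result)

With the data above this yields six rows: two each for 2010-01-01 and 2010-02-01 (variable 12 and 6) and one each for 2010-03-01 and 2010-04-01 (variable 3). The order of rows within a date may differ from the desired output in the question, so sort on variable afterwards if that matters. Left dates earlier than every right date would keep a missing date_right and variable, as with merge_asof itself.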
