简体   繁体   中英

Join DataFrame based on date which is between dates from another DataFrame

I have two Pandas Data Frames. First one looks like ( db1 ):

id   date_from   date_until   v1   v2
0    2015-06-17  2015-06-19    2    3
0    2015-06-17  2015-07-01    3    4 
1    2015-06-18  2015-06-20    5    6
1    2015-06-19  2015-07-12    5    4 

and the second one ( db2 ), like this:

id      date      v3   v4
0    2015-06-17   4    3
1    2015-06-17   5    6 
1    2015-06-18   6    4  

no i want to compress them so they based on id and date . Date compression should be if db2.date is older or equal than db1.date_from and use the youngest one from db2 witch is still older or equal than db1.data_from . So compressed data should look something like this:

id   date_from   date_until      date      v1   v2   v3   v4
0    2015-06-17  2015-06-19   2015-06-17   2    3    4    3
0    2015-06-17  2015-07-01   2015-06-17   3    4    4    3
1    2015-06-18  2015-06-20   2015-06-18   5    6    6    4
1    2015-06-19  2015-07-12   2015-06-18   5    4    6    4

I can't find any solution except iterating trough all rows, but this is not good enough because my data is bigger 100k of lines. Is there any tool to do this in pandas or should i use something else?

Thank you!

Let's use pd.merge_asof :

pd.merge_asof(df1,df2, left_on=['date_from'], right_on=['date'], by='id')

Output:

   id  date_from  date_until  v1  v2       date  v3  v4
0   0 2015-06-17  2015-06-19   2   3 2015-06-17   4   3
1   0 2015-06-17  2015-07-01   3   4 2015-06-17   4   3
2   1 2015-06-18  2015-06-20   5   6 2015-06-18   6   4
3   1 2015-06-19  2015-07-12   5   4 2015-06-18   6   4

Another way to do it:

import pandas as pd

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

a = StringIO('''id,date_from,date_until,v1,v2
0,2015-06-17,2015-06-19,2,3
0,2015-06-17,2015-07-01,3,4
1,2015-06-18,2015-06-20,5,6
1,2015-06-19,2015-07-12,5,4''')

b = StringIO('''id,date,v3,v4
0,2015-06-17,4,3
1,2015-06-17,5,6 
1,2015-06-18,6,4''')

df1 = pd.read_csv(a, sep=',')
df2 = pd.read_csv(b, sep=',')
df1.merge(df2, how='left', left_on=['id', 'date_from'], right_on=['id', 'date']).ffill()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM