简体   繁体   中英

Check if date in one dataframe is between two dates in another dataframe, by group

I have the following problem. I've got a dataframe with start and end dates for each group. There might be more than one start and end date per group , like this:

group   start_date    end_date
 1      2020-01-03    2020-03-03
 1      2020-05-03    2020-06-03
 2      2020-02-03    2020-06-03

And another dataframe with one row per date, per group, like this:

group   date
 1     2020-01-03
 1     2020-02-03
 1     2020-03-03
 1     2020-04-03
 1     2020-05-03
 1     2020-06-03
 2     2020-02-03
 3     2020-03-03
 4     2020-04-03
      .
      .

So I want to create a column is_between in an efficient way, ideally avoiding loops, so I get the following dataframe

group   date          is_between
 1     2020-01-03        1
 1     2020-02-03        1
 1     2020-03-03        1
 1     2020-04-03        0
 1     2020-05-03        1
 1     2020-06-03        1
 2     2020-02-03        1
 3     2020-03-03        1
 4     2020-04-03        1
      .
      .

So it gets a 1 when a group's date is between the dates in the first dataframe. I'm guessing some combination of groupby , where , between and maybe map might do it, but I'm not finding the correct one. Any ideas?

Based on @YOBEN_S and @Quang Hoang's advice this made it:

df = df.merge(dic_dates, how='left')
df['is_between'] = np.where(df.date.between(pd.to_datetime(df.start_date),
                                                                   pd.to_datetime(df.end_Date)),1, 0)
df = (df.sort_values(by=['group', 'date', 'is_between'])
            .drop_duplicates(subset=['group', 'date'], keep='last'))

you could try with merge_asof , by the group and on the date and start_date, then check where the date is less than end_date and finally assign back to the original df2

ser = (pd.merge_asof(df2.reset_index() #for later index alignment
                        .sort_values('date'), 
                     df1.sort_values('start_date'), 
                     by='group', 
                     left_on='date', right_on='start_date', 
                     direction='backward')
         .assign(is_between=lambda x: x.date<=x.end_date)
         .set_index(['index'])['is_between']
        )

df2['is_between'] = ser.astype(int)

print (df2)
   group       date  is_between
0      1 2020-01-03           1
1      1 2020-02-03           1
2      1 2020-03-03           1
3      1 2020-04-03           0
4      1 2020-05-03           1
5      1 2020-06-03           1
6      2 2020-02-03           1
7      3 2020-03-03           0
8      4 2020-04-03           0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM