How to join DataFrame with multiple conditions on different columns?

I have two DataFrames, as follows. mydata1:

   ID   X1   X2  Date1
   002  324  634  2016-01-01
   002  334  534  2016-01-14 
   002  354  834  2016-01-30
   004  543  843  2017-02-01
   004  923  043  2017-04-15
   005  032  212  2015-09-01 
   005  523  843  2017-09-15
   005  212  222  2015-10-1

mydata2:

   ID   Y1     Y2    Date2
   002  1224   234  2016-01-04
   002  1254   249  2016-01-28
   004  321    212  2016-12-01 
   005  1121   222  2017-09-13 

I want to merge these two DataFrames on ID and on date, keeping a match only where the difference between Date1 (in mydata1) and Date2 (in mydata2) is less than 15 days. My desired output should look like this:

   ID   X1   X2   Date1       Y1    Y2   Date2
   002  324  634  2016-01-01  nan   nan  nan
   002  334  534  2016-01-14  1224  234  2016-01-04
   002  354  834  2016-01-30  1254  249  2016-01-28
   004  543  843  2017-02-01  321   212  2015-12-01
   004  923  043  2017-04-15  nan   nan  nan
   005  032  212  2015-09-01  nan   nan  nan
   005  523  843  2015-09-15  1121  222  2017-09-13
   005  212  222  2015-10-1   nan   nan  nan

Your desired output is slightly wrong: in one of the rows the two joined dates are two years apart, so they cannot be within 15 days of each other.

First we perform a left join on ID (here df is mydata1 and df1 is mydata2):

import numpy as np
import pandas as pd

f = df.merge(df1, how='left', on='ID')

   ID   X1   X2       Date1    Y1   Y2       Date2
0   2  324  634  2016-01-01  1224  234  2016-01-04
1   2  334  534  2016-01-14  1224  234  2016-01-04
2   2  354  834  2016-01-30  1224  234  2016-01-04
3   4  543  843  2017-02-01   321  212  2016-12-01
4   4  923   43  2017-04-15   321  212  2016-12-01
5   5   32  212  2015-09-01  1121  222  2015-09-13
6   5  523  843  2015-09-15  1121  222  2015-09-13
7   5  212  222   2015-10-1  1121  222  2015-09-13
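
As a rough sketch of the same join on the frames exactly as posted in the question (the variable name merged is made up for illustration): because mydata2 there has two rows for ID 002, a left join on ID alone pairs every 002 row of mydata1 with both of them. The merged frame then has 11 rows instead of the 8 shown above, which suggests the output above came from a second frame with a single row per ID.

```python
import pandas as pd

# The two frames as posted in the question (IDs kept as strings)
mydata1 = pd.DataFrame({
    "ID": ["002", "002", "002", "004", "004", "005", "005", "005"],
    "X1": [324, 334, 354, 543, 923, 32, 523, 212],
    "X2": [634, 534, 834, 843, 43, 212, 843, 222],
    "Date1": ["2016-01-01", "2016-01-14", "2016-01-30", "2017-02-01",
              "2017-04-15", "2015-09-01", "2017-09-15", "2015-10-1"],
})
mydata2 = pd.DataFrame({
    "ID": ["002", "002", "004", "005"],
    "Y1": [1224, 1254, 321, 1121],
    "Y2": [234, 249, 212, 222],
    "Date2": ["2016-01-04", "2016-01-28", "2016-12-01", "2017-09-13"],
})

# Left join on ID only: every mydata1 row is paired with every
# mydata2 row that shares its ID
merged = mydata1.merge(mydata2, how="left", on="ID")

print(len(merged))                      # 11
print((merged["ID"] == "002").sum())    # 6 (3 left rows x 2 right rows)
```

The date condition therefore has to be applied after the join, and when several candidates exist per ID the non-matching pairs need to be dropped rather than blanked out.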

Then we create a boolean mask that is True where Date1 is between 1 and 15 days after Date2:

mask = (pd.to_datetime(f['Date1'], format='%Y-%m-%d') - pd.to_datetime(f['Date2'], format='%Y-%m-%d')).apply(lambda i: i.days <= 15 and i.days > 0)

0    False
1     True
2    False
3    False
4    False
5    False
6     True
7    False
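
The same mask can probably be computed without apply, using the .dt.days accessor on the timedelta column. The sketch below uses a three-row toy frame (the ID 002 rows from the joined output above). The names diff_days and mask_either are made up for illustration, and mask_either only shows what a symmetric "less than 15 days in either direction" reading of the question would give.

```python
import pandas as pd

# Toy frame: the three ID 002 rows of the joined output
f = pd.DataFrame({
    "Date1": ["2016-01-01", "2016-01-14", "2016-01-30"],
    "Date2": ["2016-01-04", "2016-01-04", "2016-01-04"],
})

# Whole days from Date2 to Date1 (negative when Date2 is the later date)
diff_days = (pd.to_datetime(f["Date1"]) - pd.to_datetime(f["Date2"])).dt.days

# Same condition as the lambda: 0 < days <= 15
mask = diff_days.between(1, 15)
print(mask.tolist())         # [False, True, False]

# Alternative reading: fewer than 15 days apart, in either direction
mask_either = diff_days.abs() < 15
print(mask_either.tolist())  # [True, True, False]
```

The desired output in the question leaves the first row unmatched, which fits the one-directional mask better than the symmetric one.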

Then we set the columns that came from the second frame to NaN wherever the condition does not hold:

f.loc[~mask, ['Y1', 'Y2', 'Date2']] = np.nan

   ID   X1   X2       Date1      Y1     Y2       Date2
0   2  324  634  2016-01-01     NaN    NaN         NaN
1   2  334  534  2016-01-14  1224.0  234.0  2016-01-04
2   2  354  834  2016-01-30     NaN    NaN         NaN
3   4  543  843  2017-02-01     NaN    NaN         NaN
4   4  923   43  2017-04-15     NaN    NaN         NaN
5   5   32  212  2015-09-01     NaN    NaN         NaN
6   5  523  843  2015-09-15  1121.0  222.0  2015-09-13
7   5  212  222   2015-10-1     NaN    NaN         NaN
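
When the second frame has several rows per ID, a different technique from the join-then-mask steps above may fit better: pandas.merge_asof, which for each left row picks the nearest earlier right row within a tolerance. The following is only a sketch under assumptions: it uses a subset of the question's rows (IDs 002 and 004, without X2 and Y2), converts the dates up front, and assumes Date2 must fall strictly before Date1 by at most 15 days.

```python
import pandas as pd

# Subset of the question's data, with dates already parsed
mydata1 = pd.DataFrame({
    "ID": ["002", "002", "002", "004", "004"],
    "X1": [324, 334, 354, 543, 923],
    "Date1": pd.to_datetime(["2016-01-01", "2016-01-14", "2016-01-30",
                             "2017-02-01", "2017-04-15"]),
})
mydata2 = pd.DataFrame({
    "ID": ["002", "002", "004"],
    "Y1": [1224, 1254, 321],
    "Date2": pd.to_datetime(["2016-01-04", "2016-01-28", "2016-12-01"]),
})

# merge_asof requires both frames to be sorted by their date keys
result = pd.merge_asof(
    mydata1.sort_values("Date1"),
    mydata2.sort_values("Date2"),
    left_on="Date1", right_on="Date2",
    by="ID",                          # only match rows with the same ID
    direction="backward",             # Date2 must not be after Date1
    tolerance=pd.Timedelta(days=15),  # at most 15 days apart
    allow_exact_matches=False,        # exclude a 0-day difference
).sort_values(["ID", "Date1"], ignore_index=True)

print(result)
```

Under these assumptions the three ID 002 rows get no match, Y1 = 1224 and Y1 = 1254 respectively, and both ID 004 rows stay unmatched. Note that merge_asof keeps at most one match per left row (the nearest one), so it is not equivalent to the mask approach if several right rows fall inside the window.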
